Biodivers Data J. 2018 Nov 7:(6):e28073.
doi: 10.3897/BDJ.6.e28073. eCollection 2018.

A story of data won, data lost and data re-found: the realities of ecological data preservation


Alison Specht et al. Biodivers Data J.

Abstract

This paper discusses the process of retrieving and updating legacy data to allow online discovery and delivery. Long-term conservation of ecological data, whether institutional or non-institutional, has many pitfalls. Interruptions to custodianship, old media, lost knowledge and the continuous evolution of species names make the resurrection of old data challenging. We caution against technological arrogance and emphasise the importance of international standards. We use as a case study a compiled set of continent-wide vegetation survey data for which the analyses had been published but the raw data had not. In the original study, publications containing plot data collected from the 1880s onwards were gathered, interpreted, digitised and integrated to classify the vegetation of Australia and analyse its conservation status. These compiled data form an extremely valuable national collection that warranted publication in open, readily accessible online repositories, such as the Terrestrial Ecosystem Research Network (http://www.tern.org.au) and the Atlas of Living Australia (ALA: http://www.ala.org.au), the Australian node of the Global Biodiversity Information Facility (GBIF: http://www.gbif.org). It is hoped that the lessons learnt from this project may trigger a sober review of the value of endangered data, the cost of retrieving them and the importance of suitable and timely archiving through the vicissitudes of technological change, so that the initial investment in a unique collection enables repeated re-use in perpetuity.

Keywords: data conservation; data curation; data retrieval; legacy data; long-term data accessibility.
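The abstract notes that the continuous evolution of species names complicates the resurrection of old data. A legacy species list records the name in use when a survey was published, and later taxonomic revisions may rename that taxon more than once, so the currently accepted name is reached by following a chain of replacements. The minimal sketch below assumes a hypothetical local table (`REPLACED_BY`) mapping each outdated name to the name that superseded it; in the project described here names were checked against the ALA's Biodiversity Information Explorer instead, and the taxon names below are invented placeholders, not real taxa.

```python
def resolve_name(name: str, replaced_by: dict[str, str]) -> str:
    """Follow a chain of name replacements to the currently accepted name.

    `replaced_by` is a hypothetical local table mapping an outdated name
    to the name that superseded it (not the BIE web service).
    Raises ValueError if the table contains a cycle.
    """
    seen = {name}
    current = name
    while current in replaced_by:
        current = replaced_by[current]
        if current in seen:
            raise ValueError(f"cycle in name table involving {current!r}")
        seen.add(current)
    return current


# Hypothetical revision history: the first name was replaced by the second,
# which was itself later replaced by the third.
REPLACED_BY = {
    "Oldgenus alpha": "Midgenus alpha",
    "Midgenus alpha": "Newgenus alpha",
}

legacy_species = ["Oldgenus alpha", "Midgenus alpha", "Stablegenus beta"]
# Keep the verbatim legacy name as the key so its provenance is not lost.
updated = {n: resolve_name(n, REPLACED_BY) for n in legacy_species}
print(updated)
```

Keeping the original name alongside the resolved one means a later reader can still trace each record back to the published source, even if the name table itself is revised again.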


Figures

Figure 1.
The workflow from the collation of original documents (A) through the publication of the ‘Conservation Atlas’ (E) to the retrieval project (G). The first step was to extract and digitise data from written publications (A-B). Because of the computing limitations of the time, the data had to be split into sub-files (B and C) for analysis (D), which was the aim of the original project (‘The Conservation Atlas’, 1975-1995). Throughout the Conservation Atlas project, data were stored both as hard-copy printouts and in digital form. The ‘mainframe’ computers referred to were machines of the PDP-10 family, accessed through the University of Queensland computer centre. Magnetic tapes held backups from the PDP-10s, and at the end of the Conservation Atlas project the data on the magnetic tapes were transferred to an Exabyte tape. Note: letters are used to facilitate reference to the figure from the text. The temporal axis is not to scale.
Figure 2.
Illustration of the data resources available to the retrieval project: (i) a sample of the boxes of original copies of papers and reports (A), (ii) a table extracted from a publication and prepared for data entry (B), (iii) a sample of the hard-copy printouts showing alphanumeric lists of species under each location and community (C), (iv) the magnetic tapes on which day-to-day backups were kept during the 1980s project (D), and (v) an Exabyte tape onto which the data from the magnetic tapes were transferred in 1991 (E).
Figure 3.
Diagrammatic representation of the workflow for retrieving data from the original reference files (A). Reflecting the 1980s organisation of the data, these files were separated into two parts for editing: (i) information on the sites at which data were collected (B), and (ii) the species lists, which were updated through the Biodiversity Information Explorer, BIE (http://bie.ala.org.au/ws) (C). Once these components were updated, they were re-assembled using Darwin Core standards (D) to enable delivery through a data portal (in this case the Knowledge Network for Biocomplexity, KNB: https://knb.ecoinformatics.org). Ecological Metadata Language (EML) was used to describe the dataset.
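The re-assembly step (D) in Figure 3 can be pictured as a join of the two edited parts. The minimal sketch below assumes that the site records and the species lists share a hypothetical site key (`site_id`) and writes one occurrence-style row per species per site, using a handful of Darwin Core term names as column headers (`occurrenceID`, `locationID`, `locality`, `decimalLatitude`, `decimalLongitude`, `scientificName`, `basisOfRecord`). The sample values, the identifier scheme and the choice of `HumanObservation` for `basisOfRecord` are illustrative assumptions, not the structure of the published dataset; delivery through a portal such as KNB would also need the EML description mentioned above.

```python
import csv
import io

# Hypothetical edited site records (part B of Figure 3), keyed by site_id.
sites = {
    "S001": {"locality": "Example plot 1", "lat": -27.50, "lon": 153.01},
    "S002": {"locality": "Example plot 2", "lat": -33.87, "lon": 151.21},
}

# Hypothetical edited species lists (part C), names already updated.
species_lists = {
    "S001": ["Newgenus alpha", "Stablegenus beta"],
    "S002": ["Stablegenus beta"],
}

# A small subset of Darwin Core term names, used as CSV column headers.
DWC_FIELDS = [
    "occurrenceID", "locationID", "locality",
    "decimalLatitude", "decimalLongitude",
    "scientificName", "basisOfRecord",
]


def to_darwin_core(sites, species_lists):
    """Join site records and species lists into Darwin Core-style rows."""
    rows = []
    for site_id, names in sorted(species_lists.items()):
        site = sites[site_id]  # KeyError flags a species list with no site record
        for i, name in enumerate(names, start=1):
            rows.append({
                "occurrenceID": f"{site_id}-{i}",  # hypothetical ID scheme
                "locationID": site_id,
                "locality": site["locality"],
                "decimalLatitude": site["lat"],
                "decimalLongitude": site["lon"],
                "scientificName": name,
                "basisOfRecord": "HumanObservation",  # assumed value
            })
    return rows


rows = to_darwin_core(sites, species_lists)
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=DWC_FIELDS)
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```

Failing loudly on a species list without a matching site record, rather than silently skipping it, surfaces the kind of broken link between files that legacy data tend to accumulate.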


