A story of data won, data lost and data re-found: the realities of ecological data preservation

Alison Specht¹, Matthew P Bolton², Bryn Kingsford³, Raymond L Specht⁴, Lee Belbin⁵

Affiliations

¹ University of Queensland, Brisbane, Australia.
² Corymbia Ecospatial Consultants, Canberra, Australia.
³ Structured Data, Canberra, Australia.
⁴ Emeritus Professor, Brisbane, Australia.
⁵ Atlas of Living Australia, CSIRO, Canberra, Australia.

PMID: 30473618
PMCID: PMC6235994
DOI: 10.3897/BDJ.6.e28073

Alison Specht et al. Biodivers Data J. 2018 Nov 7:(6):e28073. doi: 10.3897/BDJ.6.e28073. eCollection 2018.

Abstract

This paper discusses the process of retrieving and updating legacy data to allow online discovery and delivery. Long-term conservation of ecological data, whether institutional or non-institutional, has many pitfalls. Interruptions to custodianship, old media, lost knowledge and the continuous evolution of species names make resurrection of old data challenging. We caution against technological arrogance and emphasise the importance of international standards. We use a case study of a compiled set of continent-wide vegetation survey data for which the analyses had been published but the raw data had not. In the original study, publications containing plot data collected from the 1880s onwards had been gathered, interpreted, digitised and integrated for the classification of vegetation and analysis of its conservation status across Australia. These compiled data are an extremely valuable national collection that demanded publishing in open, readily accessible online repositories, such as the Terrestrial Ecosystem Research Network (TERN: http://www.tern.org.au) and the Atlas of Living Australia (ALA: http://www.ala.org.au), the Australian node of the Global Biodiversity Information Facility (GBIF: http://www.gbif.org). It is hoped that the lessons learnt from this project may trigger a sober review of the value of endangered data, the cost of retrieval and the importance of suitable and timely archiving through the vicissitudes of technological change, so that the initial unique collection investment enables multiple re-use in perpetuity.

Keywords: data conservation; data curation; data retrieval; legacy data; long-term data accessibility.

Figures

**Figure 1.**
The workflow from collation of original documents (A) through the publication of the ‘Conservation Atlas’ (E) to the retrieval project (G). The first step was to extract and digitise data from written publications (A-B). Due to the computing limitations of the time, it was necessary to split the data into sub-files (B and C) for analysis (D), which was the aim of the original project (the ‘Conservation Atlas’, 1975-1995). Storage throughout the Conservation Atlas project was in both hard copy printouts and digital form. The ‘mainframe’ computers referred to were those from the PDP-10 computer family through the University of Queensland computer centre. The magnetic tapes were used as backup storage for the PDP-10s, and the Exabyte tape was used to store the data from the magnetic tapes at the end of the Conservation Atlas project. Note: Letters are used to facilitate reference to the figure from the text. The temporal axis is not to scale.

**Figure 2.**
Illustration of the data resources available to the retrieval project: (i) a sample of the boxes of original copies of papers and reports (A), (ii) a table extracted from a publication and prepared for data entry (B), (iii) a sample of the hard copy printouts showing alphanumeric lists of species under each location and community (C), (iv) the magnetic tapes on which day-to-day backups were kept during the 1980s project (D), and (v) an Exabyte tape onto which the data from the magnetic tapes were transferred in 1991 (E).

**Figure 3.**
Diagrammatic representation of the workflow for retrieval of data from the original reference files (A). For editing, these files were separated into two parts, following the 1980s organisation of the data: (i) information on the sites at which data were collected (B), and (ii) the species lists, which were updated through the Biodiversity Information Explorer, BIE (http://bie.ala.org.au/ws) (C). Once these components were updated, they were re-assembled using Darwin Core standards (D) to enable delivery through a data portal (in this case the Knowledge Network for Biocomplexity, KNB: https://knb.ecoinformatics.org). Ecological Metadata Language (EML) was used to describe the dataset.
