Digitising legacy zoological taxonomic literature: Processes, products and using the output

Christopher H C Lyal¹

PMID: 26877659
PMCID: PMC4741221
DOI: 10.3897/zookeys.550.9702

Zookeys. 2016 Jan 7;(550):189-206. eCollection 2016.

Affiliation

¹ Life Sciences Department, The Natural History Museum, Cromwell Road, London SW7 5BD, UK.

Abstract

Digitising legacy taxonomic literature using XML mark-up makes its contents accessible to other taxonomic and nomenclatural information systems. Appropriate schemas need to be interoperable with other sectorial schemas, atomise to appropriate content elements and carry appropriate metadata to, for example, enable algorithmic assessment of the availability of a name under the Code. Legacy (and new) literature delivered in this fashion will become part of a global taxonomic resource from which users can extract tailored content to meet their particular needs, be they nomenclatural, taxonomic, faunistic or other. To date, most digitisation of taxonomic literature has led to a more or less simple digital copy of a paper original; the output of the many efforts has effectively been an electronic copy of a traditional library. While this has increased accessibility of publications through internet access, the means by which many scientific papers are indexed and located is much the same as with traditional libraries. OCR and born-digital papers allow use of web search engines to locate instances of taxon names and other terms, but OCR efficiency in recognising taxonomic names is still relatively poor, people's ability to use search engines effectively is mixed, and many papers cannot be searched directly. Instead of building digital analogues of traditional publications, we should consider what properties we require of future taxonomic information access. Ideally the content of each new digital publication should be accessible in the context of all previously published data, and the user should be able to retrieve nomenclatural, taxonomic and other data and information in the form required without having to scan all of the original papers and extract target content manually.
This opens the door to dynamic linking of new content with extant systems: automatic population and updating of taxonomic catalogues, ZooBank and faunal lists; all descriptions of a taxon and its children instantly accessible with a single search; comparison of classifications used in different publications; and so on. One means of doing this is to mark up content in XML, and the more atomised the mark-up, the greater the possibilities for data retrieval and integration. Mark-up requires XML that accommodates the required content elements and is interoperable with other XML schemas. Several schemas have now been written to do this, particularly TaxPub, TaxonX and taXMLit, the last of these being the most atomised. We now need to automate the mark-up process as far as possible. Manual and automatic data and information retrieval is demonstrated by projects such as INOTAXA and Plazi. As we move to creating and using taxonomic products through the power of the internet, we need to ensure that the output, while satisfying in its production the requirements of the Code, is fit for purpose in the future.

Keywords: XML; botany; digitisation; legacy literature; nomenclature; taxonomy; zoology.
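
The abstract's point that atomised mark-up lets a user pull out new names, or all descriptions of a taxon and its children, without reading the original papers can be illustrated with a minimal Python sketch. The element and attribute names below (`publication`, `treatment`, `name`, `rank`, `status`, `parent`) and the taxon names are hypothetical and chosen only for illustration; they are not taken from TaxPub, TaxonX or taXMLit, and the functions `new_names` and `descriptions_for` are likewise assumptions rather than part of INOTAXA or Plazi.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified mark-up. Element names, attributes and taxon names
# are illustrative only and do not follow TaxPub, TaxonX or taXMLit.
SAMPLE = """
<publication year="1900">
  <treatment>
    <name rank="genus" status="new">Exemplum</name>
    <description>Body elongate; antennae short.</description>
  </treatment>
  <treatment>
    <name rank="species" status="new" parent="Exemplum">Exemplum primum</name>
    <description>Length 5 mm; elytra black.</description>
  </treatment>
  <treatment>
    <name rank="species" status="existing" parent="Exemplum">Exemplum alterum</name>
    <description>Length 7 mm; elytra brown.</description>
  </treatment>
</publication>
"""


def new_names(root):
    """Return (rank, name) pairs for names marked up as newly proposed."""
    return [
        (n.get("rank"), n.text)
        for n in root.iter("name")
        if n.get("status") == "new"
    ]


def descriptions_for(root, taxon):
    """Map each matching name to its description: the taxon and its direct children."""
    found = {}
    for treatment in root.iter("treatment"):
        name = treatment.find("name")
        if name.text == taxon or name.get("parent") == taxon:
            found[name.text] = treatment.findtext("description")
    return found


root = ET.fromstring(SAMPLE)
print(new_names(root))
print(descriptions_for(root, "Exemplum"))
```

Because each name, rank and description sits in its own element, both queries are simple filters over the tree; with a page-image or plain OCR copy the same questions would require reading the text manually.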

Figures

**Figure 1.**
Outline workflow to acquire, put into a suitable format, retrieve and utilize legacy literature.

Cited by

A common registration-to-publication automated pipeline for nomenclatural acts for higher plants (International Plant Names Index, IPNI), fungi (Index Fungorum, MycoBank) and animals (ZooBank).
Penev L, Paton A, Nicolson N, Kirk P, Pyle RL, Whitton R, Georgiev T, Barker C, Hopkins C, Robert V, Biserkov J, Stoev P. Zookeys. 2016 Jan 7;(550):233-46. doi: 10.3897/zookeys.550.9551. eCollection 2016. PMID: 26877662

The List of Available Names (LAN): A new generation for stable taxonomic names in zoology?
Alonso-Zarazaga MA, Fautin DG, Michel E. Zookeys. 2016 Jan 7;(550):225-32. doi: 10.3897/zookeys.550.10043. eCollection 2016. PMID: 26877661

Reinforcing the foundations of ornithological nomenclature: Filling the gaps in Sherborn's and Richmond's historical legacy of bibliographic exploration.
Dickinson EC. Zookeys. 2016 Jan 7;(550):107-34. doi: 10.3897/zookeys.550.10170. eCollection 2016. PMID: 26877655

Unlocking Index Animalium: From paper slips to bytes and bits.
Pilsk SC, Kalfatovic MR, Richard JM. Zookeys. 2016 Jan 7;(550):153-71. doi: 10.3897/zookeys.550.9673. eCollection 2016. PMID: 26877657
