2015 May 13:2015:bav040.
doi: 10.1093/database/bav040. Print 2015.

Moving the mountain: analysis of the effort required to transform comparative anatomy into computable anatomy

Wasila Dahdul et al. Database (Oxford).

Abstract

The diverse phenotypes of living organisms have been described for centuries, and though they may be digitized, they are not readily available in a computable form. Using over 100 morphological studies, the Phenoscape project has demonstrated that by annotating characters with community ontology terms, links between novel species anatomy and the genes that may underlie them can be made. But given the enormity of the legacy literature, how can this largely unexploited wealth of descriptive data be rendered amenable to large-scale computation? To identify the bottlenecks, we quantified the time involved in the major aspects of phenotype curation as we annotated characters from the vertebrate phylogenetic systematics literature. This involves attaching fully computable logical expressions consisting of ontology terms to the descriptions in character-by-taxon matrices. The workflow consists of: (i) data preparation, (ii) phenotype annotation, (iii) ontology development and (iv) curation team discussions and software development feedback. Our results showed that the completion of this work required two person-years by a team of two post-docs, a lead data curator, and students. Manual data preparation required close to 13% of the effort. This part in particular could be reduced substantially with better community data practices, such as depositing fully populated matrices in public repositories. Phenotype annotation required ∼40% of the effort. We are working to make this more efficient with Natural Language Processing tools. Ontology development (40%), however, remains a highly manual task requiring domain (anatomical) expertise and use of specialized software. The large overhead required for data preparation and ontology development contributed to a low annotation rate of approximately two characters per hour, compared with 14 characters per hour when activity was restricted to character annotation. 
Unlocking the potential of the vast stores of morphological descriptions requires better tools for efficiently processing natural language, and better community practices towards a born-digital morphology.

Database URL: http://kb.phenoscape.org
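The effort figures quoted in the abstract can be checked with a little arithmetic. The following is a minimal sketch, not the authors' analysis: it uses only the numbers stated above (two person-years in total, roughly 13%, 40% and 40% for the first three workflow steps, and annotation rates of 2 and 14 characters per hour). The function and variable names are hypothetical, and attributing the leftover share to step (iv), curation team discussions and software development feedback, is an assumption, since the abstract gives no percentage for that step.

```python
# Approximate effort shares reported in the abstract (fractions of total effort).
# The percentages are rounded in the source, so all results are approximate.
REPORTED_SHARES = {
    "data_preparation": 0.13,
    "phenotype_annotation": 0.40,
    "ontology_development": 0.40,
}

TOTAL_PERSON_YEARS = 2.0      # stated in the abstract
RATE_OVERALL = 2.0            # characters per hour, all activities included
RATE_ANNOTATION_ONLY = 14.0   # characters per hour, annotation only


def remaining_share(shares):
    """Fraction of effort not covered by the listed categories.

    Assumption: this remainder corresponds to workflow step (iv);
    the abstract does not state that figure.
    """
    return round(1.0 - sum(shares.values()), 2)


def person_years_by_step(shares, total_person_years):
    """Convert fractional shares into person-years per workflow step."""
    return {
        step: round(fraction * total_person_years, 2)
        for step, fraction in shares.items()
    }


def slowdown_factor(rate_annotation_only, rate_overall):
    """How many times slower the overall rate is than annotation alone."""
    return rate_annotation_only / rate_overall


if __name__ == "__main__":
    print("Unaccounted share:", remaining_share(REPORTED_SHARES))
    print("Person-years:", person_years_by_step(REPORTED_SHARES, TOTAL_PERSON_YEARS))
    print("Slowdown:", slowdown_factor(RATE_ANNOTATION_ONLY, RATE_OVERALL))
```

Under these assumptions the three listed steps account for about 93% of the effort, leaving roughly 7% for the remaining step, and the overall annotation rate is about seven times slower than annotation in isolation.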


Figures

Figure 1. Workflow for the curation of phenotypic characters from systematic studies.
Figure 2. Phenex screenshot of window with the ontology request broker (ORB) pop-up box overlaying panels for characters, states, phenotypes and term information.
