Review

BMC Bioinformatics. 2010 May 25;11:278.

doi: 10.1186/1471-2105-11-278.

Semantic annotation of morphological descriptions: an overall strategy

Hong Cui¹

PMID: 20500882
PMCID: PMC2887808
DOI: 10.1186/1471-2105-11-278

Hong Cui. BMC Bioinformatics. 2010.

Author

Hong Cui¹

Affiliation

¹ School of Information Resources and Library Science, University of Arizona, 1515 E, First Street, Tucson, Arizona 85719, USA. hongcui@email.arizona.edu

Abstract

Background: Large volumes of morphological descriptions of whole organisms have been created as print or electronic text in a human-readable format. Converting the descriptions into computer-readable formats gives new life to this valuable knowledge on biodiversity. Research in this area started 20 years ago, yet insufficient progress has been made toward an automated system that requires only minimal human intervention and works on descriptions of various plant and animal groups. This paper examines the hindering factors by identifying the mismatches between existing research and the characteristics of morphological descriptions.

Results: This paper reviews the techniques that have been used for automated annotation, reports exploratory results on characteristics of morphological descriptions as a genre, and identifies challenges facing automated annotation systems. Based on these criteria, the paper proposes an overall strategy for converting descriptions of various taxon groups with the least human effort.

Conclusions: A combined unsupervised and supervised machine learning strategy is needed to construct domain ontologies and lexicons and to ultimately achieve automated semantic annotation of morphological descriptions. Further, we suggest that each effort in creating a new description or annotating an individual description collection should be shared and contribute to the "biodiversity information commons" for the Semantic Web. This cannot be done without a sound strategy and a close partnership between and among information scientists and biologists.

Figures

**Figure 1**
**An annotated morphological description**. Text enclosed in "<>" is a tag. Bold font marks paragraph-level annotation, bold italic marks clause-level annotation, and italic marks character-level annotation. The annotation was produced by an annotation system the author created for FNA.

**Figure 2**
**Two regular expression patterns**. The first (Soderland, 1999) is for extracting bedroom number and rent from apartment rental ads. The pattern extracts the digit before "BR" as the number of bedrooms ($1) and the number after a "$" as the rent ($2). The pattern produces the correct result for Input 1 but a wrong result for Input 2, as $600 was the price for one room, not four rooms. The pattern will not match or extract anything from "1 large BR $500" or "1 master BR $500." The second (Tang & Heidorn, 2007) extracts leaf blade dimension by looking for a range between the words "blade" and "base."

**Figure 3**
**The counts of new domain concepts in Part V of TIP using different sized common word filters**.

**Figure 4**
**The counts of new domain concepts in FNA using different sized common word filters**.

**Figure 5**
**The counts of new domain concepts in FOC using different sized common word filters**.

**Figure 6**
**Parsing trees produced by the Stanford Parser for descriptive sentences**. The first two trees contrast the incorrect parsing of a descriptive sentence in the deviated grammar with the correct parsing of a similar sentence in standard English grammar. The remaining trees contrast the incorrect parsing of three typical descriptive clauses in the deviated syntax with the correct parsing obtained when the correct part-of-speech (POS) tags were given to the parser. The nodes closest to the words in the parsing trees are the POS tags.

**Figure 7**
**An overall strategy for automated semantic annotation of morphological descriptions of various taxon groups**.
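
The behaviour described in the Figure 2 caption can be illustrated with a minimal sketch. The two regular expressions below are hypothetical re-creations written from the caption's wording, not the original patterns of Soderland (1999) or Tang & Heidorn (2007). The function names, the unit list, and every sample string other than "1 large BR $500" and "1 master BR $500" are assumptions made for illustration.

```python
import re

# Hypothetical re-creation of the first pattern: a number directly
# before "BR" (bedrooms), then the first number after a "$" (rent).
RENTAL = re.compile(r"(\d+)\s*BR\b.*?\$\s*(\d+)")

# Hypothetical re-creation of the second pattern: a numeric range with
# a unit, located between the words "blade" and "base".
BLADE = re.compile(
    r"\bblade\b.*?(\d+(?:\.\d+)?)\s*-\s*(\d+(?:\.\d+)?)\s*(mm|cm|dm|m)\b.*?\bbase\b",
    re.IGNORECASE,
)


def extract_rental(ad):
    """Return (bedrooms, rent) or None when the pattern does not match."""
    m = RENTAL.search(ad)
    return (int(m.group(1)), int(m.group(2))) if m else None


def extract_blade_range(text):
    """Return (low, high, unit) for the blade dimension, or None."""
    m = BLADE.search(text)
    return (float(m.group(1)), float(m.group(2)), m.group(3)) if m else None


# Made-up ad: the pattern pairs the bedroom count with the rent.
print(extract_rental("2 BR apartment, $750"))
# Made-up ad: the pattern still pairs 4 with 600, although the price
# is stated per room, which is the kind of error the caption describes.
print(extract_rental("4 BR house, rooms from $600 each"))
# From the caption: a word between the digit and "BR" blocks the match.
print(extract_rental("1 large BR $500"))
# Made-up description clause.
print(extract_blade_range("Leaf blade ovate, 3-7 cm, base cuneate"))
```

Both sketches depend on surface word order alone, so a small variation in phrasing either silences the pattern or yields a confident but wrong value.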

References

1. Flora of North America Editorial Committee (Eds): Flora of North America. http://www.fna.org/
2. Tang X, Heidorn PB. Using automatically extracted information in species page retrieval. Proceedings of TDWG. 2007. http://www.tdwg.org/proceedings/article/view/195
3. Cui H, Macklin J, Yu C. Application of semantic annotation for quality insurance in biosystematics publishing. Proceedings of the Annual Meeting of American Society of Information Science and Technology 2009 (in CD) 2009.
4. Taylor A. Extracting knowledge from biological descriptions. Proceedings of 2nd International Conference on Building and Sharing Very Large-Scale Knowledge Bases. 1995. pp. 114–119.
5. Abascal R, Sanchenz J. X-tract: Structure extraction from botanical textual descriptions. Proceeding of the String Processing & Information Retrieval Symposium. 1999. pp. 2–7.
