Automated comparative auditing of NCIT genomic roles using NCBI
- PMID: 18486558
- PMCID: PMC2630966
- DOI: 10.1016/j.jbi.2008.03.010
Abstract
Biomedical research has identified many human genes and accumulated a wide range of knowledge about them. The National Cancer Institute Thesaurus (NCIT) represents such knowledge as concepts and roles (relationships). Given the rapid pace of advances in this field, the NCIT's Gene hierarchy can be expected to contain role errors. We present a comparative methodology for auditing the Gene hierarchy against the National Center for Biotechnology Information's (NCBI's) Entrez Gene database. The two knowledge sources are accessed via a pair of Web crawlers to ensure up-to-date data. Our algorithms then compare the knowledge gathered from each, identify discrepancies that represent probable errors, and suggest corrective actions. The primary focus is on two kinds of gene roles: (1) the chromosomal locations of genes, and (2) the biological processes in which genes play a role. For chromosomal locations, the discrepancies revealed are striking and systematic, suggesting a common structural origin. For biological processes, difficulties arise because genes frequently play roles in multiple processes, and a process may be known by several designations (e.g., synonymous terms). Our algorithms use the roles defined in the NCIT Biological Process hierarchy to uncover many probable gene-role errors in the NCIT. These results show that automated comparative auditing is a promising technique that can identify a large number of probable errors, together with corrections for them, in a terminological genomic knowledge repository, thus facilitating its overall maintenance.
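The comparison step described in the abstract can be illustrated roughly as follows. This is a minimal Python sketch under stated assumptions, not the authors' algorithm: the `GeneRecord` fields, the three-way location classification (`match`, `less_specific`, `conflict`), and the lowercase synonym table (a stand-in for the NCIT Biological Process hierarchy) are all hypothetical, the Web crawlers that gather the data are omitted, and the example gene `GENE1` and its data are invented.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical record shape: field names are illustrative, not the actual
# NCIT role names or Entrez Gene field names.
@dataclass
class GeneRecord:
    symbol: str
    location: str        # cytogenetic location, e.g. "17q21.31"
    processes: Set[str]  # biological process names as stated by the source

# Chromosome (1-22, X, Y), optionally followed by an arm (p/q) and a band.
_LOC = re.compile(r"^(\d{1,2}|X|Y)(?:([pq])([\d.]*))?$")

def parse_location(loc: str) -> Optional[Tuple[str, str, str]]:
    """Split "17q21.31" into ("17", "q", "21.31"); None if unparseable."""
    m = _LOC.match(loc.strip())
    if m is None:
        return None
    return m.group(1), m.group(2) or "", m.group(3) or ""

def compare_location(ncit_loc: str, entrez_loc: str) -> str:
    """Classify a location pair as 'match', 'less_specific' or 'conflict'."""
    a, b = parse_location(ncit_loc), parse_location(entrez_loc)
    if a is None or b is None:
        return "conflict"  # e.g. ranges like "17q21-q22": left for manual review
    if a == b:
        return "match"
    (chrom_a, arm_a, band_a), (chrom_b, arm_b, band_b) = a, b
    # Same chromosome, and the NCIT value is a coarser form of the Entrez
    # value (e.g. "17q" or "17q21" versus "17q21.31").
    if chrom_a == chrom_b and arm_a in ("", arm_b) and band_b.startswith(band_a):
        return "less_specific"
    return "conflict"

def canonical(term: str, synonyms: Dict[str, str]) -> str:
    """Map a process name to a canonical form via a lowercase synonym table."""
    key = term.strip().lower()
    return synonyms.get(key, key)

def audit(ncit: GeneRecord, entrez: GeneRecord, synonyms: Dict[str, str]) -> List[str]:
    """Return human-readable suggestions for one gene present in both sources."""
    suggestions = []
    status = compare_location(ncit.location, entrez.location)
    if status != "match":
        suggestions.append(
            f"{ncit.symbol}: location {ncit.location!r} is {status}; "
            f"consider {entrez.location!r} from Entrez Gene")
    n = {canonical(p, synonyms) for p in ncit.processes}
    e = {canonical(p, synonyms) for p in entrez.processes}
    for p in sorted(n - e):  # asserted by NCIT, not supported by Entrez Gene
        suggestions.append(f"{ncit.symbol}: review process role {p!r}")
    for p in sorted(e - n):  # reported by Entrez Gene, absent from NCIT
        suggestions.append(f"{ncit.symbol}: consider adding process role {p!r}")
    return suggestions

if __name__ == "__main__":
    synonyms = {"programmed cell death": "apoptosis"}  # toy synonym table
    ncit = GeneRecord("GENE1", "17q", {"Apoptosis", "Cell Cycle"})
    entrez = GeneRecord("GENE1", "17q21.31", {"programmed cell death", "DNA repair"})
    for line in audit(ncit, entrez, synonyms):
        print(line)
```

Run on the toy data, this prints a `less_specific` location finding, a review suggestion for `'cell cycle'`, and an addition suggestion for `'dna repair'`; "programmed cell death" is not flagged because the synonym table maps it to "apoptosis". Treating a coarser NCIT band as a softer finding than a different chromosome or arm is a design choice of this sketch, not something the abstract specifies.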
Similar articles
- Detecting role errors in the gene hierarchy of the NCI Thesaurus. Cancer Inform. 2008;6:293-313. doi: 10.4137/cin.s440. PMID: 19221606.
- Gene: a gene-centered information resource at NCBI. Nucleic Acids Res. 2015 Jan;43(Database issue):D36-42. doi: 10.1093/nar/gku1055. Epub 2014 Oct 29. PMID: 25355515.
- Auditing as part of the terminology design life cycle. J Am Med Inform Assoc. 2006 Nov-Dec;13(6):676-90. doi: 10.1197/jamia.M2036. Epub 2006 Aug 23. PMID: 16929044.
- An Experimental Approach to Genome Annotation: This report is based on a colloquium sponsored by the American Academy of Microbiology held July 19-20, 2004, in Washington, DC. Washington (DC): American Society for Microbiology; 2004. PMID: 33001599. Review.
- Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2021 Jan 8;49(D1):D10-D17. doi: 10.1093/nar/gkaa892. PMID: 33095870. Review.
Cited by
- Relationship auditing of the FMA ontology. J Biomed Inform. 2009 Jun;42(3):550-7. doi: 10.1016/j.jbi.2009.01.001. PMID: 19475727.
- Extended Analysis of Topological-Pattern-Based Ontology Enrichment. Proceedings (IEEE Int Conf Bioinformatics Biomed). 2018 Dec;2018:1641-1648. doi: 10.1109/BIBM.2018.8621564. Epub 2019 Jan 24. PMID: 30854243.
- Preliminary Analysis of Difficulty of Importing Pattern-Based Concepts into the National Cancer Institute Thesaurus. Stud Health Technol Inform. 2016;228:389-93. PMID: 27577410.
- Auditing associative relations across two knowledge sources. J Biomed Inform. 2009 Jun;42(3):426-39. doi: 10.1016/j.jbi.2009.01.004. PMID: 19475724.
- Topological-Pattern-Based Recommendation of UMLS Concepts for National Cancer Institute Thesaurus. AMIA Annu Symp Proc. 2017 Feb 10;2016:618-627. eCollection 2016. PMID: 28269858.