Comparative Study
J Biomed Inform. 2008 Dec;41(6):904-13.
doi: 10.1016/j.jbi.2008.03.010. Epub 2008 Mar 28.

Automated comparative auditing of NCIT genomic roles using NCBI

Barry Cohen et al. J Biomed Inform. 2008 Dec.

Abstract

Biomedical research has identified many human genes and accumulated a wide range of knowledge about them. The National Cancer Institute Thesaurus (NCIT) represents such knowledge as concepts and roles (relationships). Given the rapid advances in this field, the NCIT's Gene hierarchy can be expected to contain role errors. We present a comparative methodology for auditing the Gene hierarchy against the National Center for Biotechnology Information's (NCBI's) Entrez Gene database. The two knowledge sources are accessed via a pair of Web crawlers to ensure up-to-date data. Our algorithms then compare the knowledge gathered from each, identify discrepancies that represent probable errors, and suggest corrective actions. The primary focus is on two kinds of gene roles: (1) the chromosomal locations of genes, and (2) the biological processes in which genes play a role. Regarding chromosomal locations, the discrepancies revealed are striking and systematic, suggesting a structurally common origin. Regarding biological processes, difficulties arise because genes frequently play roles in multiple processes, and processes may have many designations (such as synonymous terms). Our algorithms make use of the roles defined in the NCIT Biological Process hierarchy to uncover many probable gene-role errors in the NCIT. These results show that automated comparative auditing is a promising technique that can identify a large number of probable errors, and corrections for them, in a terminological genomic knowledge repository, thus facilitating its overall maintenance.
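
The comparison step for chromosomal locations can be pictured with a minimal Python sketch. It is an illustration of the general idea only, not the authors' algorithm: the gene symbols, the location strings, the `Discrepancy` record, and the functions `normalize_location` and `audit_locations` are all hypothetical, and the two dictionaries stand in for whatever the two Web crawlers would actually return.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Discrepancy:
    """One probable error: the two sources disagree about a gene's location."""
    gene: str
    ncit_value: str
    reference_value: str
    suggestion: str


def normalize_location(location: str) -> str:
    """Remove whitespace and case differences so only real disagreements remain."""
    return "".join(location.split()).lower()


def audit_locations(ncit: Dict[str, str], reference: Dict[str, str]) -> List[Discrepancy]:
    """Compare locations for genes present in both sources and report mismatches."""
    findings = []
    for gene in sorted(ncit.keys() & reference.keys()):
        ncit_value, reference_value = ncit[gene], reference[gene]
        if normalize_location(ncit_value) != normalize_location(reference_value):
            findings.append(
                Discrepancy(
                    gene=gene,
                    ncit_value=ncit_value,
                    reference_value=reference_value,
                    suggestion=f"review location; reference source reports {reference_value}",
                )
            )
    return findings


# Hypothetical toy data, not taken from the NCIT or Entrez Gene.
ncit_locations = {"GENE_A": "1p36.1", "GENE_B": "7q31", "GENE_C": "12q13"}
reference_locations = {"GENE_A": "1P36.1 ", "GENE_B": "7q32", "GENE_D": "Xp11"}

findings = audit_locations(ncit_locations, reference_locations)
for finding in findings:
    print(finding)
```

In this sketch, genes found in only one source are skipped rather than flagged, and a mismatch produces a suggestion for human review rather than an automatic correction. Both are design assumptions made for the example.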

Figures

Figure 1. In the NCIT, a role defines a relation from one concept to another (target) concept.
Figure 2. Three cases of auditing Biological Process role targets of a gene in the NCIT.
Figure 3. The more refined concept replaces the more general concept as the target of the role.
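
The refinement idea in the Figure 3 caption, together with the synonym problem mentioned in the abstract, can be sketched in Python as follows. This is a hedged illustration under stated assumptions, not the procedure from the paper: the concept names, the `parent_of` hierarchy, the synonym table, and the verdict labels "match", "refine", and "review" are hypothetical and are not claimed to correspond to the three cases of Figure 2.

```python
from typing import Dict, List, Tuple


def canonical(term: str, synonyms: Dict[str, str]) -> str:
    """Map a process designation to its canonical concept name, if one is known."""
    cleaned = term.strip()
    return synonyms.get(cleaned.lower(), cleaned)


def ancestors(concept: str, parent_of: Dict[str, str]) -> List[str]:
    """Walk a single-parent hierarchy upward and collect every ancestor."""
    found: List[str] = []
    while concept in parent_of:
        concept = parent_of[concept]
        if concept in found:  # guard against an accidental cycle
            break
        found.append(concept)
    return found


def audit_process_role(
    current_target: str,
    reference_term: str,
    parent_of: Dict[str, str],
    synonyms: Dict[str, str],
) -> Tuple[str, str]:
    """Return (verdict, suggested_target) for one biological-process role."""
    candidate = canonical(reference_term, synonyms)
    if candidate == current_target:
        return ("match", current_target)
    if current_target in ancestors(candidate, parent_of):
        # The reference names a more refined concept below the current target.
        return ("refine", candidate)
    return ("review", current_target)


# Hypothetical toy hierarchy (child -> parent) and synonym table.
parent_of = {"Apoptosis": "Cell Death", "Cell Death": "Biological Process"}
synonyms = {"apoptotic process": "Apoptosis"}

print(audit_process_role("Cell Death", "apoptotic process", parent_of, synonyms))
print(audit_process_role("Apoptosis", "Apoptosis", parent_of, synonyms))
print(audit_process_role("Apoptosis", "DNA Repair", parent_of, synonyms))
```

The sketch assumes each concept has a single parent; a hierarchy with multiple parents would need a set-based traversal instead.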
