Comparative Study
BMC Bioinformatics. 2002 Apr 24;3:12.
doi: 10.1186/1471-2105-3-12.

CoreGenes: a computational tool for identifying and cataloging "core" genes in a set of small genomes

Nikhat Zafar et al. BMC Bioinformatics.

Abstract

Background: Improvements in DNA sequencing technology and methodology have led to the rapid expansion of databases of DNA sequence, gene and genome data. Lower operating costs, together with heightened interest following early novel discoveries in genomics, are also contributing to the accumulation of these data sets. A major challenge is to analyze and mine these databases, especially whole genomes. Computational tools that examine genomes globally are needed for such data mining.

Results: CoreGenes is a global, Java-based, interactive data-mining tool that identifies and catalogs a "core" set of genes from two to five small whole genomes simultaneously. CoreGenes performs hierarchical, iterative BLASTP analyses using one genome as the reference and another as the query. Each subsequent query genome is compared against the newly generated "consensus." These iterations yield a matrix of related genes across the set of genomes, which may be, for example, viral, mitochondrial or chloroplast genomes. Currently the software is limited to small genomes of about 330 kilobases or less.
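The iterative reference/query/consensus procedure can be sketched in Python. This is a minimal illustration under stated assumptions, not CoreGenes' own code. Genomes are hypothetical dictionaries mapping gene IDs to protein sequences. `toy_best_hit` is a crude ungapped percent-identity stand-in for a real BLASTP search (it is not BLAST, and its 0 to 100 scale is not comparable to BLAST scores). The "consensus" is assumed to be the set of reference genes that still have an above-threshold hit in every genome compared so far. The default cutoff of 75 only echoes the default threshold mentioned in the Figure 3 caption.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical representation: gene (protein) ID -> amino-acid sequence.
Genome = Dict[str, str]
# A scorer returns (best score, ID of the best-hitting protein or None).
Scorer = Callable[[str, Genome], Tuple[float, Optional[str]]]


def toy_best_hit(protein: str, target: Genome) -> Tuple[float, Optional[str]]:
    """Stand-in for BLASTP (NOT a real alignment): ungapped percent identity
    from the first residue, normalized by the longer of the two sequences."""
    best_score, best_id = 0.0, None
    for tid, seq in target.items():
        longest = max(len(protein), len(seq))
        if longest == 0:
            continue
        identical = sum(a == b for a, b in zip(protein, seq))
        score = 100.0 * identical / longest
        if score > best_score:
            best_score, best_id = score, tid
    return best_score, best_id


def core_genes(genomes: List[Genome], best_hit: Scorer = toy_best_hit,
               threshold: float = 75.0) -> List[List[str]]:
    """Rows of matched gene IDs (one column per genome) for reference genes
    that score >= threshold against every query genome."""
    if not 2 <= len(genomes) <= 5:
        raise ValueError("expected two to five genomes")
    reference = genomes[0]
    # Initial consensus: every reference gene, each row holding only its own ID.
    rows: Dict[str, List[str]] = {gid: [gid] for gid in reference}
    for query in genomes[1:]:
        survivors: Dict[str, List[str]] = {}
        for gid, row in rows.items():
            # Compare the consensus gene (its reference sequence) to the query genome.
            score, hit = best_hit(reference[gid], query)
            if hit is not None and score >= threshold:
                survivors[gid] = row + [hit]
        rows = survivors  # the newly generated consensus
    return list(rows.values())


# Toy genomes, invented for illustration.
a = {"a1": "MKTAYIAK", "a2": "MSSHHHHH", "a3": "MLLLLLLL"}
b = {"b1": "MKTAYIAR", "b2": "MSSHHHHQ", "b9": "GGGGGGGG"}
c = {"c1": "MKTAYIAK", "c7": "PPPPPPPP"}
print(core_genes([a, b, c]))  # [['a1', 'b1', 'c1']]
```

Filtering the shrinking consensus, rather than re-running every pairwise comparison, keeps each step to one search per surviving reference gene. Swapping `toy_best_hit` for a function that wraps a real BLASTP search would be the natural next step.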

Conclusion: A computational tool, CoreGenes, has been developed to analyze small whole genomes globally. BLAST score-related, putatively essential "core" gene data are displayed as a table with links to GenBank for further information on the genes of interest. This web resource is available at http://pumpkins.ib3.gmu.edu:8080/CoreGenes or http://www.bif.atcc.org/CoreGenes.
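A table of core genes with database links, like the one described above, could be rendered as in the following sketch. The link pattern `https://www.ncbi.nlm.nih.gov/protein/{accession}` and the function name `core_table_html` are assumptions for illustration; the original tool's link format and layout may differ, and the accessions in the example are hypothetical.

```python
from html import escape
from typing import List
from urllib.parse import quote

# Assumed link target: NCBI protein record pages (the original tool may link differently).
NCBI_PROTEIN_URL = "https://www.ncbi.nlm.nih.gov/protein/{}"


def core_table_html(genome_names: List[str], rows: List[List[str]]) -> str:
    """Render a core-gene matrix as an HTML table: one row per core gene,
    one column per genome, each cell linking to that gene's NCBI record."""
    header = "".join(f"<th>{escape(name)}</th>" for name in genome_names)
    lines = ["<table>", f"<tr>{header}</tr>"]
    for row in rows:
        if len(row) != len(genome_names):
            raise ValueError("each row needs one accession per genome")
        # quote() makes the accession URL-safe; escape() makes the link text HTML-safe.
        cells = "".join(
            f'<td><a href="{NCBI_PROTEIN_URL.format(quote(acc))}">{escape(acc)}</a></td>'
            for acc in row
        )
        lines.append(f"<tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


# Hypothetical accessions, for illustration only.
print(core_table_html(["A. thaliana", "N. tabacum"], [["gene_A1", "gene_B1"]]))
```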

Figures

Figure 1. Flowchart of CoreGenes analysis. Up to five genomes can be entered into the GUI and analyzed per session.
Figure 2. Screenshot of a CoreGenes session. GenBank accession numbers are entered into each "sequence" field. Two to five genomes may be entered to extract the consensus set of "core" genes.
Figure 3. Screenshot of a CoreGenes analysis. The analysis generates a two-dimensional, color-coded plot (top panel) displaying the core genes contained in a set of chloroplast genomes: A. thaliana, N. tabacum, O. sativa and C. vulgaris. The reference genome is plotted on the x-axis. Each genome is represented vertically above the reference by a dot of a different color, as indicated at the side of the graph. The same data are also presented as a table (bottom panel) that includes hyperlinks to the NCBI database. The BLASTP threshold score is set to the default of 75 for this session.
