Comparative Study
BMC Bioinformatics. 2009 Jun 16;10 Suppl 6(Suppl 6):S3.
doi: 10.1186/1471-2105-10-S6-S3.

Databases of homologous gene families for comparative genomics

Simon Penel et al. BMC Bioinformatics.

Abstract

Background: Comparative genomics is a central step in many sequence analysis studies, from gene annotation and the identification of new functional regions in genomes, to the study of evolutionary processes at the molecular level (speciation, single gene or whole genome duplications, etc.) and phylogenetics. In that context, databases that provide users with high-quality homologous families and sequence alignments, as well as phylogenetic trees based on state-of-the-art algorithms, are becoming indispensable.

Methods: We developed an automated procedure that performs massive all-against-all similarity searches, gene clustering, multiple alignment computation, and phylogenetic tree construction and reconciliation. Parallel computing on a large computer cluster makes it possible to apply this procedure to a very large set of sequences.

Results: Three databases were developed using this procedure: HOVERGEN, HOGENOM and HOMOLENS. These databases share the same architecture but differ in their content. HOVERGEN contains sequences from vertebrates, HOGENOM is mainly devoted to completely sequenced microbial organisms, and HOMOLENS is devoted to metazoan genomes from Ensembl. Access to the databases is provided through Web query forms, a general retrieval system and a client-server graphical interface. The latter can be used to perform tree-pattern-based searches which allow, among other uses, the retrieval of sets of orthologous genes. The three databases, as well as the software required to build and query them, can be used or downloaded from the PBIL (Pôle Bioinformatique Lyonnais) site at http://pbil.univ-lyon1.fr/.

Figures

Figure 1
Removal of incompatible HSPs (high-scoring segment pairs). For each pair of homologous sequences found by BLASTP, HSPs that are incompatible with a global alignment are removed. In this example, segments S1 and S2 are compatible, but segments S3 and S4 are not. S3 and S4 are therefore ignored when computing the similarity measures used to decide whether the two sequences are classified in the same family.
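The filtering step in Figure 1 can be illustrated with a short sketch. The caption does not say how compatibility is decided, so the code below assumes one common approach: keep the highest-scoring chain of HSPs that appear in the same order, without overlap, in both sequences. The `HSP` type, the coordinates and the scores are hypothetical and chosen only to mimic the figure; this is not the authors' implementation.

```python
from typing import List, NamedTuple


class HSP(NamedTuple):
    """One BLASTP high-scoring segment pair (hypothetical container)."""
    name: str
    q_start: int  # start on the query sequence
    q_end: int    # end on the query sequence (q_start <= q_end)
    s_start: int  # start on the subject sequence
    s_end: int    # end on the subject sequence (s_start <= s_end)
    score: float


def compatible(a: HSP, b: HSP) -> bool:
    """True if a lies strictly before b on both sequences (assumed criterion)."""
    return a.q_end < b.q_start and a.s_end < b.s_start


def best_collinear_chain(hsps: List[HSP]) -> List[HSP]:
    """Return the highest-scoring set of mutually compatible HSPs."""
    if not hsps:
        return []
    # Any HSP that precedes another on the query also sorts before it.
    ordered = sorted(hsps, key=lambda h: (h.q_start, h.s_start))
    best = [h.score for h in ordered]  # best chain score ending at each HSP
    prev = [-1] * len(ordered)         # back-pointers for chain recovery
    for j, hj in enumerate(ordered):
        for i in range(j):
            if compatible(ordered[i], hj) and best[i] + hj.score > best[j]:
                best[j] = best[i] + hj.score
                prev[j] = i
    end = max(range(len(ordered)), key=best.__getitem__)
    chain = []
    while end != -1:
        chain.append(ordered[end])
        end = prev[end]
    return chain[::-1]


# Hypothetical coordinates: S3 crosses S1/S2, S4 overlaps them on the query.
hsps = [
    HSP("S1", 10, 50, 12, 52, 80.0),
    HSP("S2", 60, 120, 65, 125, 100.0),
    HSP("S3", 130, 160, 20, 50, 40.0),
    HSP("S4", 40, 70, 200, 230, 30.0),
]
kept = [h.name for h in best_collinear_chain(hsps)]
print(kept)
```

With these made-up values, only S1 and S2 remain, so similarity measures would be computed from those two segments alone.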
Figure 2
Tree reconciliation between a gene tree G and a species tree S showing different topologies. The result is the reconciled tree R. R is a variation of S, in which duplication nodes have been inserted in order to explain incongruence with G.
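To make the idea behind Figure 2 concrete, here is a minimal sketch of duplication inference by LCA mapping, a standard textbook approach to reconciliation. It is an assumption that this resembles the procedure used to build the databases; the caption does not name the algorithm. Trees are nested tuples, gene-tree leaves are labelled directly by species name, and all function names are hypothetical.

```python
def leaves(tree):
    """Set of species names found under a node (a leaf is a plain string)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree):
    """Leaf sets of every node of the tree, the root included."""
    result = [leaves(tree)]
    if not isinstance(tree, str):
        for child in tree:
            result.extend(clades(child))
    return result


def lca(species_clades, species):
    """Smallest species-tree clade containing all the given species."""
    return min((c for c in species_clades if species <= c), key=len)


def find_duplications(gene_tree, species_tree):
    """Gene-tree nodes that map to the same species-tree node as a child."""
    species_clades = clades(species_tree)
    duplications = []

    def visit(node):
        mapped = lca(species_clades, leaves(node))
        if not isinstance(node, str):
            child_maps = [visit(child) for child in node]
            if mapped in child_maps:
                # Parent and child share an LCA, so a duplication is inferred.
                duplications.append(node)
        return mapped

    visit(gene_tree)
    return duplications


# Hypothetical example: the gene tree groups human with chicken,
# which disagrees with the species tree.
S = (("human", "mouse"), "chicken")
G = (("human", "chicken"), "mouse")
print(find_duplications(G, S))
```

In this toy case the root of G is flagged as a duplication node, while a gene tree with the same topology as S yields no duplications.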
Figure 3
Visualization of multiple alignments and phylogenetic trees through the PBIL Web interface. In this example, the alignment is displayed with the JalView applet and the phylogenetic tree is displayed with the ATV applet.
Figure 4
Three different frames of the FamFetch interface. Frame (a) is an interactive editor that allows users to build any pattern, node by node and leaf by leaf. Here the pattern entered detects families in which a eukaryotic species is placed within a clade of bacterial species. Frame (b) lets the user choose which tools to use in the editor. Tools surrounded by dark grey are those that use the gene duplication predictions, and can be avoided if the user does not want to trust this information. Frame (c) is the tree display. In this frame, sequences are displayed using a colour code corresponding to the taxonomy.
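The kind of pattern described for frame (a) can be approximated in a few lines. The sketch below is not FamFetch's matching algorithm; it assumes a simplified reading of the pattern: a binary node with one purely bacterial child and another child holding exactly one eukaryote alongside at least one bacterium. The function names, leaf labels and tree shapes are hypothetical.

```python
def leaf_names(tree):
    """Leaf labels under a node; trees are nested tuples, leaves are strings."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaf_names(child)]


def has_nested_eukaryote(tree, kingdom):
    """True if some node shows one eukaryote nested within a bacterial clade."""
    if isinstance(tree, str):
        return False
    if len(tree) == 2:
        # Try both orientations of the two children.
        for inner, outer in (tree, tree[::-1]):
            inner_k = [kingdom[name] for name in leaf_names(inner)]
            outer_k = [kingdom[name] for name in leaf_names(outer)]
            if (all(k == "Bacteria" for k in outer_k)
                    and len(inner_k) >= 2
                    and inner_k.count("Eukaryota") == 1
                    and inner_k.count("Bacteria") == len(inner_k) - 1):
                return True
    return any(has_nested_eukaryote(child, kingdom) for child in tree)


# Hypothetical taxonomy lookup and family trees (not real database content).
kingdom = {
    "S_cerevisiae": "Eukaryota",
    "H_sapiens": "Eukaryota",
    "L_casei": "Bacteria",
    "L_brevis": "Bacteria",
    "E_coli": "Bacteria",
}
anomalous = ((("S_cerevisiae", "L_casei"), "L_brevis"), "E_coli")
expected = (("S_cerevisiae", "H_sapiens"), ("L_casei", "E_coli"))
print(has_nested_eukaryote(anomalous, kingdom))
print(has_nested_eukaryote(expected, kingdom))
```

A real search would also need to handle multifurcating nodes, branch support and the duplication annotations mentioned for frame (b); none of these are modelled here.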
Figure 5
Example of trees containing anomalous patterns involving eukaryotes and bacteria. A search for the pattern shown in Figure 4 was performed on HOGENOM release 4 and returned a total of 1,304 families. Two of these 1,304 trees are shown in this figure. Family HBG082165 (a) corresponds to a conserved hypothetical protein, and it shows an S. cerevisiae sequence among Lactobacillales species. Family HBG459980 (b) corresponds to the 3-phosphoshikimate 1-carboxyvinyltransferase enzyme, and it shows a G. gallus sequence among Proteobacteria species. Values of the aLRT test are given for the internal branches, and only values with P > 80% are shown.
