BMC Bioinformatics. 2006 Sep 1:7:398.
doi: 10.1186/1471-2105-7-398.

PhyloPat: phylogenetic pattern analysis of eukaryotic genes

Tim Hulsen et al. BMC Bioinformatics.

Abstract

Background: Phylogenetic patterns show the presence or absence of certain genes or proteins in a set of species. They can also be used to determine sets of genes or proteins that occur only in certain evolutionary branches. Phylogenetic pattern analysis has routinely been applied to protein databases such as COG and OrthoMCL, but not to gene databases. Here we present a tool named PhyloPat, which allows the complete Ensembl gene database to be queried using phylogenetic patterns.

Description: PhyloPat is an easy-to-use web server for querying the orthologies of all complete genomes within the EnsMart database using phylogenetic patterns. This enables the determination of sets of genes that occur only in certain evolutionary branches or even in single species. In total, we found 446,825 genes and 3,164,088 orthologous relationships within the EnsMart v40 database. We used a single linkage clustering algorithm to create 147,922 phylogenetic lineages, using all of the orthologies provided by Ensembl. PhyloPat can be queried with either binary phylogenetic patterns (created by checkboxes) or regular expressions. Specific branches of a phylogenetic tree of the 21 included species can be selected to create a branch-specific phylogenetic pattern. Users can also input a list of Ensembl or EMBL IDs to check which phylogenetic lineage any gene belongs to. The output can be saved in HTML, Excel or plain text format for further analysis. A link to the FatiGO web interface has been incorporated in the HTML output, creating easy access to functional information. Finally, lists of omnipresent, polypresent and oligopresent genes have been included.
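
The single linkage step described above can be illustrated with a small sketch. This is not the PhyloPat implementation: it assumes that a lineage is simply a connected component of the orthology graph, and the gene IDs, species names, and function names (single_linkage_lineages, binary_pattern) are hypothetical placeholders chosen for illustration.

```python
from collections import defaultdict


def single_linkage_lineages(genes, ortholog_pairs):
    """Group genes into lineages: two genes share a lineage if they are
    connected by a chain of orthologous relationships (single linkage)."""
    parent = {g: g for g in genes}

    def find(g):
        # Register genes that only appear in the pair list.
        parent.setdefault(g, g)
        # Path halving keeps the union-find trees shallow.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in ortholog_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = defaultdict(set)
    for g in list(parent):
        clusters[find(g)].add(g)
    # Sort for a deterministic result.
    return sorted(sorted(members) for members in clusters.values())


def binary_pattern(lineage, species_of, species_order):
    """Return a string with '1' for each species present in the lineage
    and '0' for each absent one, in the given species order."""
    present = {species_of[g] for g in lineage}
    return "".join("1" if s in present else "0" for s in species_order)


# Hypothetical toy data: three species, four genes, two orthologies.
species_order = ["human", "mouse", "fly"]
species_of = {"hs1": "human", "hs2": "human", "mm1": "mouse", "dm1": "fly"}
pairs = [("hs1", "mm1"), ("mm1", "dm1")]

lineages = single_linkage_lineages(species_of.keys(), pairs)
patterns = [binary_pattern(l, species_of, species_order) for l in lineages]
```

In this toy data, hs1 and dm1 end up in the same lineage even though no direct orthology links them, because both are linked to mm1. That chaining behaviour is the defining property of single linkage clustering.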

Conclusion: PhyloPat is the first tool to combine complete genome information with phylogenetic pattern querying. Since we used the orthologies generated by the accurate pipeline of Ensembl, the obtained phylogenetic lineages are reliable. The completeness and reliability of these phylogenetic lineages will further increase with the addition of newly found orthologous relationships within each new Ensembl release.

Figures

Figure 1
Phylogenetic tree of all species present in PhyloPat. This is the unrooted NCBI Taxonomy tree of all species available in Ensembl and PhyloPat. The numbers are the order in which the species are shown on the PhyloPat results pages. A phylogram version of this tree is available through the website.
Figure 2
The PhyloPat database scheme. The database scheme shows all four tables used in the application. Table names are in bold, primary keys are in italic. Links between fields are shown with arrows. The left side of each column shows the field names, the right side shows the field types.
Figure 3
The PhyloPat web interface (Pattern Search tab). The web interface has the menu on the left and the input/results page on the right. On the pattern search page, the user can generate a phylogenetic pattern by clicking a radio button for each species. 1 = present, * = present/absent, 0 = absent. The buttons directly below set all 21 species to the corresponding mode. MySQL regular expressions offer the possibility of advanced querying. The user can choose to show any number of lineages and choose the output format: HTML, Excel or plain text.
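
As a rough illustration of how a per-species choice of 1, * or 0 could be turned into a regular expression query, consider the sketch below. It is an assumption-laden stand-in: Python's re module is used here in place of MySQL regular expressions, each lineage is assumed to be stored as a string of 0s and 1s with one character per species, and the names pattern_to_regex and matching_lineages and the lineage IDs are hypothetical.

```python
import re


def pattern_to_regex(choices):
    """Translate per-species choices ('1' present, '0' absent,
    '*' present or absent) into an anchored regular expression."""
    invalid = set(choices) - set("01*")
    if invalid:
        raise ValueError(f"unexpected pattern characters: {sorted(invalid)}")
    body = "".join("[01]" if c == "*" else c for c in choices)
    return "^" + body + "$"


def matching_lineages(patterns, regex):
    """Return the IDs of lineages whose binary pattern matches the regex."""
    compiled = re.compile(regex)
    return [lid for lid, pattern in patterns.items() if compiled.search(pattern)]


# Hypothetical lineage patterns over three species.
stored = {"L1": "110", "L2": "100", "L3": "111"}

# Present in species 1, either in species 2, absent in species 3.
regex = pattern_to_regex("1*0")
hits = matching_lineages(stored, regex)
```

Anchoring the expression with ^ and $ makes the query describe the whole pattern rather than a substring, which matters once hand-written expressions of different lengths are allowed.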
Figure 4
Gene Ontology annotations of 1) omnipresent and 2) all human genes. The left side shows the Gene Ontology annotations for all 2,185 human genes in omnipresent lineages. The right side shows the Gene Ontology annotations for all 31,718 human genes, used as a reference set. Lines are placed between equal annotations for easy comparisons between the left and the right side. a.) 6th level GO Biological Processes. b.) 6th level GO Molecular Functions. c.) 6th level GO Cellular Components.
