Nucleic Acids Res. 2006 Jan 1;34(Database issue):D363-8.
doi: 10.1093/nar/gkj123.

OrthoMCL-DB: querying a comprehensive multi-species collection of ortholog groups

Feng Chen et al. Nucleic Acids Res. 2006.

Abstract

The OrthoMCL database (http://orthomcl.cbil.upenn.edu) houses ortholog group predictions for 55 species, including 16 bacterial and 4 archaeal genomes representing phylogenetically diverse lineages, and most currently available complete eukaryotic genomes: 24 unikonts (12 animals, 9 fungi, microsporidium, Dictyostelium, Entamoeba), 4 plants/algae and 7 apicomplexan parasites. OrthoMCL software was used to cluster proteins based on sequence similarity, using an all-against-all BLAST search of each species' proteome, followed by normalization of inter-species differences, and Markov clustering. A total of 511,797 proteins (81.6% of the total dataset) were clustered into 70,388 ortholog groups. The ortholog database may be queried based on protein or group accession numbers, keyword descriptions or BLAST similarity. Ortholog groups exhibiting specific phyletic patterns may also be identified, using either a graphical interface or a text-based Phyletic Pattern Expression grammar. Information for ortholog groups includes the phyletic profile, the list of member proteins and a multiple sequence alignment, a statistical summary and graphical view of similarities, and a graphical representation of domain architecture. OrthoMCL software, the entire FASTA dataset employed and clustering results are available for download. OrthoMCL-DB provides a centralized warehouse for orthology prediction among multiple species, and will be updated and expanded as additional genome sequence data become available.
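The clustering step described in the abstract (similarity graph, then Markov clustering) can be illustrated with a minimal sketch. This is not the OrthoMCL implementation: the edge weights below are invented stand-ins for normalized BLAST similarity scores, and the inflation value, iteration cap, thresholds and function names are all assumptions chosen for illustration.

```python
# Toy Markov clustering (expansion + inflation) on a small similarity graph.
# All weights and parameters are hypothetical; real pipelines derive the
# weights from normalized all-against-all BLAST scores.

def normalize_columns(m):
    """Scale each column so it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    """Plain square-matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def markov_cluster(weights, inflation=2.0, iterations=50, tol=1e-9):
    """Return clusters (sorted lists of node indices) for a symmetric
    non-negative weight matrix. The inflation value is an assumed default."""
    n = len(weights)
    # Add self-loops so every column has nonzero mass.
    m = [[weights[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        expanded = matmul(m, m)                        # expansion: random-walk step
        inflated = normalize_columns(
            [[x ** inflation for x in row] for row in expanded])  # inflation
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Each row with non-negligible entries lists the members of one cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)

# Six hypothetical proteins: two tight triangles joined by one weak edge.
demo = [[0.0] * 6 for _ in range(6)]
for a, b, w in [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
                (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
                (2, 3, 0.05)]:
    demo[a][b] = demo[b][a] = w

print(markov_cluster(demo))
```

Raising the inflation parameter tends to split the graph into smaller, tighter clusters; lowering it tends to merge them.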

Figures

Figure 1
A phylogeny was constructed for 55 sequenced genomes based on orthologous gene content. See Table 1 for species abbreviations. The tree was drawn using Phylodendron.
Figure 2
An OrthoMCL group is a cluster of sequences from multiple species predicted to be orthologous to each other. (A) Ortholog group summary information, including group size (# Sequences, # Taxa), BLAST statistics (% Match Pairs, Average E-value, Average % Coverage, Average % Identity) and the phyletic pattern profile for all species in the dataset is shown. Rows in the phyletic pattern profile table represent bacteria, archaea, single-cellular eukaryotes and multi-cellular eukaryotes (plants and animals); each box represents a single species, with black or white background denoting presence or absence in the ortholog group, and the number of protein sequences found in the ortholog group listed. Mouse-over expands abbreviations to provide the full species name. Links at top left access a tabular list of information for each member of the ortholog group (including links to the reference database), a graphical representation of Pfam domain architecture (B), a BioLayout graph of pairwise similarity scores (C), a MUSCLE multiple sequence alignment (D) and a sequence retrieval option. The example shown illustrates a ‘prolipoprotein diacylglyceryl transferase’, whose distribution is restricted to the bacteria.
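The phyletic pattern profile in the caption above (per-species presence or absence, with a protein count) can be sketched as a simple data structure. The species abbreviations, protein identifiers and function names below are hypothetical, and the `matches_pattern` helper is only a stand-in for the idea of a phyletic query; it is not the database's Phyletic Pattern Expression grammar.

```python
from collections import Counter

def phyletic_profile(group_members, all_species):
    """Count proteins per species for one ortholog group.

    group_members: iterable of (species_abbreviation, protein_id) pairs.
    all_species:   every species in the dataset, so absent ones show as 0.
    """
    counts = Counter(species for species, _protein in group_members)
    return {sp: counts.get(sp, 0) for sp in all_species}

def matches_pattern(profile, present=(), absent=()):
    """True if the group has >=1 protein in every 'present' species
    and none in any 'absent' species."""
    return (all(profile[sp] > 0 for sp in present)
            and all(profile[sp] == 0 for sp in absent))

# Hypothetical abbreviations and members, for illustration only.
species = ["eco", "bsu", "sce", "hsa"]
members = [("eco", "prot_1"), ("eco", "prot_2"), ("bsu", "prot_3")]

profile = phyletic_profile(members, species)
bacteria_only = matches_pattern(profile,
                                present=["eco", "bsu"],
                                absent=["sce", "hsa"])
print(profile, bacteria_only)
```

A group restricted to the bacteria, like the example in Figure 2, would have nonzero counts only in the bacterial entries of such a profile.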
