Nucleic Acids Res. 2010 Jul;38(Web Server issue):W29-34.
doi: 10.1093/nar/gkq298. Epub 2010 Apr 29.

SATCHMO-JS: a webserver for simultaneous protein multiple sequence alignment and phylogenetic tree construction

Raffi Hagopian et al. Nucleic Acids Res. 2010 Jul.

Abstract

We present the jump-start simultaneous alignment and tree construction using hidden Markov models (SATCHMO-JS) web server for simultaneous estimation of protein multiple sequence alignments (MSAs) and phylogenetic trees. The server takes as input a set of sequences in FASTA format, and outputs a phylogenetic tree and MSA; these can be viewed online or downloaded from the website. SATCHMO-JS is an extension of the SATCHMO algorithm, and employs a divide-and-conquer strategy to jump-start SATCHMO at a higher point in the phylogenetic tree, reducing the computational complexity of the progressive all-versus-all HMM-HMM scoring and alignment. Results on 983 structurally aligned pairs from the PREFAB benchmark dataset show that SATCHMO-JS provides a statistically significant improvement in alignment accuracy over MUSCLE, Multiple Alignment using Fast Fourier Transform (MAFFT), ClustalW and the original SATCHMO algorithm. The SATCHMO-JS webserver is available at http://phylogenomics.berkeley.edu/satchmo-js. The datasets used in these experiments are available for download at http://phylogenomics.berkeley.edu/satchmo-js/supplementary/.
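Because the server expects its input as a set of sequences in FASTA format, it can help to check a file locally before submitting it. The following is a minimal sketch of such a check, not part of SATCHMO-JS itself; the function name `parse_fasta` and the validation rules (non-empty, unique headers) are assumptions made for illustration, and the server may apply different rules.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples.

    Hypothetical helper for illustration only: it rejects empty headers,
    duplicate headers, and sequence lines that appear before any header.
    """
    records = []
    seen = set()
    header = None
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            if not header:
                raise ValueError("empty FASTA header")
            if header in seen:
                raise ValueError("duplicate FASTA header: " + header)
            seen.add(header)
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data before first header")
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    # Placeholder sequences, not real protein data.
    example = ">seq1\nMKLV\nAAGT\n>seq2\nMKIV\n"
    for name, seq in parse_fasta(example):
        print(name, len(seq))
```

Sequences that wrap over several lines are joined into one string, so the lengths reported are per record rather than per line.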

Figures

Figure 1.
Beta adrenergic receptors SATCHMO-JS tree and MSA displayed using the PhyloScope viewer. The PhyloScope viewer allows users to select internal nodes of the tree for examination of the alignments at these nodes, which may reflect different levels of inferred structural similarity across homologs. Columns are colored according to conservation based on BLOSUM62 sum-of-pairs scores (light blue indicates the highest level of conservation, followed by dark blue, grey and uncolored). Clicking on a subtree node restricts the MSA displayed to the sequences descending from that node, and highlights the selected subtree. The SATCHMO algorithm attempts to determine which columns are part of the conserved core structure across all sequences that descend from a node, resulting in some residues being displayed in lowercase (indicating that they are inserted relative to the consensus) at nodes higher in the tree (toward the root) but in uppercase at subtrees nearer the leaves. (A) The SATCHMO-JS tree and MSA corresponding to the root of the tree, where all sequences are selected. The first ∼70 residues of most sequences display in lowercase (indicating insertions relative to the consensus structure), reflecting structural variability over the dataset as a whole in this region. (Coincidentally, the region identified by SATCHMO as conserved across the dataset corresponds to the PFAM 7TM_1 HMM, which matches this region.) (B) The ADRB1 subtree (corresponding to orthologous Beta-1 adrenergic receptors from different species) has been selected by clicking the subtree node. This results in coloring the selected subtree red and displaying the MSA corresponding to sequences descending from that node. Note that many residues that displayed in lowercase in the SATCHMO root-level MSA are now displayed in uppercase, indicating that they are predicted by SATCHMO to be part of the conserved core structure for Beta-1 adrenergic receptors. Examining this subtree MSA shows that ADRB1_XENLA (from Xenopus laevis, African clawed frog) and ADRB1_MEGLA (from Meleagris gallopavo, Common turkey) diverge from mammalian orthologs at the N-terminus.
Figure 2.
Benchmarking MSA accuracy. Methods used in this comparison include the original SATCHMO, SATCHMO-JS, ClustalW, MUSCLE and MAFFT (MUSCLE and MAFFT each used five iterations of refinement). Results are shown on 983 pairs from the PREFAB benchmark dataset, divided into bins based on the percent identity in the reference structural alignment. The Modeler score (Qmodeler) is a measure of the precision of an alignment, while the Developer score (Qdeveloper) is a measure of the recall. For every percent identity bin, either SATCHMO or SATCHMO-JS produces the best overall performance in both Modeler and Developer scores, with SATCHMO-JS generally producing better results than SATCHMO. Over the dataset as a whole, SATCHMO-JS’s improvement relative to other methods tested is statistically significant (P < 0.05 using Wilcoxon paired score signed rank tests) for all scoring functions (including Qcombined and the Cline Shift score, which balance recall and precision) with a single exception: relative to MAFFT, the difference is significant only for the Developer score (P = 1.138e-05). For the Modeler, Qcombined and Cline Shift scores, the P-values relative to MAFFT are 0.204, 0.093 and 0.157, respectively. See text for additional details.
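The description of the Modeler score as precision and the Developer score as recall can be illustrated with a small sketch. It assumes the commonly used definitions (correctly aligned residue pairs divided by the pairs in the test alignment, and by the pairs in the reference alignment, respectively); the function names are hypothetical, and the benchmark's actual scoring may differ in details such as which reference columns are counted.

```python
def aligned_pairs(row_a, row_b, gap="-"):
    """Return the set of (i, j) residue index pairs that share a column
    in a pairwise alignment given as two equal-length gapped strings."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    pairs = set()
    i = j = 0  # residue indices within the ungapped sequences
    for a, b in zip(row_a, row_b):
        a_is_residue = a != gap
        b_is_residue = b != gap
        if a_is_residue and b_is_residue:
            pairs.add((i, j))
        i += a_is_residue
        j += b_is_residue
    return pairs


def modeler_developer(test_pairs, ref_pairs):
    """Return (Qmodeler, Qdeveloper) under the assumed definitions:
    precision and recall of the test alignment's residue pairs."""
    correct = len(test_pairs & ref_pairs)
    q_modeler = correct / len(test_pairs) if test_pairs else 0.0
    q_developer = correct / len(ref_pairs) if ref_pairs else 0.0
    return q_modeler, q_developer


if __name__ == "__main__":
    # Toy sequences, not taken from the benchmark.
    reference = aligned_pairs("ACDE", "ACDE")
    test = aligned_pairs("AC-DE", "ACD-E")
    print(modeler_developer(test, reference))  # (1.0, 0.75)
```

In this toy case every pair the test alignment asserts is correct (precision 1.0), but it misses one of the four reference pairs (recall 0.75), which is the kind of trade-off that combined scores are meant to balance.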

