PLoS Comput Biol. 2016 Jul 29;12(7):e1004976.
doi: 10.1371/journal.pcbi.1004976. eCollection 2016 Jul.

PhyloBot: A Web Portal for Automated Phylogenetics, Ancestral Sequence Reconstruction, and Exploration of Mutational Trajectories

Victor Hanson-Smith et al. PLoS Comput Biol. 2016.

Abstract

The method of phylogenetic ancestral sequence reconstruction is a powerful approach for studying evolutionary relationships among protein sequence, structure, and function. In particular, this approach allows investigators to (1) reconstruct and "resurrect" (that is, synthesize in vivo or in vitro) extinct proteins to study how they differ from modern proteins, (2) identify key amino acid changes that, over evolutionary timescales, have altered the function of the protein, and (3) order historical events in the evolution of protein function. Widespread use of this approach has been slow among molecular biologists, in part because the methods require significant computational expertise. Here we present PhyloBot, a web-based software tool that makes ancestral sequence reconstruction easy. Designed for non-experts, it integrates all the necessary software into a single user interface. Additionally, PhyloBot provides interactive tools to explore evolutionary trajectories between ancestors, enabling the rapid generation of hypotheses that can be tested using genetic or biochemical approaches. Early versions of this software were used in previous studies to discover genetic mechanisms underlying the functions of diverse protein families, including V-ATPase ion pumps, DNA-binding transcription regulators, and serine/threonine protein kinases. PhyloBot runs in a web browser, and is available at the following URL: http://www.phylobot.com. The software is implemented in Python using the Django web framework, and runs on elastic cloud computing resources from Amazon Web Services. Users can create and submit jobs on our free server (at the URL listed above), or use our open-source code to launch their own PhyloBot server.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Summary of PhyloBot automated pipeline.
A user begins by uploading a collection of orthologous protein sequences in a FASTA-formatted text file. PhyloBot reads the sequence collection and launches its automated analysis pipeline, which includes sequence alignment, phylogenetic model-fitting, tests of branch support, ancestral sequence reconstruction, and prediction of functional genetics. Upon completion, the results can be viewed in a web browser.
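The input step described in this caption can be illustrated with a minimal sketch of reading a FASTA-formatted text file into named sequences. This is not PhyloBot's own parser; the function name `read_fasta` and the example records are hypothetical, and the sketch assumes plain FASTA with `>` header lines.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict of {header: sequence}.

    Hypothetical helper for illustration; not taken from PhyloBot's code.
    """
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            name = line[1:].strip()  # header text after '>'
            records[name] = []
        elif name is not None:
            records[name].append(line)  # sequences may span several lines
    return {header: "".join(parts) for header, parts in records.items()}


# Made-up example input, not real orthologs.
example_fasta = """>species_A
MKTAL
LV
>species_B
MKSALIV
"""

sequences = read_fasta(example_fasta)
print(sequences)
```

Each header becomes a key and multi-line sequences are joined, which is the form that downstream alignment tools typically expect.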
Fig 2. Screenshots from the PhyloBot web portal.
(A) The front page of the portal provides a control panel to create new jobs and to check the status of existing jobs. In this image, a user has five jobs; three of them are 100% complete and the other two are in progress. (B) A user can view detailed status for every job they create. The status page provides controls to start, stop, reset, and delete the job, in addition to displaying the job’s settings and the job’s current status.
Fig 3. Example of alignment robustness analysis.
In this simple example, orthologous amino acid sequences from five species were aligned using three different methods for multiple sequence alignment: Muscle, MSAProbs, and MAFFT. (A) PhyloBot maps the aligned position of every character across all alignments. Shown in red is the character map for the amino acids aligned into site 3 of the Muscle alignment. In the MSAProbs sequence alignment, these same residues are split across sites 3 and 4. In the MAFFT alignment, these residues are split across sites 3, 4 and 5. (B) PhyloBot displays the character map as pie charts expressing site identity relative to the Muscle alignment. PhyloBot will also show these maps relative to MSAProbs and MAFFT alignments.
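The character map in this caption can be sketched as follows, assuming each alignment is stored as a dict from taxon name to gapped sequence. The function names and toy sequences are hypothetical and are not PhyloBot's implementation; the idea is to follow each residue in one reference column and record which column it occupies in another alignment.

```python
from collections import Counter


def residue_columns(aligned_seq, gap="-"):
    """Return the 1-based alignment column of each non-gap residue, in order."""
    return [col for col, ch in enumerate(aligned_seq, start=1) if ch != gap]


def map_site(reference, other, site, gap="-"):
    """Count which columns of `other` hold the residues found in
    column `site` (1-based) of `reference`."""
    counts = Counter()
    for taxon, ref_seq in reference.items():
        if ref_seq[site - 1] == gap:
            continue  # this taxon has no residue at the reference site
        # Position of this residue within the taxon's ungapped sequence.
        residue_index = sum(1 for ch in ref_seq[:site - 1] if ch != gap)
        counts[residue_columns(other[taxon], gap)[residue_index]] += 1
    return dict(counts)


# Toy alignments (made up); the same ungapped sequences aligned two ways.
reference_alignment = {"sp1": "MKTAL", "sp2": "MKSAL", "sp3": "MKTAL"}
other_alignment = {"sp1": "MKTAL-", "sp2": "MK-SAL", "sp3": "MKTAL-"}

site_map = map_site(reference_alignment, other_alignment, site=3)
print(site_map)
```

The resulting counts per column are the kind of data a pie chart for one reference site could display: here the residues from reference site 3 are split across columns 3 and 4 of the other alignment.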
Fig 4. Example of ancestral node robustness analysis.
In this small example with protein sequences from five species, maximum likelihood phylogenies were inferred using four different evolutionary models (JTT+GAMMA, JTT+CAT, LG+GAMMA, and LG+CAT) based on three different sequence alignment methods (Muscle, MSAProbs, and MAFFT). The resulting ML phylogenies disagree in their topologies, and an ancestral node in one tree may not exist in other trees. For example, shown in red is the phylogenetic node corresponding to the most-recent ancestor of H. sapiens, M. musculus, and G. gallus, with X. tropicalis and T. teleost as the outgroup. This ancestral node is not inferred to exist when using some combinations of models and methods. Specifically, the alternate phylogenies support an evolutionary hypothesis in which the sequences from G. gallus and X. tropicalis are sister to each other. PhyloBot gathers this information about all reconstructed ancestral nodes, in order to assess the extent to which an ancestor’s existence is robust across different models and methods.
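One way to sketch this robustness check is to treat an ancestral node as the set of extant taxa descending from it, and to ask whether that same set forms a clade in each tree. The sketch below assumes rooted trees written as nested tuples and uses made-up topologies; the matching criterion PhyloBot actually applies may differ, and all names here are hypothetical.

```python
def collect_clades(tree, found=None):
    """Return (leaf set of `tree`, set of all clades in `tree`).

    A tree is a leaf name (str) or a tuple of subtrees.
    """
    if found is None:
        found = set()
    if isinstance(tree, str):
        return frozenset([tree]), found
    leaves = frozenset()
    for child in tree:
        child_leaves, _ = collect_clades(child, found)
        leaves |= child_leaves
    found.add(leaves)  # every internal node defines one clade
    return leaves, found


def node_support(trees, target):
    """Fraction of trees in which `target` (a set of taxa) is a clade."""
    target = frozenset(target)
    present = [target in collect_clades(tree)[1] for tree in trees.values()]
    return sum(present) / len(present)


# Toy topologies (made up), keyed by (alignment method, model).
amniote_tree = ((("H.sapiens", "M.musculus"), "G.gallus"), ("X.tropicalis", "T.teleost"))
alternate_tree = ((("H.sapiens", "M.musculus"), ("G.gallus", "X.tropicalis")), "T.teleost")
trees = {
    ("muscle", "JTT+GAMMA"): amniote_tree,
    ("msaprobs", "JTT+GAMMA"): amniote_tree,
    ("mafft", "JTT+GAMMA"): amniote_tree,
    ("mafft", "LG+CAT"): alternate_tree,
}

support = node_support(trees, {"H.sapiens", "M.musculus", "G.gallus"})
print(support)
```

In the alternate topology G. gallus groups with X. tropicalis, so the target ancestor is absent from that tree and the support fraction drops below 1.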
Fig 5. Screenshots from the PhyloBot ancestral library viewer.
The images shown come from the Ancestral Library computed for the CMGC protein family [31]. (A) The library viewer displays an interactive tree for exploring reconstructed protein ancestors. Users select the maximum likelihood tree based on the alignment method and evolutionary model, and then click on ancestral nodes within that tree. (B) PhyloBot gathers summary statistics about every ancestral node. Shown here is the support summary for ancestral Node 401 in the CMGC family, reconstructed using msaprobs and PROTCATLG. The histogram bins the sequence sites of Node 401 according to their amino acid probability support. In this case, a majority of sites have support of 0.9 or greater. The line graph expresses the probability of the maximum likelihood amino acid residue, along with the second-best and third-best reconstructed residues; the line graph is a quick way to visually determine which protein domains were reconstructed with strong support. In this example, there is an unstructured region in the C-terminus that was reconstructed with low support. (C) PhyloBot shows details about every site in every reconstructed ancestor. Shown here is the probability support by site for Node 401 in CMGC. Users can optionally map these data to extant sequences. For example, here a user selected Homo sapiens CDK6. In the table, the first column displays the sequence site in the MSAProbs alignment; the second column expresses the site number and best amino acid state in the reconstructed ancestor Node 401; the third column expresses the site number and amino acid state in Homo sapiens CDK6; and the fourth column expresses the full probability distribution of all amino acid states reconstructed at that site in Node 401.
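The support summary in panel (B) can be approximated with a short sketch, assuming each reconstructed site is given as a dict from amino acid to posterior probability. The bin width of 0.1, the function names, and the probabilities below are illustrative assumptions and do not come from the CMGC data set or from PhyloBot's code.

```python
def top_states(site_probs, n=3):
    """Return the n most probable (amino acid, probability) pairs at a site."""
    return sorted(site_probs.items(), key=lambda item: item[1], reverse=True)[:n]


def support_histogram(sites, n_bins=10):
    """Bin sites by the probability of their best state.

    Bin i covers [i/n_bins, (i+1)/n_bins); a probability of 1.0
    is placed in the last bin.
    """
    counts = [0] * n_bins
    for site_probs in sites:
        best = max(site_probs.values())
        index = min(int(best * n_bins), n_bins - 1)
        counts[index] += 1
    return counts


# Made-up posterior distributions for four sites of a hypothetical ancestor.
ancestor_sites = [
    {"K": 0.95, "R": 0.04, "Q": 0.01},
    {"L": 0.62, "I": 0.30, "V": 0.08},
    {"S": 0.41, "T": 0.39, "A": 0.20},
    {"G": 1.0},
]

histogram = support_histogram(ancestor_sites)
ml_sequence = "".join(top_states(site, n=1)[0][0] for site in ancestor_sites)
print(histogram, ml_sequence)
```

The last bin of `histogram` counts sites whose best state has support of 0.9 or greater, and `top_states` returns the best, second-best, and third-best residues that a per-site line graph would plot.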

References

    1. Ortlund E, Bridgham JT, Redinbo MR, Thornton JW. Crystal structure of an ancient protein: evolution by conformational epistasis. Science 2007, 317, 1544–8.
    2. Bridgham J, Ortlund E, Thornton JW. Evolution of a New Function by Degenerative Mutation in Cephalochordate Steroid Receptors. PLoS Genetics 2009, 4(9).
    3. Baker CB, Hanson-Smith V, Johnson AD. Following gene duplication, paralog interference constrains transcriptional circuit evolution. Science 2013, 342, 104–8. doi: 10.1126/science.1240810
    4. Howard C, Hanson-Smith V, Kennedy KJ, Miller C, Lou HJ, Johnson AJ, et al. Ancestral resurrection reveals evolutionary mechanisms of kinase plasticity. eLife 2014, 3:e04126.
    5. McKeown A, Bridgham JT, Anderson DW, Murphy MN, Ortlund EA, Thornton JW. Evolution of DNA specificity in a transcription factor family produced a new gene regulatory module. Cell 2014, 159, 58–68. doi: 10.1016/j.cell.2014.09.003