Comparative Study
BMC Evol Biol. 2004 Sep 17;4:33.
doi: 10.1186/1471-2148-4-33.

Reconstruction of ancestral protein sequences and its applications

Wei Cai et al. BMC Evol Biol. 2004.

Abstract

Background: Modern-day proteins were selected over a long evolutionary history as descendants of ancient life forms. In silico reconstruction of such ancestral protein sequences facilitates our understanding of evolutionary processes, protein classification and biological function. Additionally, reconstructed ancestral protein sequences could serve to fill in sequence space, thus aiding remote homology inference.

Results: We developed ANCESCON, a package for distance-based phylogenetic inference and reconstruction of ancestral protein sequences. It takes into account the observed variation of evolutionary rates between positions, which describes the evolution of protein families more precisely. To improve the accuracy of evolutionary distance estimation and ancestral sequence reconstruction, two approaches are proposed to estimate position-specific evolutionary rates. Comparisons show that at large evolutionary distances our method gives more accurate ancestral sequence reconstruction than PAML, PHYLIP and PAUP*. We apply the reconstructed ancestral sequences to homology inference and functional site prediction. We show that using hypothetical ancestors together with present-day sequences improves profile-based sequence similarity searches, and that ancestral sequence reconstruction methods can be used to predict positions with functional specificity.

Conclusions: As a computational tool to reconstruct ancestral protein sequences from a given multiple sequence alignment, ANCESCON shows high accuracy in tests and helps in the detection of remote homologs and the prediction of functional sites. ANCESCON is freely available for non-commercial use. Pre-compiled versions for several platforms can be downloaded from ftp://iole.swmed.edu/pub/ANCESCON/.

Figures

Figure 1
a) Correlation between average αAB and average observed α. b) Correlation between average αML and average observed α. αAB is the alignment-based rate factor, which depends solely on the given alignment. αML is the rate factor estimated by the maximum likelihood method, which requires an alignment and an evolutionary tree inferred from the alignment. The protein family used here is the PDZ domain.
Figure 2
The tree used to test ancestral sequence reconstruction. This is an arbitrarily selected evolutionary tree. Evolutionary distances are shown to scale.
Figure 3
Comparison of pairwise distances between the rebuilt tree and original tree. a) distance estimation assuming no rate variation among sites; b) distance estimation with αAB. The rebuilt tree is inferred from the alignment that is generated by evolutionary simulation performed on the original tree. The original tree is arbitrarily selected.
Figure 4
a) Correlation between the average probability of "the reconstructed amino acid" and the fraction of correct predictions. b) Correlation between the fraction of correct predictions and average αAB at each site. The protein family used here is the PDZ domain. Red filled points are sites with incorrect reconstruction.
Figure 5
Comparison of the "BEST", "SECOND BEST", "SHUFFLE" and "RANDOM" methods in the number of new homologs detected when compared with the benchmark experiment. The methods are defined in the "Methods" section. The blue portion of each bar shows the number of true positives. The red portion of each bar shows the number of false positives.
Figure 6
Mapping of the top 10 predictions by ANCESCON onto the PDZ domain (PDB ID: 1be9) [50]. Color scheme: the ligand is shown in green and the predicted functional residues are shown in red.
Figure 7
A partial alignment of the N-terminal part of adenylyl kinases. Sites colored in red are our predictions that are within 5 Å of the ligand. Sites colored in orange are our predictions that are more than 5 Å from the ligand.
Figure 8
The evolutionary tree for the adenylyl kinase family generated by "Weighbor". The first cutting layer is shown. Evolutionary distances are shown to scale.
Figure 9
Mapping of the top 10 predictions by ANCESCON onto the adenylyl kinase domain (PDB ID: 1aky) [47]. Color scheme: the ligand is shown in green and the predicted functional residues are shown in red.
Figure 10
An evolutionary tree topology. Nodes C, D, E and F represent given protein sequences, while nodes A and B represent ancestral protein sequences, i.e. unknown sequences. dYZ represents the evolutionary distance between nodes Y and Z.
Figure 11
An example showing the different cutting layers in a rooted tree. dr is the average distance from the root to all leaf nodes. Nodes i and j are neighboring cutting nodes.
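
The quantity dr in the Figure 11 caption, the average distance from the root to all leaf nodes, can be illustrated with a minimal Python sketch. This is not ANCESCON code: the tree representation, the function names and the example branch lengths are all assumptions made for illustration, and the sketch does not attempt to reproduce how cutting layers themselves are chosen.

```python
from typing import Dict, List, Tuple

# Hypothetical rooted tree: each internal node maps to (child, branch_length) pairs.
# Nodes that do not appear as keys are treated as leaves.
Tree = Dict[str, List[Tuple[str, float]]]


def root_to_leaf_distances(tree: Tree, root: str) -> List[float]:
    """Return the summed branch lengths from the root to every leaf."""
    distances: List[float] = []
    stack: List[Tuple[str, float]] = [(root, 0.0)]
    while stack:
        node, dist = stack.pop()
        children = tree.get(node, [])
        if not children:
            # A node without children is a leaf: record its distance from the root.
            distances.append(dist)
        for child, length in children:
            stack.append((child, dist + length))
    return distances


def average_root_to_leaf_distance(tree: Tree, root: str) -> float:
    """dr as described in the caption: mean root-to-leaf distance."""
    distances = root_to_leaf_distances(tree, root)
    return sum(distances) / len(distances)


# Illustrative topology and branch lengths (made up, not taken from the paper).
example_tree: Tree = {
    "root": [("A", 0.5), ("B", 0.25)],
    "A": [("C", 0.5), ("D", 1.0)],
    "B": [("E", 0.75), ("F", 0.25)],
}

d_r = average_root_to_leaf_distance(example_tree, "root")
print(d_r)
```

In this made-up tree the four root-to-leaf distances are 1.0, 1.5, 1.0 and 0.5, so the average is 1.0.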

References

    1. Fitch WM. Toward defining the course of evolution: minimum change for a specific tree topology. Syst Zool. 1971;20:406–416.
    2. Hartigan JA. Minimum evolution fits to a given tree. Biometrics. 1973;29:53–65.
    3. Yang Z, Kumar S, Nei M. A new method of inference of ancestral nucleotide and amino acid sequences. Genetics. 1995;141:1641–1650.
    4. Koshi JM, Goldstein RA. Probabilistic reconstruction of ancestral protein sequences. J Mol Evol. 1996;42:313–320.
    5. Pupko T, Pe'er I, Shamir R, Graur D. A fast algorithm for joint reconstruction of ancestral amino acid sequences. Mol Biol Evol. 2000;17:890–896.

Publication types