PLoS One. 2007 Jun 6;2(6):e503.
doi: 10.1371/journal.pone.0000503.

HIV-specific probabilistic models of protein evolution



David C Nickle et al. PLoS One.

Abstract

Comparative sequence analyses, including such fundamental bioinformatics techniques as similarity searching, sequence alignment and phylogenetic inference, have become a mainstay for researchers studying the genome structure and evolution of human immunodeficiency virus type 1 (HIV-1). Implicit in comparative analyses is an underlying model of evolution, and the chosen model can significantly affect the results. In general, evolutionary models describe the probabilities of replacing one amino acid character with another over a period of time. Most widely used evolutionary models for protein sequences have been derived from curated alignments of hundreds of proteins, usually based on mammalian genomes. It is unclear to what extent these empirical models generalize to a very different organism such as HIV-1, the most extensively sequenced organism in existence. We developed a maximum likelihood procedure for fitting models to a collection of HIV-1 alignments sampled from different viral genes, and inferred two empirical substitution models, suitable for describing between- and within-host evolution. Our procedure pools the information from multiple sequence alignments, and the software implementation we provide can be run efficiently in parallel on a computer cluster. We describe how the inferred substitution models can be used to generate scoring matrices suitable for alignment and similarity searches. When benchmarked on independent HIV-1 alignments, our models consistently fit better than both the best existing models and parameter-rich data-driven models, demonstrating evolutionary biases in amino acid substitution that are unique to HIV and are not captured by existing models. The scoring matrices derived from the models differed markedly from common amino acid scoring matrices. Using an appropriate evolutionary model recovered a known viral transmission history, whereas a poorly chosen model introduced phylogenetic error.
We argue that our model derivation procedure is immediately applicable to other organisms with extensive sequence data available, such as hepatitis C and influenza A viruses.
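The abstract describes an evolutionary model as giving the probabilities of replacing one amino acid with another over a period of time. For a time-reversible model with rate matrix Q, these probabilities are the entries of P(t) = exp(Qt). The following is a minimal sketch, assuming NumPy and a hypothetical 4-state toy model (an amino acid model would be 20 × 20); the names `rate_matrix`, `transition_matrix`, `EXCH` and `FREQS` are illustrative and are not taken from the authors' software. It builds Q from symmetric exchangeabilities and equilibrium frequencies, scales it to one expected substitution per site per unit time, and exponentiates it through a symmetric eigendecomposition.

```python
import numpy as np

def rate_matrix(exch, freqs):
    # Off-diagonal rates q_ij = s_ij * pi_j give a time-reversible model
    # when the exchangeability matrix s is symmetric.
    s = np.asarray(exch, dtype=float)
    pi = np.asarray(freqs, dtype=float)
    q = s * pi[np.newaxis, :]
    np.fill_diagonal(q, 0.0)
    # Diagonal entries make every row sum to zero.
    np.fill_diagonal(q, -q.sum(axis=1))
    # Scale so that the expected rate -sum_i pi_i * q_ii equals one.
    return q / -np.dot(pi, np.diag(q))

def transition_matrix(q, freqs, t):
    # P(t) = expm(Q t), computed through the symmetric matrix
    # A = D^{1/2} Q D^{-1/2} with D = diag(pi).
    d = np.sqrt(np.asarray(freqs, dtype=float))
    a = d[:, np.newaxis] * q / d[np.newaxis, :]
    a = (a + a.T) / 2.0  # remove round-off asymmetry before eigh
    w, v = np.linalg.eigh(a)
    exp_at = (v * np.exp(w * t)) @ v.T
    # Undo the similarity transform: P = D^{-1/2} expm(A t) D^{1/2}.
    return exp_at / d[:, np.newaxis] * d[np.newaxis, :]

# Hypothetical 4-state toy model, for illustration only.
EXCH = [[0, 1, 2, 1],
        [1, 0, 1, 2],
        [2, 1, 0, 1],
        [1, 2, 1, 0]]
FREQS = [0.1, 0.2, 0.3, 0.4]

Q = rate_matrix(EXCH, FREQS)
P = transition_matrix(Q, FREQS, 0.5)
print(P.round(3))
```

Because Q is scaled to one expected substitution per site, t is measured in expected substitutions per site. Each row of `P` sums to one and `np.asarray(FREQS) @ P` returns `FREQS`, which is a quick sanity check on any reversible model.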


Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Rate matrices for different substitution models.
All matrices are scaled to one expected substitution per unit time per site. Shading of the cells reflects the magnitude of the rate, with darker shades corresponding to higher rates. Substitutions that can occur through a single nucleotide change are marked with a circle. The four diagonal blocks represent similarity classes (conservative substitutions) according to the Stanfel scale.
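Figure 1 marks the amino acid pairs that can be interchanged by a single nucleotide change. A minimal sketch of how that set could be computed is below, assuming the standard genetic code (NCBI translation table 1) and the rule that a pair qualifies when at least one codon of each residue differs at exactly one position; the helper names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons are enumerated
# in TCAG order: TTT, TTC, TTA, TTG, TCT, ... GGG. "*" marks stop codons.
TRANSLATION = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa
        for codon, aa in zip(product(BASES, repeat=3), TRANSLATION)}
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

def codons_for(aa):
    """All codons that encode amino acid `aa`."""
    return [codon for codon, encoded in CODE.items() if encoded == aa]

def nucleotide_differences(c1, c2):
    """Number of positions at which two codons differ."""
    return sum(x != y for x, y in zip(c1, c2))

def single_nucleotide(aa1, aa2):
    """True if some codon of aa1 and some codon of aa2 differ at exactly one position."""
    return any(nucleotide_differences(c1, c2) == 1
               for c1 in codons_for(aa1)
               for c2 in codons_for(aa2))

# Unordered pairs of distinct amino acids reachable by one point mutation.
SINGLE_PAIRS = [(a, b)
                for i, a in enumerate(AMINO_ACIDS)
                for b in AMINO_ACIDS[i + 1:]
                if single_nucleotide(a, b)]
print(len(SINGLE_PAIRS), "of 190 amino acid pairs need only one nucleotide change")
```

For example, alanine (GCN) and valine (GTN) are one change apart, whereas tryptophan (TGG) and phenylalanine (TTT, TTC) always differ at two positions.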
Figure 2. Inferred substitution rates.
Rates are classified by whether a substitution requires a single nucleotide change or multiple nucleotide changes, and by how it affects various properties of the residue being substituted. The HIV-Bm model is plotted in the top row and the HIV-Wm model in the bottom row.
Figure 3. Model clustering using the Total Variation Metric at evolutionary times corresponding to 5%, 25% and 100% sequence divergence.
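Figure 3 compares models with a total variation metric at several evolutionary distances. The caption does not give the exact definition, so the sketch below assumes one plausible version: the total variation distance between the joint distributions π_i P_ij(t) of two models, with the three divergence levels read as t = 0.05, 0.25 and 1.0 expected substitutions per site. NumPy, the two toy 4-state models and all function names are hypothetical.

```python
import numpy as np

def rate_matrix(exch, freqs):
    # Reversible rates q_ij = s_ij * pi_j, scaled to one expected substitution per site.
    pi = np.asarray(freqs, dtype=float)
    q = np.asarray(exch, dtype=float) * pi[np.newaxis, :]
    np.fill_diagonal(q, 0.0)
    np.fill_diagonal(q, -q.sum(axis=1))
    return q / -np.dot(pi, np.diag(q))

def transition_matrix(q, freqs, t):
    # P(t) = expm(Q t) via the symmetric matrix D^{1/2} Q D^{-1/2}, D = diag(pi).
    d = np.sqrt(np.asarray(freqs, dtype=float))
    a = d[:, np.newaxis] * q / d[np.newaxis, :]
    w, v = np.linalg.eigh((a + a.T) / 2.0)
    exp_at = (v * np.exp(w * t)) @ v.T
    return exp_at / d[:, np.newaxis] * d[np.newaxis, :]

def joint_distribution(model, t):
    # Probability of residue i in one sequence and residue j in the other.
    exch, freqs = model
    pi = np.asarray(freqs, dtype=float)
    return pi[:, np.newaxis] * transition_matrix(rate_matrix(exch, pi), pi, t)

def total_variation(model_a, model_b, t):
    # Half the L1 distance between the two joint distributions; lies in [0, 1].
    diff = joint_distribution(model_a, t) - joint_distribution(model_b, t)
    return 0.5 * np.abs(diff).sum()

# Two hypothetical 4-state toy models: (exchangeabilities, equilibrium frequencies).
MODEL_A = ([[0, 1, 2, 1], [1, 0, 1, 2], [2, 1, 0, 1], [1, 2, 1, 0]],
           [0.1, 0.2, 0.3, 0.4])
MODEL_B = ([[0, 3, 1, 1], [3, 0, 1, 1], [1, 1, 0, 3], [1, 1, 3, 0]],
           [0.25, 0.25, 0.25, 0.25])

for t in (0.05, 0.25, 1.0):
    print(f"t = {t}: TV = {total_variation(MODEL_A, MODEL_B, t):.4f}")
```

A matrix of such distances over all model pairs could then be passed to a standard hierarchical clustering routine to obtain a dendrogram of the kind shown in Figure 3.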
Figure 4. Maximum likelihood trees inferred with two different amino acid models from a sample of HIV-1 env V3 sequences with a known transmission history (Leitner et al. 1996).
The tree inferred with HIV-Bm is congruent with the true history of the sequences. Scale bars are in expected amino acid substitutions per site.
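Figure 4 shows that the choice of amino acid model can change the inferred tree. A full maximum likelihood tree search is beyond a short example, but the same dependence already appears in the simplest likelihood calculation: estimating the distance between two aligned sequences. The sketch below, assuming NumPy, two hypothetical 4-state toy models and a golden-section search over t (an illustrative choice of optimizer, not necessarily the authors'), maximizes ℓ(t) = Σ_ij N_ij log(π_i P_ij(t)), where N_ij counts alignment columns with residue i in one sequence and residue j in the other.

```python
import numpy as np

def rate_matrix(exch, freqs):
    # Reversible rates q_ij = s_ij * pi_j, scaled to one expected substitution per site.
    pi = np.asarray(freqs, dtype=float)
    q = np.asarray(exch, dtype=float) * pi[np.newaxis, :]
    np.fill_diagonal(q, 0.0)
    np.fill_diagonal(q, -q.sum(axis=1))
    return q / -np.dot(pi, np.diag(q))

def transition_matrix(q, freqs, t):
    # P(t) = expm(Q t) via the symmetric matrix D^{1/2} Q D^{-1/2}, D = diag(pi).
    d = np.sqrt(np.asarray(freqs, dtype=float))
    a = d[:, np.newaxis] * q / d[np.newaxis, :]
    w, v = np.linalg.eigh((a + a.T) / 2.0)
    exp_at = (v * np.exp(w * t)) @ v.T
    return exp_at / d[:, np.newaxis] * d[np.newaxis, :]

def log_likelihood(counts, q, freqs, t):
    # l(t) = sum_ij N_ij * log(pi_i * P_ij(t)); clipping guards against round-off below zero.
    pi = np.asarray(freqs, dtype=float)
    joint = pi[:, np.newaxis] * transition_matrix(q, pi, t)
    n = np.asarray(counts, dtype=float)
    return float(np.sum(n * np.log(np.clip(joint, 1e-300, None))))

def ml_distance(counts, q, freqs, lo=1e-6, hi=10.0, tol=1e-8):
    # Golden-section search for the t in [lo, hi] that maximizes the log-likelihood.
    g = (np.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if log_likelihood(counts, q, freqs, c) > log_likelihood(counts, q, freqs, d):
            b, d = d, c
            c = b - g * (b - a)
        else:
            a, c = c, d
            d = a + g * (b - a)
    return (a + b) / 2.0

# Two hypothetical 4-state toy models: (exchangeabilities, equilibrium frequencies).
MODEL_A = ([[0, 1, 2, 1], [1, 0, 1, 2], [2, 1, 0, 1], [1, 2, 1, 0]],
           [0.1, 0.2, 0.3, 0.4])
MODEL_B = ([[0, 3, 1, 1], [3, 0, 1, 1], [1, 1, 0, 3], [1, 1, 3, 0]],
           [0.25, 0.25, 0.25, 0.25])
Q_A = rate_matrix(*MODEL_A)
Q_B = rate_matrix(*MODEL_B)

# Hypothetical pair counts: expected column counts for 1000 sites
# evolving under model A for t = 0.3 substitutions per site.
PI_A = np.asarray(MODEL_A[1])
COUNTS = 1000 * PI_A[:, np.newaxis] * transition_matrix(Q_A, PI_A, 0.3)

print("distance under model A:", round(ml_distance(COUNTS, Q_A, MODEL_A[1]), 4))
print("distance under model B:", round(ml_distance(COUNTS, Q_B, MODEL_B[1]), 4))
```

The same counts yield different distance estimates under the two models; in a tree search, such differences accumulate across branches and can change the topology.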
Figure 5. A comparison of BLOSUM and HIV-1 similarity scoring matrices at an expected 62% sequence identity.
Green = positive score; red = negative score; brightness = magnitude of score.
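Figure 5 compares BLOSUM with HIV-1 scoring matrices at an expected 62% sequence identity. A common way to turn a reversible substitution model into such a matrix, sketched here as an assumption rather than the paper's exact recipe, is to find the time t at which the expected identity Σ_i π_i P_ii(t) equals 0.62 and then take rounded log-odds scores s_ij = round(2 log2(P_ij(t)/π_j)), which are in half-bit units like BLOSUM62. NumPy, the toy 4-state model and the function names are hypothetical.

```python
import numpy as np

def rate_matrix(exch, freqs):
    # Reversible rates q_ij = s_ij * pi_j, scaled to one expected substitution per site.
    pi = np.asarray(freqs, dtype=float)
    q = np.asarray(exch, dtype=float) * pi[np.newaxis, :]
    np.fill_diagonal(q, 0.0)
    np.fill_diagonal(q, -q.sum(axis=1))
    return q / -np.dot(pi, np.diag(q))

def transition_matrix(q, freqs, t):
    # P(t) = expm(Q t) via the symmetric matrix D^{1/2} Q D^{-1/2}, D = diag(pi).
    d = np.sqrt(np.asarray(freqs, dtype=float))
    a = d[:, np.newaxis] * q / d[np.newaxis, :]
    w, v = np.linalg.eigh((a + a.T) / 2.0)
    exp_at = (v * np.exp(w * t)) @ v.T
    return exp_at / d[:, np.newaxis] * d[np.newaxis, :]

def expected_identity(q, freqs, t):
    # Probability that a site is unchanged after time t: sum_i pi_i * P_ii(t).
    pi = np.asarray(freqs, dtype=float)
    return float(pi @ np.diag(transition_matrix(q, pi, t)))

def time_for_identity(q, freqs, target, tol=1e-10):
    # Expected identity falls monotonically from 1 toward sum_i pi_i^2, so bisection works.
    pi = np.asarray(freqs, dtype=float)
    if not np.dot(pi, pi) < target < 1.0:
        raise ValueError("target identity is not reachable under this model")
    lo, hi = 0.0, 1.0
    while expected_identity(q, pi, hi) > target:
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if expected_identity(q, pi, mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def log_odds_scores(q, freqs, target_identity=0.62):
    # Half-bit log-odds scores s_ij = round(2 * log2(P_ij(t) / pi_j)).
    pi = np.asarray(freqs, dtype=float)
    t = time_for_identity(q, pi, target_identity)
    ratios = transition_matrix(q, pi, t) / pi[np.newaxis, :]
    return np.rint(2.0 * np.log2(ratios)).astype(int), t

# Hypothetical 4-state toy model, for illustration only.
EXCH = [[0, 1, 2, 1], [1, 0, 1, 2], [2, 1, 0, 1], [1, 2, 1, 0]]
FREQS = [0.1, 0.2, 0.3, 0.4]

SCORES, T62 = log_odds_scores(rate_matrix(EXCH, FREQS), FREQS)
print("t at 62% identity:", round(T62, 4))
print(SCORES)
```

For a reversible model, P_ij(t)/π_j equals π_i P_ij(t)/(π_i π_j), so the resulting score matrix is symmetric, as a similarity scoring matrix should be.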

References

1. Hasegawa M, Kishino H, Yano TA. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J Mol Evol. 1985;22:160–174.
2. Muse SV, Gaut BS. A likelihood approach for comparing synonymous and nonsynonymous nucleotide substitution rates, with application to the chloroplast genome. Mol Biol Evol. 1994;11:715–724.
3. Goldman N, Yang Z. A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol Biol Evol. 1994;11:725–736.
4. Dayhoff MO, Schwartz RM, Orcutt BC. A model of evolutionary change in proteins. In: Dayhoff MO, editor. Atlas of Protein Sequence and Structure. Washington, D.C.: National Biomedical Research Foundation; 1978. pp. 345–352.
5. Henikoff S, Henikoff JG. Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci U S A. 1992;89:10915–10919.
