HIV-specific probabilistic models of protein evolution
- PMID: 17551583
- PMCID: PMC1876811
- DOI: 10.1371/journal.pone.0000503
Abstract
Comparative sequence analyses, including such fundamental bioinformatics techniques as similarity searching, sequence alignment and phylogenetic inference, have become a mainstay for researchers studying type 1 Human Immunodeficiency Virus (HIV-1) genome structure and evolution. Implicit in comparative analyses is an underlying model of evolution, and the chosen model can significantly affect the results. In general, evolutionary models describe the probabilities of replacing one amino acid character with another over a period of time. Most widely used evolutionary models for protein sequences have been derived from curated alignments of hundreds of proteins, usually based on mammalian genomes. It is unclear to what extent these empirical models generalize to a very different organism such as HIV-1, the most extensively sequenced organism in existence. We developed a maximum likelihood procedure for fitting models to a collection of HIV-1 alignments sampled from different viral genes, and inferred two empirical substitution models, suitable for describing between- and within-host evolution. Our procedure pools the information from multiple sequence alignments, and the provided software implementation can be run efficiently in parallel on a computer cluster. We describe how the inferred substitution models can be used to generate scoring matrices suitable for alignment and similarity searches. Our models had a consistently superior fit relative to the best existing models and to parameter-rich data-driven models when benchmarked on independent HIV-1 alignments, demonstrating evolutionary biases in amino-acid substitution that are unique to HIV and that are not captured by the existing models. The scoring matrices derived from the models showed a marked difference from common amino-acid scoring matrices. The use of an appropriate evolutionary model recovered a known viral transmission history, whereas a poorly chosen model introduced phylogenetic error.
We argue that our model derivation procedure is immediately applicable to other organisms with extensive sequence data available, such as Hepatitis C and Influenza A viruses.
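The abstract states that a substitution model gives the probabilities of replacing one amino acid with another over time, and that such a model can be turned into a scoring matrix. The sketch below illustrates that general idea on a toy three-state alphabet. It is not the authors' procedure or software: the exchangeabilities and frequencies are made-up numbers, the helper names are hypothetical, and the log-odds formula S_ij = log2(P_ij(t) / pi_j) is the conventional construction for scoring matrices, assumed here rather than taken from the paper.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(m, terms=20, squarings=8):
    """Matrix exponential via a Taylor series with scaling and squaring."""
    n = len(m)
    scale = 2.0 ** squarings
    small = [[x / scale for x in row] for row in m]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        # term becomes small^k / k!
        term = [[x / k for x in row] for row in mat_mul(term, small)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def reversible_rate_matrix(exch, freqs):
    """Build Q with q_ij = s_ij * pi_j, scaled to one expected change per unit time."""
    n = len(freqs)
    q = [[exch[i][j] * freqs[j] if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    for i in range(n):
        q[i][i] = -sum(q[i])  # each row of a rate matrix sums to zero
    rate = -sum(freqs[i] * q[i][i] for i in range(n))
    return [[x / rate for x in row] for row in q]


def log_odds_scores(p, freqs):
    """Log-odds score in bits: observed substitution probability vs. chance."""
    n = len(freqs)
    return [[math.log2(p[i][j] / freqs[j]) for j in range(n)]
            for i in range(n)]


# Toy values for illustration only (assumptions, not estimates from HIV-1 data).
FREQS = [0.5, 0.3, 0.2]
EXCH = [[0.0, 2.0, 0.5],
        [2.0, 0.0, 1.0],
        [0.5, 1.0, 0.0]]

Q = reversible_rate_matrix(EXCH, FREQS)
T = 0.5  # evolutionary distance in expected substitutions per site
P = mat_exp([[x * T for x in row] for row in Q])
S = log_odds_scores(P, FREQS)

for row in S:
    print(" ".join(f"{score:6.2f}" for score in row))
```

Because the toy model is time-reversible, the resulting score matrix is symmetric, and shorter distances T give larger diagonal scores relative to off-diagonal ones. A real amino-acid model would use a 20-state alphabet with parameters estimated from alignments.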
Grants and funding
- P30-AI-27757/AI/NIAID NIH HHS/United States
- R01 AI057167/AI/NIAID NIH HHS/United States
- R01 AI047745/AI/NIAID NIH HHS/United States
- R01-GM66276/GM/NIGMS NIH HHS/United States
- R01 AI054165/AI/NIAID NIH HHS/United States
- AI36214/AI/NIAID NIH HHS/United States
- R21 AI047745/AI/NIAID NIH HHS/United States
- 4R01 AI508894-03/AI/NIAID NIH HHS/United States
- U01 AI043638/AI/NIAID NIH HHS/United States
- R01 AI058894/AI/NIAID NIH HHS/United States
- R56 AI047745/AI/NIAID NIH HHS/United States
- P01 AI057005/AI/NIAID NIH HHS/United States
- P30 AI036214/AI/NIAID NIH HHS/United States
- P30 AI027757/AI/NIAID NIH HHS/United States
- R01 GM066276/GM/NIGMS NIH HHS/United States
- R0154165-04/PHS HHS/United States
- AI57167/AI/NIAID NIH HHS/United States
- AI43638/AI/NIAID NIH HHS/United States
- 5P01 AI057005-04/AI/NIAID NIH HHS/United States
- AI47745/AI/NIAID NIH HHS/United States
