Genes (Basel). 2021 Dec 27;13(1):61.
doi: 10.3390/genes13010061.

HIV Protease and Integrase Empirical Substitution Models of Evolution: Protein-Specific Models Outperform Generalist Models

Roberto Del Amparo et al. Genes (Basel).

Abstract

Diverse phylogenetic methods require a substitution model of evolution that should mimic, as accurately as possible, the real substitution process. At the protein level, empirical substitution models have traditionally been based on a large number of different proteins from particular taxonomic levels. However, these models assume that all of the proteins of a taxonomic level evolve under the same substitution patterns. We believe that this assumption is highly unrealistic and should be relaxed by considering protein-specific substitution models that account for protein-specific selection processes. In order to test this hypothesis, we inferred and evaluated four new empirical substitution models for the protease and integrase of HIV and other viruses. We found that these models more accurately fit, compared with any of the currently available empirical substitution models, the evolutionary process of these proteins. We conclude that evolutionary inferences from protein sequences are more accurate if they are based on protein-specific substitution models rather than taxonomic-specific (generalist) substitution models. We also present four new empirical substitution models of protein evolution that could be useful for phylogenetic inferences of viral protease and integrase.

Keywords: HIV; phylogenetic reconstruction; protein evolution; substitution model of protein evolution; viral integrase; viral protease.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Pipeline for the inference and evaluation of the empirical substitution models of PR and IN evolution. The input protein sequences were aligned and cleaned (removing duplicate sequences and uninformative sites). Next, the resulting multiple-sequence alignment (MSA) was split into two datasets: a method dataset (for the inference of the substitution model, including most of the sequences) and a test dataset (for the evaluation of the substitution model). Because of computational limitations, the method dataset was further split into 10 local method datasets, and a local (partition) substitution model was inferred for each one. The resulting local substitution models were averaged to obtain a global substitution model. Finally, the AIC and BIC scores of the global substitution model and of other currently available empirical substitution models were calculated from the likelihood of each model on the test dataset, in order to evaluate them.
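The last two steps of this pipeline (averaging the local models and scoring models with AIC and BIC) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the caption says only that the local models were "averaged" (an element-wise arithmetic mean is assumed here), and the sample size used for BIC is assumed to be the number of alignment sites, which is a common convention that this record does not confirm.

```python
import math
import numpy as np

def average_local_models(local_exchangeabilities, local_frequencies):
    """Combine local models into one global model (hypothetical helper).

    ASSUMPTION: a plain element-wise arithmetic mean; the source only
    states that the local models were averaged.
    """
    S = np.mean(np.stack(local_exchangeabilities), axis=0)
    S = (S + S.T) / 2.0            # keep the exchangeabilities symmetric
    pi = np.mean(np.stack(local_frequencies), axis=0)
    pi = pi / pi.sum()             # frequencies must sum to 1
    return S, pi

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2.0 * n_params - 2.0 * log_likelihood

def bic(log_likelihood, n_params, n_samples):
    """Bayesian information criterion: k ln n - 2 ln L (lower is better).

    ASSUMPTION: n_samples is the number of alignment sites.
    """
    return n_params * math.log(n_samples) - 2.0 * log_likelihood
```

With a made-up log-likelihood of -1000 and 10 free parameters, `aic(-1000.0, 10)` evaluates to 2020.0. When two models are scored on the same test dataset, the one with the lower AIC or BIC is preferred.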
Figure 2
Comparison of HIVpr and HIVb empirical substitution models concerning their relative substitution rates. The plot displays the exchangeability matrix of the relative substitution rates among amino acids for the HIVpr (developed in this study, black circles) and HIVb (the best-fitting substitution model in the set of currently available substitution models, red circles) empirical substitution models of evolution. This plot provides an illustrative comparison between the cited models; the specific parameter values of the HIVpr substitution model are presented in Table S1.
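For readers unfamiliar with exchangeability matrices, the sketch below shows how one is conventionally combined with equilibrium amino acid frequencies to form an instantaneous rate matrix. It assumes the usual time-reversible formulation of empirical protein models, in which the rate from amino acid i to j is the exchangeability multiplied by the frequency of j; this record does not state how HIVpr is parameterized, and the function name is hypothetical.

```python
import numpy as np

def rate_matrix(exchangeabilities, frequencies):
    """Build a normalized reversible rate matrix Q (hypothetical helper).

    ASSUMPTION: the standard reversible form Q[i, j] = S[i, j] * pi[j]
    for i != j, with each diagonal entry set so that its row sums to zero.
    """
    S = np.asarray(exchangeabilities, dtype=float)
    pi = np.asarray(frequencies, dtype=float)
    Q = S * pi[np.newaxis, :]              # off-diagonal rates
    np.fill_diagonal(Q, 0.0)
    np.fill_diagonal(Q, -Q.sum(axis=1))    # rows sum to zero
    # Scale so the expected rate is one substitution per unit time.
    scale = -np.dot(pi, np.diag(Q))
    return Q / scale
```

Because the exchangeabilities are symmetric, the resulting matrix satisfies detailed balance (pi[i] * Q[i, j] equals pi[j] * Q[j, i]), which is why two models can be compared directly through their exchangeabilities.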
Figure 3
Likelihood-based evaluation of the HIVpr, VIRpr and currently available best-fitting substitution models. For the HIV PR (left plots) and viral PR (right plots) test datasets, the plots show the AIC (top plots) and BIC (bottom plots) scores obtained with the HIVpr and VIRpr substitution models inferred in this study and with the top five currently available best-fitting substitution models for the corresponding test dataset. In all cases, the models developed in this study (black bars) produced AIC and BIC scores significantly lower than those of the currently available best-fitting substitution models (p-values = 0.00013 and 0.00014 for HIVpr and VIRpr, respectively; significance is marked with * in the plots).
