Mol Biol Evol. 2023 Apr 4;40(4):msad041.
doi: 10.1093/molbev/msad041.

Beginner's Guide on the Use of PAML to Detect Positive Selection

Sandra Álvarez-Carretero et al. Mol Biol Evol.

Abstract

The CODEML program in the PAML package has been widely used to analyze protein-coding gene sequences to estimate the synonymous and nonsynonymous rates (dS and dN) and to detect positive Darwinian selection driving protein evolution. For users unfamiliar with molecular evolutionary analysis, the program is known to have a steep learning curve. Here, we provide a step-by-step protocol to illustrate the commonly used tests available in the program: the branch models, the site models, and the branch-site models. These tests can be used to detect positive selection affecting particular lineages of the species phylogeny, a subset of amino acid residues in the protein, and a subset of sites along prespecified lineages, respectively. A data set of the myxovirus (Mx) genes from ten mammal and two bird species is used as an example. We discuss a new feature in CODEML that allows users to perform positive selection tests for multiple genes for the same set of taxa, as is common in modern genome-sequencing projects. The PAML package is distributed at https://github.com/abacus-gene/paml under the GNU license, with support provided at its discussion site (https://groups.google.com/g/pamlsoftware). Data files used in this protocol are available at https://github.com/abacus-gene/paml-tutorial.

Keywords: dN/dS; PAML; adaptive evolution; nonsynonymous substitutions; positive selection; synonymous substitutions.


Figures

Fig. 1.
Phylogenetic tree for ten mammal and two bird species reconstructed by maximum-likelihood (ML) under the GTR + G model with RAxML v8.2.10 using the Mx gene sequences. The best-scoring ML tree is unrooted, but the root is shown for clarity. The chicken and duck branches were identified as the foreground branches in the branch and branch-site tests of positive selection. When the model assumes the same evolutionary process for the two branches around the root (e.g., if both are assumed to be background branches in the branch or branch-site models), the root of the tree is unidentifiable. The two branches should then be merged into one (branches shown in red), with a single branch length estimated. In other words, the unrooted tree should be used. However, if the two branches are assumed to evolve differently in the model (e.g., if one branch is a background branch and the other is labeled as foreground in the branch or branch-site models), the root of the tree is identifiable, and the rooted tree should be used. All silhouettes are from https://www.phylopic.org/.
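As a rough illustration of how foreground branches might be marked in a tree file, the sketch below appends a branch label to selected tips of a Newick string. It assumes the PAML convention of tagging foreground branches with `#1`, a topology-only tree without branch lengths, and hypothetical tip names (`chicken`, `duck`, and so on) that are not taken from the tutorial data. Whether the two bird branches share one label or are tested separately depends on the analysis, so this is not a reproduction of the authors' tree file.

```python
import re
from typing import Iterable


def label_foreground(newick: str, tips: Iterable[str], tag: str = "#1") -> str:
    """Append `tag` to each named tip of a topology-only Newick string.

    Assumption: the tree carries no branch lengths, so a tip name is
    always followed by ',' or ')'.
    """
    labelled = newick
    for tip in tips:
        # Match the tip name as a whole token followed by ',' or ')'.
        pattern = r"(?<![\w.])" + re.escape(tip) + r"(?=\s*[,)])"
        labelled, count = re.subn(
            pattern, lambda m: f"{m.group(0)} {tag}", labelled
        )
        if count != 1:
            raise ValueError(f"tip {tip!r} matched {count} times, expected 1")
    return labelled


# Hypothetical unrooted tree: three subtrees at the base, no branch lengths.
tree = "((chicken,duck),(human,mouse),pig);"
print(label_foreground(tree, ["chicken", "duck"]))
# prints ((chicken #1,duck #1),(human,mouse),pig);
```

The base of the example tree is a trifurcation, which is how an unrooted tree is usually written in Newick format.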
Fig. 2.
Example CODEML control file. The first three blocks specify the paths to the input and output files (lines 1–3), how much information is printed on the screen or in the output file (lines 5–6), and the data type of the sequence alignment (lines 8–11). The fourth block defines the evolutionary model. In this example, a homogeneous ω across both branches and sites is selected (model=0, NSsites=0). Several models are available to account for unequal codon usage: CodonFreq=0 (Fequal), 1 (F1 × 4), 2 (F3 × 4), or 3 (Fcodon). Here, we use the mutation-selection model with the observed codon frequencies used as estimates (CodonFreq=7, estFreq=0) (Yang and Nielsen 2008). This model explicitly accounts for the mutational bias and selection affecting codon usage, and is preferable to the other codon-usage models (Yang and Nielsen 2008). We estimate ω from the data and so choose fix_omega=0, with the initial value omega=0.5. Lastly, the evolutionary rate is allowed to vary among lineages on the tree (i.e., clock=0).
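The layout described in this caption can be sketched with a small Python helper that renders a control file as text. The model settings (model, NSsites, CodonFreq, estFreq, clock, fix_omega, omega) follow the caption. The file paths and the values chosen for noisy, verbose, seqtype, ndata, icode, and cleandata are assumptions made for illustration, and the helper `render_ctl` is hypothetical, not part of PAML.

```python
from typing import Dict, List

# One dict per block, in the order the caption describes.
M0_BLOCKS: List[Dict[str, object]] = [
    # Block 1 (lines 1-3): input and output paths. All three are assumed names.
    {"seqfile": "Mx_aln.phy", "treefile": "Mx_unroot.tree", "outfile": "out_M0.txt"},
    # Block 2 (lines 5-6): verbosity. Values are assumptions.
    {"noisy": 3, "verbose": 1},
    # Block 3 (lines 8-11): data type. Values are assumptions
    # (seqtype=1 for codon data, icode=0 for the universal genetic code).
    {"seqtype": 1, "ndata": 1, "icode": 0, "cleandata": 0},
    # Block 4: evolutionary model, values as stated in the caption.
    {"model": 0, "NSsites": 0, "CodonFreq": 7, "estFreq": 0,
     "clock": 0, "fix_omega": 0, "omega": 0.5},
]


def render_ctl(blocks: List[Dict[str, object]]) -> str:
    """Render blocks as 'key = value' lines, with a blank line between blocks."""
    width = max(len(key) for block in blocks for key in block)
    rendered = [
        "\n".join(f"{key:>{width}} = {value}" for key, value in block.items())
        for block in blocks
    ]
    return "\n\n".join(rendered) + "\n"


if __name__ == "__main__":
    print(render_ctl(M0_BLOCKS), end="")
```

Writing the returned string to a file such as `codeml.ctl` and passing that file to `codeml` would be the usual next step, but check the option names against the PAML documentation for the installed version first.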
Fig. 3.
Illustration of four different types of models implemented in CODEML. (A) homogeneous evolutionary pressure throughout the history of the gene (M0: one ratio, with one ω ratio for all sites and branches, specified as model=0 and NSsites=0); (B) heterogeneous pressure across codons (site models: model=0 and NSsites=1, 2, 7, 8, etc.); (C) heterogeneous pressure across branches of a tree but homogeneous across codons (branch model: model=2 and NSsites=0); and (D) heterogeneous pressure across sites and branches (branch-site model: model=2 and NSsites=2). See table 2 for more details on CODEML specifications.
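The four combinations of `model` and `NSsites` listed in this caption can be summarized as a small lookup. The function below is a hypothetical helper written for illustration, not part of PAML. It covers only the combinations named in the caption and raises an error for anything else.

```python
def codeml_model_type(model: int, nssites: int) -> str:
    """Name the model class for a (model, NSsites) pair, per the Fig. 3 caption."""
    if model == 0 and nssites == 0:
        return "M0 (one ratio)"        # panel A: one omega for all sites and branches
    if model == 0 and nssites > 0:
        return "site model"            # panel B: omega varies across codons
    if model == 2 and nssites == 0:
        return "branch model"          # panel C: omega varies across branches
    if model == 2 and nssites == 2:
        return "branch-site model"     # panel D: omega varies across sites and branches
    raise ValueError(
        f"model={model}, NSsites={nssites} is not covered by the caption"
    )


for settings in [(0, 0), (0, 8), (2, 0), (2, 2)]:
    print(settings, "->", codeml_model_type(*settings))
```

A check like this can catch a mismatched pair of settings, for example `model=2` combined with `NSsites=8`, before a long run starts.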


