Genetics. 2010 Feb;184(2):429-37.
doi: 10.1534/genetics.109.109736. Epub 2009 Nov 23.

Coalescent simulation of intracodon recombination

Miguel Arenas et al. Genetics. 2010 Feb.

Abstract

The coalescent with recombination is a very useful tool in molecular population genetics. Under this framework, genealogies often represent the evolution of the substitution unit, and because of this, the few coalescent algorithms implemented for the simulation of coding sequences force recombination to occur only between codons. However, it is clear that recombination is expected to occur most often within codons. Here we have developed an algorithm that can evolve coding sequences under an ancestral recombination graph that represents the genealogies at each nucleotide site, thereby allowing for intracodon recombination. The algorithm is a modification of Hudson's coalescent in which, in addition to keeping track of events occurring in the ancestral material that reaches the sample, we need to keep track of events occurring in ancestral material that does not reach the sample but that is produced by intracodon recombination. We show that at typical substitution rates the number of nonsynonymous changes induced by intracodon recombination is small and that intracodon recombination does not generally result in inflated estimates of the overall nonsynonymous/synonymous substitution ratio (omega). On the other hand, recombination can bias the estimation of omega at particular codons, resulting in apparent rate variation among sites and in the spurious identification of positively selected sites. Importantly, in this case, allowing for variable synonymous rates across sites greatly reduces the false-positive rate and recovers statistical power. Finally, coalescent simulations with intracodon recombination could be used to better represent the evolution of nuclear coding genes or fast-evolving pathogens such as HIV-1. We have implemented this algorithm in a computer program called NetRecodon, freely available at http://darwin.uvigo.es.

Figures

Figure 1.
Generation of ACGs for a coding sequence with three codons. (a) ARG for the whole sequence. Note that the GMRCA of the sample is younger than the root. Inside each node we can see the “sampled” ancestral material (open blocks), the “unsampled” ancestral material (shaded blocks), and the non-ancestral material (dotted lines). Vertical lines across the segments indicate recombination breakpoints. Three recombination breakpoints occur in the ARG: after the first and second positions of the first codon and between the second and third codons. The two intracodon recombination events result in a reticulated ACG for codon 1 (b), while for codons 2 (c) and 3 (d), the ACGs are binary trees.
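The breakpoint layout in Figure 1 can be illustrated with a minimal Python sketch. It is not NetRecodon code: the function names and the convention that a breakpoint `p` lies between nucleotides `p` and `p + 1` (1-based) are assumptions made for illustration. It also simplifies the picture by treating any codon that contains a breakpoint as reticulated, which matches this three-codon example but ignores where in the ARG the event occurs.

```python
def classify_breakpoints(breakpoints, n_codons):
    """Split breakpoints into intracodon and intercodon events.

    Assumed convention: breakpoint p means recombination between
    nucleotides p and p + 1 (1-based), so p is intercodon exactly
    when it is a multiple of 3.
    """
    seq_len = 3 * n_codons
    intracodon = {}  # codon index (1-based) -> breakpoints inside that codon
    intercodon = []  # breakpoints that fall on a codon boundary
    for p in sorted(breakpoints):
        if not 1 <= p < seq_len:
            raise ValueError(f"breakpoint {p} outside 1..{seq_len - 1}")
        if p % 3 == 0:
            intercodon.append(p)
        else:
            intracodon.setdefault(p // 3 + 1, []).append(p)
    return intracodon, intercodon


def acg_shapes(breakpoints, n_codons):
    """Label each codon's graph as 'reticulated' or 'tree' (simplified rule)."""
    intracodon, _ = classify_breakpoints(breakpoints, n_codons)
    return {
        codon: "reticulated" if codon in intracodon else "tree"
        for codon in range(1, n_codons + 1)
    }


# Figure 1 layout: breakpoints after positions 1 and 2 (inside codon 1)
# and after position 6 (between codons 2 and 3).
shapes = acg_shapes([1, 2, 6], n_codons=3)
print(shapes)
```

With the Figure 1 breakpoints, codon 1 is labeled reticulated and codons 2 and 3 are labeled trees, in line with panels (b) to (d).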
Figure 2.
An example of codon evolution along the ACG. Open and shaded circles correspond to coalescence and parental nodes, respectively. (a) Starting from the GMRCA, the codon is evolved between nodes according to the probabilities specified by the codon model and the branch length. (b) The process then encounters a parental node, and because the other parental node has not been assigned a codon yet, it waits there. (c) The algorithm continues its recursion toward the present. (d) The process encounters a parental node, and because the other parental node has already been assigned a codon, (e) it combines the two codons according to the recombination breakpoint. (f) Finally, the resulting recombinant codon (ACT) is evolved.
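Step (e) of Figure 2, combining two parental codons at an intracodon breakpoint, can be sketched in Python as follows. This is an illustrative sketch rather than the published implementation. The function names are hypothetical, the standard genetic code is assumed, and the parental codons in the usage lines are invented: the caption only states that the recombinant codon is ACT.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
GENETIC_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def recombine_codons(left_parent, right_parent, breakpoint):
    """Join the first `breakpoint` bases of one parent to the rest of the other.

    `breakpoint` is 1 or 2: the number of codon positions taken from
    `left_parent` (a breakpoint at 0 or 3 would not be intracodon).
    """
    if breakpoint not in (1, 2):
        raise ValueError("an intracodon breakpoint must be 1 or 2")
    return left_parent[:breakpoint] + right_parent[breakpoint:]


def is_novel_amino_acid(left_parent, right_parent, breakpoint):
    """True if the recombinant encodes an amino acid found in neither parent."""
    child = recombine_codons(left_parent, right_parent, breakpoint)
    child_aa = GENETIC_CODE[child]
    return child_aa not in (GENETIC_CODE[left_parent], GENETIC_CODE[right_parent])


# Hypothetical parents chosen so that the recombinant is ACT, as in Figure 2.
print(recombine_codons("ACG", "TTT", 2))     # ACT
print(is_novel_amino_acid("ACG", "TTT", 2))  # False: ACG and ACT both encode Thr

# Hypothetical parents whose recombinant differs from both at the protein level.
print(recombine_codons("AAT", "GGA", 1))     # AGA
print(is_novel_amino_acid("AAT", "GGA", 1))  # True: Asn and Gly parents, Arg child
```

The second pair shows how intracodon recombination alone can produce a nonsynonymous change relative to both parents, which is the kind of event the abstract reports as rare at typical substitution rates.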
Figure 3.
Performance of the likelihood-ratio test for homogeneous selection pressure across sites in the presence of recombination. Solid and darkly shaded bars indicate the M0 and M1 LRT false-positive rate when data were simulated without/with intracodon recombination, respectively. Lightly shaded and open bars correspond to the power of the LRT for the same two scenarios.
Figure 4.
Performance of the FEL estimator of ω (per site) in the presence of recombination. Data were simulated under an M8 model. Solid and darkly shaded bars indicate the false-positive rate (FPR) when data were simulated without/with intracodon recombination, respectively. Lightly shaded and open bars correspond to the power for the same two scenarios. (Top) FPR and power per replicate for FEL-1R (2000 replicates). (Bottom) FPR and power per replicate for FEL-2R (200 replicates). Sites identified as positively selected sites (PSSs) were those with ω > 1 and a P-value < 0.05. Error bars indicate 95% confidence intervals per replicate.

Publication types