Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2022 Aug 10;71(5):1159-1177.
doi: 10.1093/sysbio/syac009.

Full-Likelihood Genomic Analysis Clarifies a Complex History of Species Divergence and Introgression: The Example of the erato-sara Group of Heliconius Butterflies

Affiliations

Full-Likelihood Genomic Analysis Clarifies a Complex History of Species Divergence and Introgression: The Example of the erato-sara Group of Heliconius Butterflies

Yuttapong Thawornwattana et al. Syst Biol. .

Abstract

Introgressive hybridization plays a key role in adaptive evolution and species diversification in many groups of species. However, frequent hybridization and gene flow between species make estimation of the species phylogeny and key population parameters challenging. Here, we show that by accounting for phasing and using full-likelihood methods, introgression histories and population parameters can be estimated reliably from whole-genome sequence data. We employ the multispecies coalescent (MSC) model with and without gene flow to infer the species phylogeny and cross-species introgression events using genomic data from six members of the erato-sara clade of Heliconius butterflies. The methods naturally accommodate random fluctuations in genealogical history across the genome due to deep coalescence. To avoid heterozygote phasing errors in haploid sequences commonly produced by genome assembly methods, we process and compile unphased diploid sequence alignments and use analytical methods to average over uncertainties in heterozygote phase resolution. There is robust evidence for introgression across the genome, both among distantly related species deep in the phylogeny and between sister species in shallow parts of the tree. We obtain chromosome-specific estimates of key population parameters such as introgression directions, times and probabilities, as well as species divergence times and population sizes for modern and ancestral species. We confirm ancestral gene flow between the sara clade and an ancestral population of Heliconius telesiphe, a likely hybrid speciation origin for Heliconius hecalesia, and gene flow between the sister species Heliconius erato and Heliconius himera. Inferred introgression among ancestral species also explains the history of two chromosomal inversions deep in the phylogeny of the group. This study illustrates how a full-likelihood approach based on the MSC makes it possible to extract rich historical information of species divergence and gene flow from genomic data. [3s; bpp; gene flow; Heliconius; hybrid speciation; introgression; inversion; multispecies coalescent].

PubMed Disclaimer

Figures

Figure 1
Figure 1
The genotype-calling error rate for a given base-calling error (formula image) and read depth (formula image) when the true genotype is a) a homozygote for the reference allele (GT formula image) or b) a heterozygote (GT formula image). Note that the genotype-calling error does not decrease monotonically with the increase in formula image when formula image is fixed, or with the reduction in formula image when formula image is fixed. The calculation is performed using a C program written by Z.Y. that implements the ML method of Li (2011) and calculates its error rate, available at https://github.com/abacus-gene/genotypecall. Figure S1 of the Supplementary material available on Dryad differs only in the use of a logarithmic scale for the formula image-axis.
Figure 2
Figure 2
Posterior probabilities of species trees from bpp analysis of a) blocks of 100 loci and b) blocks of 200 loci across the genome (Table S4 of the Supplementary material available on Dryad) under the MSC model without gene flow. The height of each small colored bar represents posterior probability and ranges from 0 to 1. Trees i–x in the legend are MAP trees in at least one block. Tree and bar colors match those in Edelman et al. (2019) (their Fig. 2). Tree xi appeared as one of the top eight trees in the sliding-window analysis of Edelman et al. (2019) but not in our analysis. Inversion regions 2b and 15b on chromosomes 2 and 15 were analyzed separately. Era: H. erato, Him: H. himera, Sia: H. hecalesia, Tel: H. telesiphe, Dem: H. demeter, Sar: H. sara.
Figure 3
Figure 3
Maximum likelihood estimates of the migration rates (formula imageNm) under the IM model from 3s using all 28,204 coding and 17,428 noncoding loci in the autosomes (left column) and 1348 coding and 574 noncoding loci in the Z chromosome (right column) for each pair of species. The donor and recipient species of gene flow are given in the formula image- and formula image-axis, respectively. Heliconiusmelpomene was used as an outgroup. Only significant migration rate estimates (by likelihood ratio test at formula image) larger than 0.001 are shown. See Figure S3 of the Supplementary material available on Dryad and Table S8 of the Supplementary material available on Dryad for estimates of other parameters.
Figure 4
Figure 4
a) The introgression (MSci) model, proposed based on the bpp species tree estimation under MSC and 3s analysis under the IM model, involving two unidirectional introgression events (s formula image d with introgression probability formula image, and tc1 formula image c with probability formula image and two BDI events (e formula image h with probabilities formula image and formula image, and ds2 formula image tc2 with probabilities formula image and formula image. The model involves 34 parameters: 5 species divergence times and 4 introgression times (formula image), 19 population sizes (formula image), and 6 introgression probabilities. To identify parameters (see Fig. S4 of the Supplementary material available on Dryad), internal nodes are labeled with lowercase letters, for example, e is the parent node of Era, eh is the parent node of e and h, etc. Each branch corresponds to a population and is referred to using the label for its daughter node, for example, branch e-Era is labeled branch Era and has population size formula image, and branch eh-e is labeled branch e and has formula image. Branch lengths in the tree represent posterior means of divergence/introgression times in the bpp analysis of the 6,030 noncoding loci in chromosome 1. Estimates for other chromosomes or for coding loci are presented in Figure 5. b) Posterior means (dots) and 95formula image HPD intervals (bars) of the six introgression probabilities (formula image) under the MSci model of (a) obtained from bpp analysis of the 25 chromosomal regions (see Figure 5 for the number of loci). Estimates for other parameters are in Figure S4 of the Supplementary material available on Dryad. Inversion regions 2b and 15b on chromosomes 2 and 15 were analyzed separately, and there was an alternative set of posterior estimates resulting from within-model unidentifiability; see Figure 6.
Figure 5
Figure 5
Estimated introgression history in each chromosomal region obtained from bpp analysis under the MSci model (Fig. 4a) using a) noncoding and b) coding loci. Intensity of the horizontal edges represents the posterior means of the six introgression probabilities (see Fig. 4b), while the y-axis represents the nine divergence/introgression times in the expected number of mutations per site. A full list of posterior estimates and 95formula image HPD intervals, including 19 population size parameters, is in Table S9 of the Supplementary material available on Dryad.
Figure 6
Figure 6
Within-model unidentifiability and possible introgression histories for inversion regions a) 2b and b) 15b, showing estimated divergence times from bpp analysis under the MSci model (Fig. 4a). For each region, there are two competing scenarios of the introgression history that fit the data equally well, resulting in unidentifiability of certain model parameters. Plus (formula image) represents the derived inverted orientation while minus (–) is the ancestral noninverted orientation. Species with uncertain inversion orientation are marked with `?’. Node symbols and colors indicate matching nodes and times between the two scenarios. The estimates were based on posterior means from noncoding loci. Gray band indicates an ancestral population within which the derived inverted orientation (formula image) may have arisen from the noninverted orientation (–). Note that parameters associated with nodes reached by very few sequences (indicated as gray branches) were expected to be poorly estimated, with a wide posterior interval. For example, in Scenario 1 for 2b, formula image, and formula image were poorly estimated (Fig. S4 of the Supplementary material available on Dryad and Table S9 of the Supplementary material available on Dryad).

Similar articles

Cited by

References

    1. Andermann T., Fernandes A.M., Olsson U., Töpel M., Pfeil B., Oxelman B., Aleixo A., Faircloth B.C., Antonelli A.. 2019. Allele phasing greatly improves the phylogenetic utility of ultraconserved elements. Syst. Biol. 68(1):32–46. - PMC - PubMed
    1. Barton N.H. 2006. Evolutionary biology: how did the human species form? Curr. Biol. 16(16):R647–R650. - PubMed
    1. Bates H.W. 1862. Contributions to an insect fauna of the Amazon Valley. Lepidoptera: Heliconinæ. Trans. Linn. Soc. Lond. 23:495–566.
    1. Beltrán M., Jiggins C.D., Brower A.V.Z., Bermingham E., Mallet J.. 2007. Do pollen feeding, pupal-mating and larval gregariousness have a single origin in Heliconius butterflies? Inferences from multilocus DNA sequence data. Biol. J. Linn. Soc. 92(2):221–239.
    1. Beltrán M., Jiggins C.D., Bull V., Linares M., Mallet J., McMillan W.O., Bermingham E.. 2002. Phylogenetic discordance at the species boundary: comparative gene genealogies among rapidly radiating Heliconius butterflies. Mol. Biol. Evol. 19(12):2176–2190. - PubMed

Publication types