2015 Nov 6;6(1):107-20.
doi: 10.1534/g3.115.024208.

The Genomic Signature of Population Reconnection Following Isolation: From Theory to HIV

Nicolas Alcala et al. G3 (Bethesda).

Abstract

Ease of worldwide travel provides increased opportunities for organisms not only to colonize new environments but also to encounter related but diverged populations. Such events of reconnection and secondary contact of previously isolated populations are widely observed at different time scales. For example, during the Quaternary glaciation, sea-level fluctuations caused temporary isolation of populations, often followed by secondary contact. At shorter time scales, population isolation and reconnection of viruses are commonly observed, and such events are often associated with epidemics and pandemics. Here, using coalescent theory and simulations, we describe the temporal impact of population reconnection after isolation on nucleotide differences and the site frequency spectrum, as well as on common summary statistics of DNA variation. We identify robust genomic signatures of population reconnection after isolation. We apply these results to infer the recent evolutionary history of human immunodeficiency virus 1 (HIV-1) in Asia and South America, successfully retrieving the successive HIV subtype colonization events in these regions. Our analysis reveals that divergent HIV-1 subtype populations are currently admixing in these regions, suggesting that HIV-1 may be undergoing a process of homogenization, contrary to popular belief.

Keywords: HIV; admixture; coalescent; migration; site frequency spectrum.

Figures

Figure 1
Signature of reconnection of previously isolated populations on the distribution of pairwise nucleotide differences, as a function of the time since the reconnection event, Treco (columns): (A) within-population (πw) and between-population (πb) pairwise nucleotide differences; (B–C) total pairwise nucleotide differences (π). In panel A, we consider a sequence of size G without recombination, with G larger than the number of segregating sites; plots in panel A are computed using equation A.5. In panels B and C, we consider a sequence of 9719 bp (length of the reference sequence for HIV-1, HXB2) with two mean numbers of recombination events per generation, R=1 and R=10. Nucleotide differences are computed from a sample of 16 sequences, and each plot represents the mean distribution over 2000 replicate simulations. Parameters are d=3 populations, duration of the isolation period Tiso=6, scaled migration rate M=5, and scaled mutation rate θ=100.
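The within- and between-population quantities in this caption can be illustrated with a minimal Python sketch. It assumes aligned sequences of equal length grouped by population; the function names and the toy data are hypothetical and are not taken from the paper, which derives these distributions analytically and by simulation.

```python
from itertools import combinations, product


def hamming(a, b):
    """Number of sites at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def pairwise_differences(populations):
    """Return (within, between) lists of pairwise nucleotide differences.

    populations: dict mapping a population label to a list of aligned sequences.
    """
    within = []
    for seqs in populations.values():
        # All unordered pairs drawn from the same population.
        within.extend(hamming(a, b) for a, b in combinations(seqs, 2))
    between = []
    for p, q in combinations(populations, 2):
        # All pairs with one sequence from each of two different populations.
        between.extend(
            hamming(a, b) for a, b in product(populations[p], populations[q])
        )
    return within, between


def mean(values):
    return sum(values) / len(values) if values else float("nan")


# Toy data (hypothetical): two populations fixed for different alleles at 3 sites.
toy = {
    "pop1": ["AAAAAA", "AAAAAT"],
    "pop2": ["TTTAAA", "TTTAAT"],
}
within, between = pairwise_differences(toy)
pi_w, pi_b = mean(within), mean(between)
```

In this toy case the between-population differences are larger than the within-population ones; pooling both lists gives the kind of two-mode distribution of total pairwise differences that the caption describes after isolation.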
Figure 2
Signature of a reconnection event between previously isolated populations on (A) the total and (B) joint site frequency spectrum (SFS), as a function of the number of generations since the reconnection event, Treco. (A) The SFS representation of Nawa and Tajima (2008) is used; the expected value of a single panmictic population at equilibrium is given by a straight line (dashed line), and deviations from a straight line indicate an excess (in blue) or deficit (in red) of variants at a given frequency. Parameters are d=3 populations, n=16 sampled genes per population (see Figures B and C in File S1 for alternative sampling schemes), with θ=100. Means are over 2000 replicates.
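As a rough illustration of the two spectra named in this caption, the sketch below computes a total (pooled) SFS and a joint SFS for two populations. It assumes biallelic sites and an outgroup sequence that gives the ancestral state at each site; all names and the toy data are hypothetical. The scaling `i * xi_i` relies on the standard neutral expectation E[ξ_i] = θ/i for a single panmictic population at equilibrium, under which the scaled spectrum is flat. This is not claimed to be the exact representation of Nawa and Tajima (2008).

```python
from collections import Counter


def derived_counts(sample, outgroup):
    """Per-site number of sequences whose state differs from the outgroup.

    Assumes biallelic sites, so any non-outgroup state is the derived allele.
    """
    return [sum(seq[i] != anc for seq in sample) for i, anc in enumerate(outgroup)]


def total_sfs(sample, outgroup):
    """Unfolded SFS: entry i-1 is the number of sites with i derived copies."""
    n = len(sample)
    counts = Counter(derived_counts(sample, outgroup))
    return [counts.get(i, 0) for i in range(1, n)]


def scaled_sfs(sfs):
    """Multiply class i by i; flat in expectation under neutral equilibrium."""
    return [i * x for i, x in enumerate(sfs, start=1)]


def joint_sfs(sample1, sample2, outgroup):
    """table[i][j] = number of sites with i derived copies in sample1 and j in sample2.

    table[0][0] counts sites that are monomorphic for the ancestral state.
    """
    c1 = derived_counts(sample1, outgroup)
    c2 = derived_counts(sample2, outgroup)
    table = [[0] * (len(sample2) + 1) for _ in range(len(sample1) + 1)]
    for i, j in zip(c1, c2):
        table[i][j] += 1
    return table


# Toy data (hypothetical): one outgroup and two populations of two sequences each.
outgroup = "AAAAA"
pop1 = ["TAAAA", "TAATA"]
pop2 = ["TAAAA", "AATAA"]

pooled = total_sfs(pop1 + pop2, outgroup)
scaled = scaled_sfs(pooled)
joint = joint_sfs(pop1, pop2, outgroup)
```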
Figure 3
Signatures of HIV-1 subtype reconnection events on the Site Frequency Spectrum (SFS). Results are presented for China (left major panel) and for South America (right major panel). Two sampling time points of HIV-1 sequences are considered in China (Table B in File S1) and in South America (Table C in File S1) (see method section ‘HIV-1 genome analysis’). The total (pooled) SFS is presented in the first row, following the representation of Nawa and Tajima (2008); the joint SFSs of each pair of HIV subtype groups are presented in the second, third, and fourth rows; and the Principal Component Analyses (PCAs) are presented in the fifth row. Subtype J is used as an outgroup (see Figure K in File S1 for results with alternative outgroups). Original subtypes (subtypes found early in the epidemics in Africa; at the corners) are used to define a triangle in the PCA space. The position of the HIV-1 genomes on the straight edges of the triangle suggests strong admixture between subtypes. For the total and joint SFS, we consider the mean value of 500 random samples of sequences in each subtype cluster (see Figure J in File S1 for the different clusters). A sample of 16 sequences per subtype cluster is considered in China, and of eight sequences per subtype cluster in South America. The significance of the trends in the dynamics is tested using the optimal test statistic TΩ derived in the SI (Figure L in File S1).
Figure 4
(A–B) Proportion of bimodality in pairwise nucleotide differences (between populations) detected in the genomes and (C–D) distribution of total pairwise nucleotide differences (all populations) of HIV-1 sequences for different time points in China (see Table B in File S1) and South America (see Table C in File S1). As expected under genomic admixture (Figure 1), the proportion of bimodality detected in the pairwise nucleotide differences between populations in HIV-1 genomes increased, and the variance of modes increased significantly with time in China (p<0.01, two-sided Bartlett test). We also see a (nonsignificant) trend of temporal changes in South America.
Figure 5
Projection of worldwide distributed HIV-1 sequences from subtypes B, C, CRF01_AE and F1, and URF and CRF recombinant forms between these subtypes. A total of 1646 genome sequences are shown (see Table A in File S1). (A) The first two axes of the principal component analysis (PC1 and PC2), and (B) the second and third axes of the principal component analysis (PC2 and PC3), defined by pure subtypes B, C, CRF01_AE and F1 (see method section ‘HIV-1 genome analysis’). The percentage of variance explained by each PC is indicated in parentheses. The position of an HIV-1 genome sequence along the axes of the triangle formed by three pure subtypes reflects the admixture proportion between the subtypes (Ma and Amos 2012). Standard country codes are used: Argentina (AR), Brazil (BR), Botswana (BW), China (CN), Cyprus (CY), Spain (ES), Japan (JP), Thailand (TH), USA (US), Vietnam (VN), South Africa (ZA). Misrepresented sequences [squared cosine of the PC plane with the observation <0.05 (Abdi and Williams 2010)] were excluded.
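The exclusion rule at the end of this caption can be sketched as follows. The squared cosine of an observation with a PC plane is the share of its squared distance to the origin that is carried by the two plotted components. The sketch assumes each observation is given as its coordinates (scores) on all principal components of the centered data; the function name, the labels, and the toy scores are hypothetical, and only the 0.05 threshold comes from the caption.

```python
def squared_cosine(scores, plane=(0, 1)):
    """Quality of representation of one observation on a PC plane.

    scores: coordinates of the observation on ALL principal components.
    plane:  zero-based indices of the two components being plotted.
    """
    total = sum(s * s for s in scores)  # squared distance to the origin
    if total == 0:
        return 0.0  # observation sits at the centroid; nothing to represent
    on_plane = sum(scores[k] ** 2 for k in plane)
    return on_plane / total


# Toy scores (hypothetical) on three components for three sequences.
observations = {
    "seq_a": (3.0, 4.0, 0.0),  # lies entirely in the PC1-PC2 plane
    "seq_b": (0.1, 0.1, 1.0),  # almost orthogonal to the PC1-PC2 plane
    "seq_c": (1.0, 0.0, 1.0),  # half of its inertia is in the plane
}

THRESHOLD = 0.05  # cutoff stated in the caption
kept = [
    name
    for name, scores in observations.items()
    if squared_cosine(scores, plane=(0, 1)) >= THRESHOLD
]
```

For the plot in panel B, the same function would be called with `plane=(1, 2)` to score observations against the PC2 and PC3 plane.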

References

    1. Abdi H., Williams L. J., 2010. Principal component analysis. Wiley Interdiscip. Rev. Comput. Stat. 2: 433–459.
    2. Achaz G., 2009. Frequency spectrum neutrality tests: one for all and all for one. Genetics 183: 249–258.
    3. Aguilée R., Claessen D., Lambert A., 2013. Adaptive radiation driven by the interplay of eco-evolutionary and landscape dynamics. Evolution 67: 1291–1306.
    4. Alcala N., Vuilleumier S., 2014. Turnover and accumulation of genetic diversity across large time-scale cycles of isolation and connection of populations. Proc. Biol. Sci. 281: 20141369.
    5. Alcala N., Streit D., Goudet J., Vuilleumier S., 2013. Peak and persistent excess of genetic diversity following an abrupt migration increase. Genetics 193: 953–971.
