eLife. 2022 May 9:11:e76073.
doi: 10.7554/eLife.76073.

Complex fitness landscape shapes variation in a hyperpolymorphic species

Anastasia V Stolyarova et al. eLife.

Abstract

It is natural to assume that patterns of genetic variation in hyperpolymorphic species can reveal large-scale properties of the fitness landscape that are hard to detect by studying species with ordinary levels of genetic variation. Here, we study such patterns in a fungus Schizophyllum commune, the most polymorphic species known. Throughout the genome, short-range linkage disequilibrium (LD) caused by attraction of minor alleles is higher between pairs of nonsynonymous than of synonymous variants. This effect is especially pronounced for pairs of sites that are located within the same gene, especially if a large fraction of the gene is covered by haploblocks, genome segments where the gene pool consists of two highly divergent haplotypes, which is a signature of balancing selection. Haploblocks are usually shorter than 1000 nucleotides, and collectively cover about 10% of the S. commune genome. LD tends to be substantially higher for pairs of nonsynonymous variants encoding amino acids that interact within the protein. There is a substantial correlation between LDs at the same pairs of nonsynonymous mutations in the USA and the Russian populations. These patterns indicate that selection in S. commune involves positive epistasis due to compensatory interactions between nonsynonymous alleles. When less polymorphic species are studied, analogous patterns can be detected only through interspecific comparisons.

Keywords: D. melanogaster; epistasis; evolutionary biology; genetic diversity; genetics; genomics; human; linkage disequilibrium; population genetics; schizophyllum commune.
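The LD measure discussed in the abstract can be illustrated with a minimal sketch. It assumes haploid genotypes with alleles coded 0 (major) and 1 (minor), and computes the coefficient D and the squared correlation r² for one pair of biallelic sites. The function name `ld_stats` and the 0/1 encoding are illustrative assumptions, not the authors' pipeline. A positive D here means the two minor alleles occur together more often than expected under independence, which is the "attraction of minor alleles" the abstract refers to.

```python
from typing import Sequence, Tuple


def ld_stats(site_a: Sequence[int], site_b: Sequence[int]) -> Tuple[float, float]:
    """Return (D, r2) for two biallelic sites in haploid genotypes.

    Assumption: alleles are coded 0 = major, 1 = minor, one entry per genotype.
    """
    n = len(site_a)
    if n == 0 or n != len(site_b):
        raise ValueError("sites must be non-empty and of equal length")

    p_a = sum(site_a) / n  # minor allele frequency at site A
    p_b = sum(site_b) / n  # minor allele frequency at site B
    # Frequency of genotypes carrying the minor allele at both sites
    p_ab = sum(1 for a, b in zip(site_a, site_b) if a == 1 and b == 1) / n

    d = p_ab - p_a * p_b  # D > 0: minor alleles attract; D < 0: they repel
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    return d, d * d / denom


# Two minor alleles always found together: D > 0 and r2 = 1
d, r2 = ld_stats([1, 1, 0, 0, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 0, 0])
```

Polarizing D by the minor allele is what makes its sign interpretable as attraction or repulsion; r² alone does not distinguish the two cases.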

Plain language summary

Changes to DNA known as mutations may alter how the proteins and other components of a cell work, and thus play an important role in allowing living things to evolve new traits and abilities over many generations. Whether a mutation is beneficial or harmful may differ depending on the genetic background of the individual – that is, depending on other mutations present in other positions within the same gene – due to a phenomenon called epistasis. Epistasis is known to affect how various species accumulate differences in their DNA compared to each other over time. For example, a mutation that is rare in humans and known to cause disease may be widespread in other primates because its negative effect is canceled out by another mutation that is standard for these species but absent in humans. However, it remains unclear whether epistasis plays a significant part in shaping genetic differences between individuals of the same species. A type of fungus known as Schizophyllum commune lives on rotting wood and is found across the world. It is one of the most genetically diverse species currently known, so there is a higher chance of pairs of compensatory mutations occurring and persisting for a long time in S. commune than in most other species, providing a unique opportunity to study epistasis. Here, Stolyarova et al. studied two distinct populations of S. commune, one from the USA and one from Russia. The team found that – unlike in humans, flies and other less genetically diverse species – epistasis maintains combinations of mutations in S. commune that individually would be harmful to the fungus but together compensate for each other. For example, pairs of mutations affecting specific molecules known as amino acids – the building blocks of proteins – that physically interact with each other tended to be found together in the same individuals. 
One potential downside of having pairs of compensatory mutations in the genome is that when the organism reproduces, the process of making sex cells may split up these pairs so that harmful mutations are inherited without their partner mutations. Thus, epistasis may have helped shape the way S. commune and other genetically diverse species have evolved.

Conflict of interest statement

AS, TN, EZ, AF, AK, GB: No competing interests declared.

Figures

Figure 1. The efficiency of epistatic selection in populations with different levels of genetic diversity.
(A–C) LD in natural populations for SNPs with MAF >0.05. (A) USA population of S. commune, (B) Zambian population of D. melanogaster, (C) African superpopulation of H. sapiens. Filled areas in (A)-(C) indicate SE of LD calculated for each chromosome or scaffold separately. (D–F) A hyperpolymorphic population (D) may occupy a sizeable chunk of a complex fitness landscape, leading to pervasive positive epistasis, while variation within less polymorphic populations (E and F) is confined to smaller, and approximately linear, portions of the landscape, so that no strong epistasis and LD can emerge. The area of the landscape covered by the population is shown in green.
Figure 1—figure supplement 1. The efficiency of epistatic selection in populations with different levels of genetic diversity.
LD between nonsynonymous SNPs is shown in orange, and LD between synonymous SNPs is shown in blue. (A) Russian population of S. commune, (B) European super-population of H. sapiens. Solid lines indicate LD between pairs of SNPs located within the same gene; dashed lines correspond to pairs of SNPs located in different genes. Only SNPs with minor allele frequency > 0.05 are analysed. Filled areas indicate SE of LD calculated for each chromosome (for human) or scaffold (for S. commune) separately.
Figure 1—figure supplement 2. Linkage disequilibrium within and between exons in S. commune.
LD between nonsynonymous SNPs is shown in orange, and LD between synonymous SNPs is shown in blue. Solid lines indicate LD between pairs of SNPs located within the same exon of the gene; dashed lines correspond to pairs of SNPs located in different exons of the gene. (A) USA population of S. commune, (B) RUS population of S. commune. Only SNPs with minor allele frequency > 0.05 are analysed. Filled areas indicate SE of LD calculated for each scaffold separately.
Figure 1—figure supplement 3. Comparison of LDnonsyn and LDsyn in S. commune populations with exact matching of both MAFs and distance.
For each possible minor allele count and nucleotide distance, the number of corresponding pairs of nonsynonymous variants and LDnonsyn between them is calculated. Then, the same number of pairs of synonymous variants at the same nucleotide distance and with the same minor allele count is randomly chosen to calculate LDsyn. Subsampling is performed 100 times. Filled areas show 95% intervals of LDsyn in the subsamples. (A) All SNPs, (B) SNPs with MAF >0.05.
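The matched subsampling described in this caption can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `matched_ldsyn`, the input layout (each pair given as a stratum key plus its LD value), and the choice to skip strata with too few synonymous pairs are all assumptions.

```python
import random
from collections import defaultdict
from statistics import mean


def matched_ldsyn(nonsyn, syn, n_rep=100, seed=0):
    """Compare LDnonsyn with LDsyn under exact stratum matching.

    nonsyn, syn: iterables of (key, ld), where key identifies a stratum,
    e.g. (minor allele counts, nucleotide distance) -- a hypothetical layout.
    Returns (mean LDnonsyn, list of n_rep subsampled mean LDsyn values).
    """
    nonsyn = list(nonsyn)
    rng = random.Random(seed)

    pool = defaultdict(list)  # synonymous LD values available per stratum
    for key, ld in syn:
        pool[key].append(ld)

    need = defaultdict(list)  # nonsynonymous LD values per stratum
    for key, ld in nonsyn:
        need[key].append(ld)

    # Assumption: strata lacking enough synonymous pairs are dropped
    usable = {k: v for k, v in need.items() if len(pool.get(k, [])) >= len(v)}
    if not usable:
        raise ValueError("no stratum has enough synonymous pairs")

    ld_nonsyn = mean(ld for values in usable.values() for ld in values)

    reps = []
    for _ in range(n_rep):
        draws = []
        for key, values in usable.items():
            # Draw as many synonymous pairs as there are nonsynonymous ones
            draws.extend(rng.sample(pool[key], len(values)))
        reps.append(mean(draws))
    return ld_nonsyn, reps
```

Sorting the returned replicate values and reading off the 2.5% and 97.5% positions would give an interval comparable to the shaded areas in the figure.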
Figure 1—figure supplement 4. LD between SNPs with different MAF in S. commune.
LD between nonsynonymous SNPs is shown in orange, and LD between synonymous SNPs is shown in blue. Filled areas indicate SE of LD calculated for each scaffold separately. (A, B) LD between all pairs of SNPs pooled together. Solid lines indicate LD between pairs of SNPs located within the same gene; dashed lines correspond to pairs of SNPs located in different genes. (C, D) Pairs of SNPs split by MAF.
Figure 1—figure supplement 5. LD between SNPs with different MAF in D. melanogaster.
LD between nonsynonymous SNPs is shown in orange, and LD between synonymous SNPs is shown in blue. Filled areas indicate SE of LD calculated for each chromosome separately. (A) LD between all pairs of SNPs pooled together. Solid lines indicate LD between pairs of SNPs located within the same gene; dashed lines correspond to pairs of SNPs located in different genes. (B) Pairs of SNPs with MAF <0.05 (large scale). (C) Pairs of SNPs split by MAF.
Figure 1—figure supplement 6. LD between SNPs with different MAF in H. sapiens.
LD between nonsynonymous SNPs is shown in orange, and LD between synonymous SNPs is shown in blue. Filled areas indicate SE of LD calculated for each chromosome separately. (A) LD between all pairs of SNPs pooled together. Solid lines indicate LD between pairs of SNPs located within the same gene; dashed lines correspond to pairs of SNPs located in different genes. (B) Pairs of SNPs with MAF <0.05 (large scale). (C) Pairs of SNPs split by MAF.
Figure 2. Excessive LD between physically interacting protein sites.
(A) Within pairs of SNPs that correspond to pairs of amino acids that are colocalized within 10 Å in the protein structure, the LD is elevated between nonsynonymous, but not between synonymous, variants. Dashed lines show the average LD between colocalized sites. Permutations were performed by randomly sampling pairs of non-interacting SNPs while controlling for genetic distance between them, measured in amino acids; pairs of SNPs closer than 5 aa were excluded. (B–D) Examples of proteins with LD patterns matching their three-dimensional structures. Heatmaps show the physical distance between pairs of sites in the protein structure; only positions carrying biallelic SNPs are shown. Black dots correspond to pairs of sites with high LD (>0.9 quantile for the gene). Dashed lines in the structure in (C) show high LD between physically close SNPs from different segments of high LD. In these examples, LD is calculated in the Russian population of S. commune.
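The colocalization criterion in panel (A) can be sketched as a simple filter over residue coordinates. This is an illustration only: the caption does not say which atoms define the distance, so one representative coordinate per residue is an assumption, as are the function name `contact_pairs` and the input layout.

```python
import math
from itertools import combinations


def contact_pairs(coords, max_dist=10.0, min_sep=5):
    """Return residue pairs that count as colocalized.

    coords: dict mapping residue index -> (x, y, z) in angstroms
            (assumption: one representative atom per residue).
    A pair is kept if its members are at least min_sep residues apart in
    sequence (pairs closer than 5 aa are excluded) and lie within max_dist
    of each other in the structure.
    """
    pairs = []
    for i, j in combinations(sorted(coords), 2):
        if j - i < min_sep:
            continue  # too close along the sequence
        if math.dist(coords[i], coords[j]) <= max_dist:
            pairs.append((i, j))
    return pairs


# Hypothetical coordinates for four SNP-carrying residues
example = {1: (0, 0, 0), 3: (1, 0, 0), 10: (6, 0, 0), 20: (30, 0, 0)}
close = contact_pairs(example)
```

The LD of the returned pairs can then be compared with that of distance-matched non-interacting pairs, as in the permutation scheme the caption describes.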
Figure 2—figure supplement 1. Examples of proteins with LD patterns matching the three-dimensional structure in the RUS population of S. commune.
Heatmaps show the physical distance between pairs of sites in the protein structure; only positions carrying biallelic SNPs are shown. Black dots correspond to pairs of sites with high LD (>0.9 quantile for the gene). Grey regions indicate the exon structure of the genes. (A) cog1523 (5Y1B:A); (B) cog2779 (1SXJ:B); (C) cog5375 (1RGI:G); (D) cog5725 (1TA3:B); (E) cog18092 (4QJY:A); (F) cog7878 (4TYW:A). LD statistics and p-values for each gene are listed in Appendix 3—table 1.
Figure 2—figure supplement 2. Examples of proteins with LD patterns matching the three-dimensional structure in the USA population of S. commune.
Heatmaps show the physical distance between pairs of sites in the protein structure; only positions carrying biallelic SNPs are shown. Black dots correspond to pairs of sites with high LD (>0.9 quantile for the gene). Grey regions indicate the exon structure of the genes. (A) cog1536 (6AHR:E); (B) cog5725 (1TA3:B); (C) cog8253 (6F87:A); (D) cog9241 (1YCD:A). LD statistics and p-values for each gene are listed in Appendix 3—table 1.
Figure 3. Patterns of linkage disequilibrium in the USA population of S. commune.
(A) Distribution of the fraction of polymorphic sites that carry minor alleles in a genotype within haploblocks. Black line shows the distribution of the fraction of minor alleles in genotypes in non-haploblock regions. (B) Distributions of the average MAF within a haploblock for haploblocks with different average values of LD. The average MAF in non-haploblock regions is shown as a horizontal black line for comparison. (C) LD between nonsynonymous and synonymous SNPs within individual genes. Linear regression of LDnonsyn on LDsyn is shown as the red line. To control for the gene length, only SNPs within 300 nucleotides from each other were analyzed. Genes with fewer than 100 such pairs of SNPs were excluded. (D,E) The positive correlation between pn/ps of the gene and its average LD (D) or the difference between LDnonsyn and LDsyn (E). Here, the data on the USA population of S. commune are shown; similar patterns in the Russian population are shown in Figure 3—figure supplement 4.
Figure 3—figure supplement 1. Examples of haploblocks in two populations of S. commune.
The heatmaps show LD between polymorphic SNPs in the same genomic regions in the USA and RUS populations of S. commune. Only biallelic polymorphic sites with minor allele frequency >1 are shown; the number of such sites can differ between populations.
Figure 3—figure supplement 2. Distribution of haploblock lengths (nt) in the two populations of S. commune.
Figure 3—figure supplement 3. Example of the S. commune alignment within a haploblock.
Region 3097200–3097500 of scaffold 4 in the USA population of S. commune is shown. The top line shows the consensus sequence based on 34 genotypes; a dot indicates a match with the consensus.
Figure 3—figure supplement 4. Patterns of linkage disequilibrium in the RUS population of S. commune.
(A) Bimodal distribution of the fraction of polymorphic sites carrying minor alleles per genome within the haploblocks. Each count corresponds to a genotype within a haploblock. Black line shows the background distribution of minor alleles in the non-haploblock regions. (B) The increased average minor allele frequency within haploblocks as compared to the non-haploblock regions (dashed line, t-test p-value <2e-16). (C) LD between nonsynonymous and synonymous SNPs within single genes. Each dot represents an individual gene. Linear regression of LDnonsyn on LDsyn is shown as the red line. To control for the gene length, only SNPs within 300 bp from each other were analyzed. Genes with fewer than 100 such pairs of SNPs were excluded. (D,E) The positive correlation between pn/ps of the gene and its average LD (Spearman correlation p-value = 4e-16) (D) or the difference between LDnonsyn and LDsyn (Spearman correlation p-value = 2e-5) (E).
Figure 3—figure supplement 5. Comparison of LDnonsyn and LDsyn in the genes of S. commune.
(A) The USA population, (B) the RUS population. The genes are stratified by their average LD (the panels) and by the pn/ps. Only pairs of SNPs within 300 bp from each other are analyzed; genes with fewer than 100 such pairs of nonsynonymous or synonymous SNPs are excluded. Spearman correlation p-values are shown.
Figure 3—figure supplement 6. The difference between LDnonsyn and LDsyn under pairwise epistasis and balancing selection.
(A) The excess of LDnonsyn over LDsyn under different models of epistasis between two deleterious mutations A → a and B → b without balancing selection and in the presence of negative frequency-dependent selection (NFDS) or associative overdominance (AOD) acting in the linked sites. The height of columns shows fitness of the corresponding genotypes. (+) indicates simulations where the excess of LDnonsyn is reproduced. (B) The average LD in the simulations. (C) The difference between LDnonsyn and LDsyn in the simulations.
Figure 3—figure supplement 7. Criteria for haploblocks in S. commune.
Red lines show the distribution of LD (r²) in windows of 250 nucleotides in two populations. Black line corresponds to the lognormal distribution with the same mean and variance. The windows with LD higher than the threshold value, defined as the intersection point of the two lines (dashed), are attributed to haploblocks.
Figure 4. Correlation of LD values between pairs of shared SNPs in the two S. commune populations.
(A) Pairs of SNPs with the same alleles in both sites, (B) pairs of SNPs differing by at least one allele. Asterisks indicate Spearman correlation p-values <0.001.
Figure 4—figure supplement 1. Association of LD values between pairs of shared nonsynonymous SNPs encoding the same amino acids in the two S. commune populations.
(A) All pairs of SNPs pooled together. A pair of SNPs is considered to carry different alleles if at least one allele differs in at least one site. (B) Pairs of SNPs stratified by distance between them. Asterisks indicate Spearman correlation p-values <0.01.
Figure 4—figure supplement 2. Association of LD values between pairs of shared SNPs within haploblocks in the two S. commune populations.
(A) Pairs of SNPs with the same major and minor alleles in both sites, (B) pairs of SNPs differing by at least one allele. Asterisks indicate Spearman correlation p-values <0.001.
Appendix 1—figure 1. The efficiency of epistasis in populations with different levels of nucleotide diversity.
(A) Under low nucleotide diversity, deleterious mutations (red dots) are unlikely to be compensated. If nucleotide diversity is high, epistatic selection maintains LD between SNPs in interacting sites (blue dots). (B) The probability that a deleterious variant is compensated by another variant within the same individual at the end of the simulation. (C) Increase in the mean fitness of the population caused by epistatic selection maintaining LD between favorable allele combinations. The fitness is plotted relative to that of a population consisting of individuals with uncorrelated alleles at different sites, obtained by permuting alleles among individuals. The efficiency of epistatic selection in maintaining linkage is much higher in genetically variable populations. Asterisks in (C) indicate significant deviation from 0 (Wilcoxon paired test p-value <0.01). Each simulation was repeated between 100 and 10,000 times depending on genetic diversity.
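The reference population in panel (C), in which alleles at different sites are uncorrelated, can be obtained by shuffling each site independently across individuals. The sketch below shows only that permutation step, under the assumption that genotypes are stored as equal-length lists of alleles; the function name `permute_alleles` is illustrative and the fitness model itself is not reproduced here.

```python
import random


def permute_alleles(genotypes, seed=0):
    """Shuffle alleles among individuals independently at every site.

    genotypes: list of individuals, each a list of alleles (one per site).
    Allele frequencies at each site are preserved, while associations
    between sites (LD) are broken up. The input is not modified.
    """
    rng = random.Random(seed)
    # Transpose to per-site columns, shuffle each column, transpose back
    columns = [list(col) for col in zip(*genotypes)]
    for col in columns:
        rng.shuffle(col)
    return [list(row) for row in zip(*columns)]
```

Comparing mean fitness before and after such a permutation isolates the contribution of favorable allele combinations, since per-site allele frequencies are identical in both populations.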
Appendix 2—figure 1. Polarized linkage disequilibrium in S. commune.
LD between nonsynonymous SNPs is shown in orange, and LD between synonymous SNPs is shown in blue. Filled areas indicate SE of LD calculated for each scaffold separately.
Appendix 2—figure 2. Polarized linkage disequilibrium in D. melanogaster.
LD between nonsynonymous SNPs is shown in orange, and LD between synonymous SNPs is shown in blue. Filled areas indicate SE of LD calculated for each chromosome separately.
Appendix 2—figure 3. Polarized linkage disequilibrium in H. sapiens.
LD between nonsynonymous SNPs is shown in orange, and LD between synonymous SNPs is shown in blue. Filled areas indicate SE of LD calculated for each chromosome separately.
Appendix 2—figure 4. LDnonsyn and LDsyn in simulations under weak negative selection.
LD between synonymous (blue, selection coefficient Nes = 0) and nonsynonymous (orange, Nes = –1) variants under varying recombination rate. Only SNPs with MAF >0.05 are shown. Simulated haploid population size N=2000, sequence length L=1000 bp.
Appendix 2—figure 5. Patterns of LD in simulations under Hill-Robertson interference.
(A) LD between nonsynonymous and synonymous pairs of SNPs split by MAF. (B) LD between all pairs of nonsynonymous and synonymous SNPs pooled together. (A–B) Simulated haploid population size N=2000, sequence length L=1000 bp. Top panels - selection coefficients of all nonsynonymous mutations are equal to –0.005 (Nes = –10); bottom panels - selection coefficients of nonsynonymous mutations are gamma-distributed with parameters rate = 1, scale = 0.005. (C) LD and nucleotide diversity within genes of the USA population of S. commune (each point represents one gene). (D) LD and nucleotide diversity obtained in simulations.
Appendix 3—figure 1. Patterns of nucleotide diversity in S. commune.
(A) The fraction of private and shared biallelic SNPs. (B) Within-population nucleotide diversity at different classes of sites (measured as π without Jukes-Cantor correction). (C) The number of monomorphic and polymorphic sites in the multiple whole-genome alignments of S. commune genomes.
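The diversity measure in panel (B), π without Jukes-Cantor correction, is the average proportion of sites that differ between two sequences drawn from the population. A minimal sketch follows, assuming gap-free aligned sequences of equal length; the function name `nucleotide_diversity` and the toy input are illustrative.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Uncorrected pi: mean pairwise differences per site.

    seqs: list of aligned sequences of equal length
          (assumption: no gaps or missing data).
    """
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and of equal length")

    # Count mismatching sites over all unordered pairs of sequences
    total_diff = sum(
        sum(a != b for a, b in zip(s, t)) for s, t in combinations(seqs, 2)
    )
    n_pairs = n * (n - 1) // 2
    return total_diff / (n_pairs * length)


# Three toy sequences: pairwise differences are 1, 1 and 2 over 4 sites
pi = nucleotide_diversity(["ACGT", "ACGA", "TCGT"])
```

No multiple-hit correction is applied, matching the caption's statement that π is reported without Jukes-Cantor correction.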
Appendix 3—figure 2. The reconstructed phylogeny of S. commune.
USA and Russian populations of S. commune are highly divergent while having almost no within-population structure. Genetic distance is measured in nucleotide differences; the phylogeny is reconstructed based on the multiple whole-genome alignment.
