BMC Evol Biol. 2016 Jan 27;16:25.
doi: 10.1186/s12862-016-0590-7.

Genomic variations and distinct evolutionary rate of rare alleles in Arabidopsis thaliana

Shabana Memon et al. BMC Evol Biol. 2016.

Abstract

Background: The variation rate in genomic regions associated with different alleles contributes to distinct evolutionary patterns involving rare alleles. Rare alleles introduce bias into genome-wide association studies (GWASs), which aim to detect variants at genomic loci associated with single-nucleotide polymorphisms (SNPs) that tend to produce different haplotypes. Here, we sequenced Arabidopsis thaliana and compared its coding and non-coding genomic regions with those of its closest outgroup relative, Arabidopsis lyrata, which accounted for ancestral misinference. Genome-wide SNPs were used to interpret the genetic architecture of rare alleles in Arabidopsis thaliana, revealing a significant departure from a neutral evolutionary model, and the pattern of polymorphisms around a selected locus will exclusively influence natural selection.

Results: We found that 23.4% of the rare alleles occurred randomly in the genome. Notably, the χ² test showed significant differences (P < 0.01) in the relative rates between rare and intermediate alleles, between fixed and non-fixed mutations, and between type I and type II rare mutations. However, the rare alleles produced negative values of Tajima's D, suggesting that they were generated under selective sweeps. Relative to polymorphic sites including SNPs, 67.5% of the fixed mutations were attributed, indicating major contributors to speciation. Overall, the rare allele evolved 1.42 times faster than the major haplotype.
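As an illustration of why an excess of rare alleles yields negative Tajima's D, the following is a minimal sketch of the standard Tajima (1989) statistic. It assumes gap-free aligned sequences of equal length; the function name `tajimas_d` and the toy alignments are hypothetical and are not the authors' pipeline or data.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for equal-length aligned sequences (no gaps or missing data)."""
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least four sequences")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")

    # S: number of segregating (polymorphic) sites in the alignment.
    s_sites = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if s_sites == 0:
        return 0.0  # convention chosen here; D is undefined without polymorphism

    # pi: mean number of pairwise differences over all sequence pairs.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Constants from Tajima (1989) that depend only on the sample size n.
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    # D compares pi with Watterson's estimate S / a1, scaled by its standard deviation.
    return (pi - s_sites / a1) / sqrt(e1 * s_sites + e2 * s_sites * (s_sites - 1))


# Toy alignments (hypothetical): only singletons vs. only intermediate-frequency variants.
rare_heavy = ["TAAA", "ATAA", "AATA", "AAAT", "AAAA", "AAAA"]
balanced = ["TTTT"] * 3 + ["AAAA"] * 3
print(tajimas_d(rare_heavy), tajimas_d(balanced))
```

The singleton-only alignment gives a negative D and the balanced alignment a positive D, which matches the direction of the interpretation in the Results.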

Conclusion: Our results suggest that rare alleles fit a random occurrence model, indicating that rare alleles can occur at any locus in a genome and in any accession in a species. Based on the higher relative rate of derived to ancient mutations and the higher average Dxy, we conclude that rare alleles evolve faster than higher-frequency alleles. The rapid evolution of rare alleles indicates that they must have been newly generated with fixed mutations, compared with the other alleles. Finally, PCR and sequencing results in the flanking regions of rare allele loci confirm that they are of short extension, indicating the absence of a genome-wide pattern for a rare haplotype. The indel-associated model for rare alleles assumes that indel-associated mutations only occur in an indel heterozygote.
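For readers unfamiliar with Dxy, the sketch below computes it in its usual sense: the average number of nucleotide differences per site between sequences drawn from two groups. It assumes gap-free aligned sequences; the function name `dxy` and the toy haplotypes are hypothetical, and the abstract does not state exactly how the authors grouped sequences.

```python
def dxy(group_x, group_y):
    """Average per-site divergence between all cross-group sequence pairs."""
    if not group_x or not group_y:
        raise ValueError("both groups must contain at least one sequence")
    length = len(group_x[0])
    if any(len(s) != length for s in group_x + group_y):
        raise ValueError("sequences must be aligned to the same length")

    # Count mismatches for every pair made of one sequence from each group.
    total_diffs = sum(
        sum(a != b for a, b in zip(x, y)) for x in group_x for y in group_y
    )
    return total_diffs / (len(group_x) * len(group_y) * length)


# Toy haplotype groups (hypothetical): two major-haplotype copies vs. one rare copy.
major = ["AAAA", "AAAT"]
rare = ["TTAA"]
print(dxy(major, rare))  # 5 mismatches over 2 pairs x 4 sites = 0.625
```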

Figures

Fig. 1
Examples of a rare allele and an intermediate allele in a gSNP locus, a complex locus, and a dSNP locus. Dots indicate nucleotides identical to the ortholog (A. lyrata sequence relative to the alignment of A. thaliana). The numbers above the ortholog sequence denote nucleotide positions in the sequence. Fixed sites between haplotypes at rare or intermediate alleles are shaded in grey. (a) A rare haplotype in a gSNP locus at 18433043 (chr. 1). (b) An intermediate allele in a gSNP locus at 24321799 (chr. 1). (c) A complex locus at 4826763 (chr. 1) that holds two distinct haplotypes, a rare allele (light grey shading) and an intermediate allele (dark grey shading). (d) A dSNP locus at 10095832 (chr. 1). Only some of the accessions within each haplotype are displayed in each example.
Fig. 2
Frequency distribution of rare alleles and nucleotide substitutions. (a) Observed frequency of loci with different types of rare and intermediate alleles, shown separately. (b) Distribution of rare alleles in 96 accessions at 939 loci. The x-axis represents the total number of occurrences of rare alleles per accession, and the y-axis denotes the number of accessions that contain the same number of occurrences. The expected number is estimated under the random occurrence model with all the accessions. (c, d) The relationship between the relative rate of derived to ancient mutations and frequency at 5625 gSNP sites, the fixed mutations (c), and at 8234 rSNP sites (including nfSNPs and dSNPs) (d).
Fig. 3
The allelic frequency distribution of SNPs occurring in all regions, coding regions and non-coding regions, with A. lyrata as the outgroup sequence, across all 939 loci: (a) all SNPs; (b) gSNP sites; (c) rSNP sites (including nfSNPs and dSNPs); (d) the number distribution of gSNPs, with the fitting formula $y = 1315.0 - 4535.6x + 4230.7x^2$, and of dispersed SNPs, with the fitting formula $y = 1012.8 - 3221.7x + 2695.4x^2$. The expected allele frequency distribution with an outgroup sequence under a standard constant-size population genetics model in (a), (b) and (c) is given by $(1/i) \big/ \sum_{j=1}^{n-1} 1/j$ for $i$ mutants per site, where $n$ is the sample size (Ewens [43]).
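The expected distribution quoted in the Fig. 3 caption can be evaluated directly. The sketch below is a minimal illustration that assumes n = 96, taken from the 96 accessions mentioned in Fig. 2; the function name `expected_sfs` is hypothetical.

```python
def expected_sfs(n):
    """Expected proportion of polymorphic sites carrying i derived copies, i = 1..n-1.

    Implements (1/i) / sum_{j=1}^{n-1} 1/j for a sample of n sequences.
    """
    if n < 2:
        raise ValueError("need at least two sampled sequences")
    harmonic = sum(1.0 / j for j in range(1, n))
    return [(1.0 / i) / harmonic for i in range(1, n)]


# Assumed sample size: the 96 accessions mentioned in the Fig. 2 caption.
props = expected_sfs(96)
print(len(props), round(sum(props), 6))
```

Under this model singletons are the most common class, and each class i is expected to be 1/i as frequent as singletons, which is the baseline against which the observed SNP frequencies are compared.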
Fig. 4
Characteristics of gSNPs, nfSNPs and dSNPs in the coding region. (a) Distribution of non-synonymous and synonymous SNPs among derived mutations (allele frequency < 0.5) at gSNP, nfSNP and dSNP sites, respectively. (b) Distribution of SNPs occurring at the first, second and third positions of a codon among derived mutations at gSNP, nfSNP and dSNP sites, respectively. (c) Distribution of non-synonymous and synonymous SNPs among ancient mutations (allele frequency ≥ 0.5) at gSNP, nfSNP and dSNP sites, respectively. (d) Distribution of SNPs occurring at the first, second and third positions of a codon among ancient mutations at gSNP, nfSNP and dSNP sites, respectively. In (a) to (d), the total frequency is 1 for each category of SNPs.
