Proc Natl Acad Sci U S A. 2017 May 16;114(20):E3984-E3992.
doi: 10.1073/pnas.1704117114. Epub 2017 May 2.

Structural variants caused by Alu insertions are associated with risks for many human diseases


Lindsay M Payer et al. Proc Natl Acad Sci U S A. 2017.

Abstract

Interspersed repeat sequences comprise much of our DNA, although their functional effects are poorly understood. The most commonly occurring repeat is the Alu short interspersed element. New Alu insertions occur in human populations and have been responsible for several instances of genetic disease. In this study, we sought to determine whether any polymorphic Alu insertion variants function in a common variant, common disease paradigm. We cataloged 809 polymorphic Alu elements mapping to 1,159 loci implicated in disease risk by genome-wide association study (GWAS) (P < 10−8). We found that Alu insertion variants occur disproportionately at GWAS loci (P = 0.013). Moreover, we identified 44 of these Alu elements in linkage disequilibrium (LD) (r2 > 0.7) with the trait-associated SNP. This figure represents a >20-fold increase in the number of polymorphic Alu elements associated with human phenotypes. This work provides a broader perspective on how structural variants in repetitive DNAs may contribute to human disease.

Keywords: Alu; GWAS; causative variant; interspersed repeats; structural variant.


Conflict of interest statement

J.H. was a PhD trainee of J.D.B. (graduation date 2005). J.H. was a middle author on a single paper from J.D.B. in the past 4 years, on an unrelated project; its publication was delayed until 2014. All other authors declare no conflict of interest.

Figures

Fig. 1.
Polymorphic Alu elements map to GWAS loci. Each GWAS signal (P ≤ 10−9) with at least one polymorphic Alu element mapping to the LD block defined by the trait-associated SNP (TAS) is represented by a circle, colored by the phenotype reported in the GWAS. Figure modified from ref. . Elements mapping to sex chromosomes and the HLA locus (*) were excluded from the diagram. Details of GWAS loci and polymorphic Alu elements are provided in Dataset S3.
Fig. 2.
Alu variants are enriched at GWAS signals. (A) Alu variants map to TAS LD blocks 625 times (red). To determine whether this number could be attributed to chance, 1,000 iterations of random LD blocks mirroring TAS LD blocks were generated. The distribution of the number of times Alu variants overlap these random blocks is shown (black). (B) Random LD blocks (gray) closely mirror TAS LD blocks (red) in size.
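The enrichment test in panel A compares an observed overlap count with a null distribution built from size-matched random blocks. The Python sketch below illustrates that kind of permutation test under simplifying assumptions: a single linear coordinate system (one chromosome), uniform random placement of each block with its size preserved, and hypothetical function names. The authors' exact procedure for generating blocks that mirror TAS LD blocks is not described here, so this is an illustration of the idea rather than their pipeline.

```python
import random
from bisect import bisect_left


def count_overlaps(blocks, sites):
    """Count (block, site) pairs where a site position p lies in a
    half-open block [start, end)."""
    ordered = sorted(sites)
    total = 0
    for start, end in blocks:
        # (# sites with p < end) - (# sites with p < start) = # sites in [start, end)
        total += bisect_left(ordered, end) - bisect_left(ordered, start)
    return total


def size_matched_random_blocks(blocks, chrom_length, rng):
    """Move each block to a uniformly random start, keeping its size.
    Assumption: one chromosome of length chrom_length, no masked regions."""
    shuffled = []
    for start, end in blocks:
        size = end - start
        new_start = rng.randrange(0, chrom_length - size + 1)
        shuffled.append((new_start, new_start + size))
    return shuffled


def permutation_enrichment(blocks, sites, chrom_length, n_iter=1000, seed=0):
    """Return (observed count, list of null counts, one-sided empirical P)."""
    rng = random.Random(seed)
    observed = count_overlaps(blocks, sites)
    null = [
        count_overlaps(size_matched_random_blocks(blocks, chrom_length, rng), sites)
        for _ in range(n_iter)
    ]
    # Add-one correction so the empirical P value is never exactly zero.
    exceed = sum(1 for count in null if count >= observed)
    p_value = (exceed + 1) / (n_iter + 1)
    return observed, null, p_value


# Hypothetical toy data: ten 1-kb blocks, each containing one insertion site.
toy_blocks = [(i * 100_000, i * 100_000 + 1_000) for i in range(10)]
toy_sites = [i * 100_000 + 500 for i in range(10)]
obs, null_counts, p = permutation_enrichment(toy_blocks, toy_sites, chrom_length=1_000_000)
print(obs, max(null_counts), p)
```

With 1,000 iterations, the smallest attainable empirical P value under the add-one correction is 1/1,001; a real analysis would also need to respect chromosome boundaries and regions where insertions cannot be ascertained.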
Fig. S1.
Alu allele frequencies. (A) Distribution of Alu insertion allele frequencies for 173 genotyped variants. Many have frequencies <0.1, but many others are more common alleles. (B) Overall distribution of minor allele frequencies (MAFs) for TASs and Alu variants. Alu variants in LD with TASs have a MAF distribution similar to that of TASs. *P = 0.014. (C) Pairwise relationship between the MAF of each Alu variant and that of its corresponding TAS. The degree of LD between each pair is indicated by shape and color.
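Panel A reports insertion allele frequencies, whereas panels B and C report MAFs; the two differ whenever the insertion is the more common allele. A minimal sketch of both quantities follows, assuming hard-called genotypes coded as the number of insertion alleles per person (0, 1, or 2); the function names and the genotype data are hypothetical.

```python
def insertion_allele_frequency(genotypes):
    """Frequency of the Alu insertion allele from per-person insertion counts.

    Each genotype is 0 (homozygous empty site), 1 (heterozygous), or
    2 (homozygous insertion).
    """
    if not genotypes:
        raise ValueError("need at least one genotype")
    if any(g not in (0, 1, 2) for g in genotypes):
        raise ValueError("genotypes must be coded 0, 1, or 2")
    return sum(genotypes) / (2 * len(genotypes))


def minor_allele_frequency(genotypes):
    """MAF: frequency of whichever allele (insertion or empty) is rarer."""
    p = insertion_allele_frequency(genotypes)
    return min(p, 1.0 - p)


# Hypothetical genotypes for eight individuals at one polymorphic Alu.
calls = [2, 2, 1, 2, 2, 1, 2, 2]
print(insertion_allele_frequency(calls), minor_allele_frequency(calls))  # prints 0.875 0.125
```

Here the insertion is the major allele, so it would fall at the high end of a panel A style histogram while contributing a MAF of 0.125 to a panel B style comparison.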
Fig. 3.
Genetic relationships between TASs and polymorphic Alu elements distinguish functional variant candidates. (A) There is no LD between the polymorphic Alu element and the lung cancer TAS, rs9387478 (P = 10−10) at 6p22.1. The Alu element is just as frequently found with the risk haplotype (*) as with the protective haplotype. (B) Moderate LD between a polymorphic Alu element and a TAS associated with urate levels (P = 3 × 10−9) makes this Alu a potential functional variant. Although the MAF differs between the Alu variant and the TAS, when present, the Alu is consistently on the major haplotype strand. (C) A good functional variant candidate. Although there are many SINEs across the locus, only one polymorphic Alu (red) has been identified there; it lies within ACE. The LD structure, generated by pairwise comparison between variants (SNPs and the Alu element), is shown, with red indicating LD. The GWAS LD block associated with the TASs (blue) and Alu variant (red) is bracketed and shown by a red horizontal line. There is complete LD (r2 = 1) between the Alu element and rs4343, the SNP associated with human serum ACE levels in a GWAS (P = 3 × 10−25). The empty allele is on the risk (*) haplotype.
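All three panels turn on the LD statistic r2 between an Alu variant and a TAS. The sketch below computes it from phased haplotypes using the standard definition r2 = D2 / [pA(1 − pA) pB(1 − pB)], where D = pAB − pA pB. The 0/1 allele coding and the function name are illustrative assumptions. The third example shows, in simplified form, why panel B can show only moderate LD: the insertion occurs on just one SNP background, yet r2 stays well below 1 because the two variants have different allele frequencies.

```python
def ld_r2(haplotypes):
    """r^2 between two biallelic variants from phased haplotypes.

    Each haplotype is a pair (a, b) of 0/1 allele codes, e.g. a = 1 for the
    SNP alternate allele and b = 1 for the Alu insertion (hypothetical coding).
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(a * b for a, b in haplotypes) / n  # both alleles coded 1
    d = p_ab - p_a * p_b  # the LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a variant is monomorphic")
    return d * d / denom


# Complete LD: the insertion always travels with SNP allele 1.
print(ld_r2([(1, 1), (0, 0), (1, 1), (0, 0)]))
# No LD: the insertion is equally common on both SNP backgrounds.
print(ld_r2([(1, 1), (1, 0), (0, 1), (0, 0)]))
# Insertion only on SNP allele 1, but rarer than that allele.
print(ld_r2([(1, 1), (1, 0), (0, 0), (0, 0)]))
```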
Fig. S2.
Alu variant in moderate LD with the TAS but not highly associated with the disease. (A) The 1q31.3 locus is associated with age-related macular degeneration (P = 9 × 10−24) (30). The genomic locus is shown with the locations of the polymorphic Alu element (red) and TAS (blue). Although allele frequencies differ between the TAS and Alu variant, when present, the Alu is usually on the risk (*) haplotype. (B) Individual-level genotype data from an age-related macular degeneration GWAS were used to impute (infer) Alu genotypes in this population. Association results based on these imputed genotypes are shown with genomic coordinates plotted on the x axis and disease association on the y axis. The Alu (red diamond) is less strongly associated with disease than the TAS (blue triangle) and is therefore not a good candidate causative variant.
Fig. 4.
Forty-four Alu variants are in strong LD with TAS(s). LD results for all pairwise comparisons between polymorphic Alu elements and their corresponding TASs mapped across the genome are shown. Pairs in strong LD (r2 > 0.7), the best functional candidates, fall above the red line. There are 44 of these insertions, also shown in Table 1. We also defined a subset of polymorphic Alu elements imperfectly correlated with nearby TASs (0.4 < r2 < 0.7; n = 16) (above the gray line).
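Because one Alu element can be compared with several TASs, the count of strong-LD insertions is a count of distinct Alu elements rather than of pairs. The sketch below applies the caption's thresholds to hypothetical (Alu, TAS, r2) triples and collects the distinct Alu elements with at least one strong pair. It follows the caption's strict inequalities literally, so values of exactly 0.4 or 0.7 fall into neither named class; how the authors handled those boundaries is an assumption here.

```python
from collections import Counter


def classify_ld(r2):
    """Label an Alu-TAS pair using the thresholds stated in the caption."""
    if not 0.0 <= r2 <= 1.0:
        raise ValueError("r2 must lie in [0, 1]")
    if r2 > 0.7:
        return "strong"
    if 0.4 < r2 < 0.7:
        return "moderate"
    return "other"


def summarize_pairs(pairs):
    """pairs: iterable of (alu_id, tas_id, r2).

    Returns per-pair class counts and the set of distinct Alu ids that are
    in strong LD with at least one TAS.
    """
    counts = Counter()
    strong_alus = set()
    for alu_id, _tas_id, r2 in pairs:
        label = classify_ld(r2)
        counts[label] += 1
        if label == "strong":
            strong_alus.add(alu_id)
    return counts, strong_alus


# Hypothetical identifiers and r2 values, for illustration only.
example_pairs = [
    ("aluA", "rs_1", 0.95),
    ("aluA", "rs_2", 0.80),  # same Alu, second TAS
    ("aluB", "rs_3", 0.50),
    ("aluC", "rs_4", 0.10),
    ("aluD", "rs_5", 0.70),  # boundary value: neither strong nor moderate
]
counts, strong = summarize_pairs(example_pairs)
print(dict(counts), sorted(strong))
```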
Fig. S3.
Alu subfamilies associated with disease risk are representative of all Alu insertion variants. Subfamily representation of polymorphic Alu elements that are in LD with TASs mirrors the subfamily distribution of all polymorphic Alu elements (36).
Fig. 5.
Loci where polymorphic Alu elements are potential causative variants. LD plots show Alu insertions and neighboring SNPs, with pairwise comparisons indicating variants in LD (red). (A) The 2p25.3 locus is associated with obesity (best P = 3 × 10−49) (37). The Alu insertion variant (red) and TASs (blue) are annotated within the LD block (red horizontal line) downstream of the TMEM18 gene. The r2 values between the Alu and TASs are shown to the lower left; phased haplotypes are shown to the lower right. Here, the preinsertion (empty) allele is the risk allele (*), and the Alu insertion segregates with the protective haplotype. (B) The 1q31.3 locus is associated with meningococcal disease (P = 5 × 10−13) (47); the Alu is on the risk haplotype at CFHR3. (C) The 10q21.2 locus is associated with precursor B-cell acute lymphoblastic leukemia (ALL); the Alu is on the risk haplotype. (D) The 8q24 locus is associated with prostate cancer; the Alu is on the risk haplotype. For C and D, we imputed the Alu variant genotype for GWAS cases and controls to test the association between Alu genotype and disease. Graphs show the −log of the P value for disease association on the y axis; the genomic coordinate is plotted on the x axis. In each case, the polymorphic Alu is highly associated with the disease (red diamond), comparable to proximal TASs (blue triangles).
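The association panels in C and D plot −log P for disease association against position. The caption does not state which test produced those P values, so the sketch below substitutes a simple, clearly different stand-in: a Pearson allelic chi-square test (1 degree of freedom, no continuity correction) on a hypothetical 2 × 2 table of allele counts in cases and controls, converted to −log10 P. Imputed data usually yield fractional dosages, which this count-based test does not handle; treat it only as an illustration of how such a y-axis value is obtained.

```python
import math


def allelic_chi2(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square statistic for a 2 x 2 table of allele counts
    (rows: cases, controls; columns: alternate, reference allele)."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_totals = [sum(r) for r in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            if expected == 0:
                raise ValueError("a row or column total is zero")
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2


def neg_log10_p(chi2):
    """-log10 P for a chi-square statistic with 1 degree of freedom.

    The chi-square(1) survival function equals erfc(sqrt(chi2 / 2)).
    """
    p = math.erfc(math.sqrt(chi2 / 2.0))
    if p == 0.0:
        return math.inf  # P underflowed double precision
    return -math.log10(p)


# Hypothetical allele counts: 100 case alleles and 100 control alleles.
stat = allelic_chi2(30, 70, 10, 90)
print(stat)  # prints 12.5
print(neg_log10_p(stat))
```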
Fig. S4.
Imputing across the 8q24 locus. As in Fig. 5D, the Alu genotype was imputed into the prostate cancer GWAS dataset. An association plot across a broader region is shown. Four independent prostate GWAS signals have been reported in this region (blue bars). Two signals have been reported for other epithelial cancers (black bars): Signal 2 is associated with breast cancer, and signal 3 is associated with ovarian and colorectal cancer. The polymorphic Alu element and associated TASs map to the first peak in this region (1) and are highlighted in Fig. 5D.

References

    1. Lander ES, et al.; International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature. 2001;409:860–921.
    2. Kellis M, et al. Defining functional DNA elements in the human genome. Proc Natl Acad Sci USA. 2014;111:6131–6138.
    3. Smit AFA, Hubley R, Green P. 2015 RepeatMasker Open-4.0. Available at www.repeatmasker.org. Accessed April 26, 2017.
    4. Kazazian HH, Jr, et al. Haemophilia A resulting from de novo insertion of L1 sequences represents a novel mechanism for mutation in man. Nature. 1988;332:164–166.
    5. Sukarova E, Dimovski AJ, Tchacarova P, Petkov GH, Efremov GD. An Alu insert as the cause of a severe form of hemophilia A. Acta Haematol. 2001;106:126–129.
