Ann Neurol. 2023 May;93(5):1012-1022.
doi: 10.1002/ana.26608. Epub 2023 Feb 3.

Genome-Wide Analysis of Structural Variants in Parkinson Disease

Kimberley J Billingsley et al. Ann Neurol. 2023 May.

Abstract

Objective: Identification of genetic risk factors for Parkinson disease (PD) has to date been primarily limited to the study of single nucleotide variants, which only represent a small fraction of the genetic variation in the human genome. Consequently, causal variants for most PD risk are not known. Here we focused on structural variants (SVs), which represent a major source of genetic variation in the human genome. We aimed to discover SVs associated with PD risk by performing the first large-scale characterization of SVs in PD.

Methods: We leveraged a recently developed computational pipeline to detect and genotype SVs from 7,772 Illumina short-read whole-genome sequencing samples. Using this set of SVs, we performed a genome-wide association study of 2,585 cases and 2,779 controls and identified SVs associated with PD risk. Furthermore, to validate the presence of these variants, we generated matched whole-genome long-read sequencing data for a subset of samples.

Results: We genotyped and tested 3,154 common SVs, representing over 412 million nucleotides of previously uncatalogued genetic variation. Using long-read sequencing data, we validated the presence of three novel deletion SVs that are associated with risk of PD from our initial association analysis, including a 2 kb intronic deletion within the gene LRRN4.

Interpretation: We identified three SVs associated with genetic risk of PD. This study represents the most comprehensive assessment of the contribution of SVs to the genetic risk of PD to date. ANN NEUROL 2023;93:1012-1022.
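
As an illustration of the association step described in the Methods, the sketch below runs a simple allelic chi-square test for one biallelic SV. The abstract does not state which statistical model or covariates were used, so this is an assumed, simplified stand-in rather than the authors' pipeline; the function name `allelic_association` and the toy genotype counts are hypothetical.

```python
import math


def allelic_association(case_genotypes, control_genotypes):
    """Allelic chi-square test (1 df) for a single biallelic SV.

    Each genotype is the number of SV (alternate) alleles carried: 0, 1 or 2.
    Returns (chi2, p_value, odds_ratio). Illustrative only: no covariates,
    no population-structure correction.
    """
    case_alt = sum(case_genotypes)
    case_ref = 2 * len(case_genotypes) - case_alt
    ctrl_alt = sum(control_genotypes)
    ctrl_ref = 2 * len(control_genotypes) - ctrl_alt

    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    if min(min(row) for row in table) == 0:
        raise ValueError("every cell of the 2x2 allele table must be non-zero")

    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    chi2 = 0.0
    for i in range(2):
        row_sum = sum(table[i])
        for j in range(2):
            col_sum = table[0][j] + table[1][j]
            expected = row_sum * col_sum / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    return chi2, p_value, odds_ratio


# Toy data (hypothetical): 100 cases and 100 controls.
cases = [0] * 60 + [1] * 30 + [2] * 10
controls = [0] * 80 + [1] * 18 + [2] * 2
chi2, p_value, odds_ratio = allelic_association(cases, controls)
print(f"chi2={chi2:.2f} p={p_value:.2e} OR={odds_ratio:.2f}")
```

In a genome-wide setting a test like this would be repeated for every common SV, with a multiple-testing threshold applied to the resulting p-values.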

Conflict of interest statement

Potential Conflicts of Interest

D.V., K.L., and M.A.N.’s participation in this project was part of a competitive contract awarded to Data Tecnica International LLC by the National Institutes of Health to support open science research. M.A.N. also currently serves on the scientific advisory board for Clover Therapeutics and is an advisor to Neuron23 Inc. B.T. currently serves on the editorial boards of Clinical Medicine, JNNP, and NBA, is an Associate Editor for Brain, and has a collaborative research agreement with Ionis Pharmaceuticals, Roche, and Optimeos. F.J.S. receives research support from Illumina, ONT, and PacBio. A.M. works for ONT. M.E.T. receives research funding and/or reagents from Levo Therapeutics, Microsoft Inc., and Illumina Inc.

Figures

Figure 1: SV analysis workflow.
This figure describes the study design behind the analyses included in this report.
Figure 2: Properties of SVs detected in the average genome.
We analyzed a total of 7,772 short-read genomes after quality control. The plots show the breakdown across SV class and size. a) On average, each genome carried 5,626 SVs, with a median of 1,361 insertions, 2,991 deletions, 1,194 duplications, 115 complex SVs, and 11 inversions. b) The majority of SVs were small, with a median size of 329 bp. Only 8% of SVs per genome were larger than 2.5 kb, and 1% of SVs per genome were larger than 50 kb.
Figure 3: Size and allele frequency distribution of “PASS” SVs in the short-read data.
a) The majority of SVs are small and rare. As previously reported in other large-scale short-read studies, three peaks are observed at 300 bp, 2 kb, and 6 kb, representing Alu, SVA, and LINE1 mobile element insertions, respectively. b) Most SVs were singleton variants (46.87%) or rare (AF<1%) (46.69%).
Figure 4: A 2 kb deletion within intron 3 of LRRN4 is a strong candidate for the causal variant at the chr20 rs77351827 locus.
a) A samplot image showing the ~2 kb deletion at chr20. Aligned regions are marked in orange, and the gap represents the deletion, coded in black. The height of each alignment is based on the size of its largest gap. The three sequence alignment tracks follow, with each alignment file plotted as a separate track in the image. The coverage for the region is shown as the gray-filled background. The SV genotypes (homozygous deletion, heterozygous deletion, and homozygous reference allele/no deletion) that were predicted by GATK-SV from the short-read sequencing data are annotated on the left of the corresponding tracks. Each genotype was confirmed in silico by the matched long-read sequencing data. b) An LDheatmap showing pairwise LD, measured as r², between the 2 kb PD_DEL_chr20_597 deletion and rs77351827. High r² values are shown in red and low r² values in blue. PD_DEL_chr20_597 is in high LD with rs77351827, the lead PD risk SNV at this locus (r²=0.89, D’=0.95). c) Locuszoom plot of the association signal at the chr20 rs77351827 PD risk locus from the Nalls 2019 PD SNV meta-analysis. The gene LRRN4 lies directly under the risk signal, and the schematic below shows the location of the deletion within intron 3 of the gene.
Figure 5: Comparison of SVs called with short-read and long-read sequencing in eight matched PPMI blood samples.
On average, ONT long-read sequencing detects significantly more SVs per genome than short-read sequencing. a) On average, 5,626 SVs were detected per short-read genome compared to 27,277 with long-read sequencing. Of the 5,626 SVs discovered in the short-read sequencing data, 72% were confirmed in silico with long-read sequencing. As expected, duplications drove the false positive rate. b) The majority of the SVs in the genome cannot be detected with short-read sequencing data alone. Of the 27,277 SVs detected with long-read sequencing, only 14% were present in the short-read callset. Most of these false negative calls, i.e., SVs that were detected by long-read sequencing but not present in the short-read callset, were insertions.
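
The r² and D’ values quoted in the Figure 4b caption are standard pairwise linkage disequilibrium measures. The sketch below computes both from phased haplotype counts for two biallelic variants (for example, a deletion and a nearby SNV). It assumes phased haplotypes are available; the page does not say how LD was estimated in the study, and the function name `pairwise_ld` and the toy counts are hypothetical.

```python
def pairwise_ld(n11, n10, n01, n00):
    """Return (r2, d_prime) for two biallelic variants A and B.

    n11: haplotypes carrying the alternate allele at both A and B
    n10: alternate at A only
    n01: alternate at B only
    n00: reference at both
    """
    total = n11 + n10 + n01 + n00
    p_a = (n11 + n10) / total  # alternate-allele frequency at A
    p_b = (n11 + n01) / total  # alternate-allele frequency at B

    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both variants must be polymorphic")

    # D: observed joint haplotype frequency minus expectation under independence.
    d = n11 / total - p_a * p_b
    r2 = d * d / denom

    # D' rescales |D| by its maximum attainable value given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    return r2, d_prime


# Toy haplotype counts (hypothetical), 100 haplotypes in total.
r2, d_prime = pairwise_ld(n11=40, n10=10, n01=10, n00=40)
print(f"r2={r2:.2f} D'={d_prime:.2f}")
```

D’ close to 1 means at least one of the four possible haplotypes is nearly absent, while r² close to 1 additionally requires the two variants to have similar allele frequencies, which is why r² is the more direct measure of how well one variant tags the other.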

References

    1. Nalls MA et al. Identification of novel risk loci, causal insights, and heritable risk for Parkinson’s disease: a meta-analysis of genome-wide association studies. Lancet Neurol. 18, 1091–1102 (2019).
    2. Pang AW et al. Towards a comprehensive structural variation map of an individual human genome. Genome Biol. 11, R52 (2010).
    3. Han L et al. Functional annotation of rare structural variation in the human brain. Nat. Commun. 11, 2990 (2020).
    4. Scott AJ, Chiang C & Hall IM. Structural variants are a major source of gene expression differences in humans and often affect multiple nearby genes. Genome Res. (2021). doi:10.1101/gr.275488.121.
    5. Kitada T et al. Mutations in the parkin gene cause autosomal recessive juvenile parkinsonism. Nature 392, 605–608 (1998). doi:10.1038/33416.
