Genome Med. 2025 Mar 21;17(1):26.
doi: 10.1186/s13073-025-01448-2.

Long-read sequencing identifies copy-specific markers of SMN gene conversion in spinal muscular atrophy

M M Zwartkruis et al. Genome Med. 2025.

Abstract

Background: The complex 2 Mb survival motor neuron (SMN) locus on chromosome 5q13, including the spinal muscular atrophy (SMA)-causing gene SMN1 and modifier SMN2, remains incompletely resolved due to numerous segmental duplications. Variation in SMN2 copy number, presumably influenced by SMN1 to SMN2 gene conversion, affects disease severity, though SMN2 copy number alone has insufficient prognostic value due to limited genotype-phenotype correlations. With advancements in newborn screening and SMN-targeted therapies, identifying genetic markers to predict disease progression and treatment response is crucial. Progress has thus far been limited by methodological constraints.

Methods: To address this, we developed HapSMA, a method to perform polyploid phasing of the SMN locus to enable copy-specific analysis of SMN and its surrounding genes. We used HapSMA on publicly available Oxford Nanopore Technologies (ONT) sequencing data of 29 healthy controls and performed long-read, targeted ONT sequencing of the SMN locus of 31 patients with SMA.

Results: In healthy controls, we identified single nucleotide variants (SNVs) specific to SMN1 and SMN2 haplotypes that could serve as gene conversion markers. Broad phasing including the NAIP gene allowed for a more complete view of SMN locus variation. Genetic variation in SMN2 haplotypes was larger in SMA patients. Forty-two percent of SMN2 haplotypes of SMA patients showed varying SMN1 to SMN2 gene conversion breakpoints, serving as direct evidence of gene conversion as a common genetic characteristic in SMA and highlighting the importance of inclusion of SMA patients when investigating the SMN locus.

Conclusions: Our findings illustrate that both methodological advances and the analysis of patient samples are required to advance our understanding of complex genetic loci and address critical clinical challenges.

Keywords: Dark genomic regions; Gene conversion; Long-read sequencing; Segmental duplications; Spinal muscular atrophy.

Conflict of interest statement

Declarations. Ethics approval and consent to participate: The study protocol (09307/NL29692.041.09) was approved by the Medical Ethical Committee of the University Medical Center Utrecht and registered at the Dutch registry for clinical studies and trials [24]. Written informed consent was obtained from all adult patients and, for children younger than 18 years, from the patients and/or their parents. Consent for publication: Not applicable. Competing interests: JHV reports sponsored research agreements with Biogen and AstraZeneca. The remaining authors declare that they have no competing interests.

Figures

Fig. 1
Copy-specific analysis of SMN and surrounding genes by targeted mapping and polyploid phasing. A Dot plot resulting from segmental duplication analysis of T2T-CHM13 chromosome 5 against itself (left panel), zoomed in on the SMN locus (right panel). Red lines indicate segments of at least 95% similarity and at least 10 kb. The structure of the SMN locus as shown in B is shown at scale on both the x- and y-axes. B Structure of the SMN locus on the T2T-CHM13 reference genome. Genes are indicated by colored arrows. Genome coordinates chr5:70,772,138–70,944,284 were masked, and its segmental duplication counterpart chr5:71,274,893–71,447,410 was the main region of interest (ROI) for haplotype phasing. C Overview of sequencing and bioinformatics approaches in this study. ONT sequencing with adaptive sampling was performed on the HMW DNA of SMA patients. Raw sequencing data was basecalled with the Guppy SUP model and mapped to GRCh38. Within the HapSMA workflow, reads were remapped to the masked T2T-CHM13 reference genome as indicated in B. Polyploid variant calling and haplotype phasing were performed, followed by variant calling per haplotype. Figure created with BioRender.com. D Example of haplotype phasing of an SMA sample with three SMN2 copies across SMN1/2 and surrounding genes. Sequencing reads are colored by haplotype. Soft-clipping is not shown. E Comparison of SNVs called by Paraphase and HapSMA. Panel a: SNVs called by Paraphase but not by HapSMA (in at least one sample). Panel b: SNVs called by both Paraphase and HapSMA (in at least one sample). Panel c: SNVs called by HapSMA but not by Paraphase (in at least one sample). With HapSMA, SNVs are called in a larger region (~ 500 kb) than with Paraphase (~ 44 kb). Panel d: Phasing ROI as shown in B. Panel e: gene annotation. HMW, high molecular weight; HPRC, Human Pangenome Reference Consortium; kb, kilobases; Mb, megabases; ROI, region of interest
Fig. 2
In healthy controls, specific SNVs and NAIP variants characterize the downstream environment of SMN1/2. A IGV overview of SMN1 and SMN2 haplotypes (divided based on PSV13 (c.840C > T) in exon 7 (see the “Methods” section)) from HPRC healthy control samples, mapped to the masked T2T-CHM13 reference genome. Each “read” represents one haplotype from one sample. SMN2-specific variant positions (present in ≥ 90% of SMN2 haplotypes and ≤ 10% of SMN1 haplotypes) and SMN1-specific variant positions (present in ≥ 90% of SMN1 haplotypes and ≤ 10% of SMN2 haplotypes), are indicated by blue lines above the genes, including PSVs and downstream SMN1/2 environment SNVs. *SMN1 haplotypes with downstream SMN2 environment SNVs. B Schematic representation of PSVs, SMN1/2 environment SNVs, and presence of the (pseudo)NAIP gene per haplotype. Only haplotypes with complete phasing between PSV13 (c.840C > T) and (pseudo)NAIP are shown. In the right panel, downstream haplotype frequencies are shown schematically. Downstream environment other than the “expected” environment was called when 3 or more consecutive SMN1/2 environment SNVs were present. Full-length NAIP was characterized as SMN1 environment, whereas truncated NAIPPΔ1–5, NAIPPΔ1–9 or NAIPPΔ4–5 was characterized as SMN2 environment [20]. *PSV8 (5 bp insertion at position chr5:71,407,825) is currently not considered a PSV, but a common variant [4]. IGV, Integrative Genomics Viewer; NAIPP, pseudoNAIP; PSV, paralogous sequence variant; SNV, single nucleotide variant
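The copy-specific criterion in the Fig. 2 caption (a variant present in ≥ 90% of haplotypes of one gene copy and ≤ 10% of haplotypes of the other) can be illustrated with a minimal Python sketch. This is not the HapSMA implementation: the function names, the representation of each haplotype as a set of variant positions, and the toy data are all assumptions made for illustration.

```python
from typing import List, Set


def carrier_fraction(haplotypes: List[Set[int]], position: int) -> float:
    """Fraction of haplotypes that carry a variant at `position`."""
    if not haplotypes:
        return 0.0
    return sum(position in hap for hap in haplotypes) / len(haplotypes)


def copy_specific_positions(
    target: List[Set[int]],
    other: List[Set[int]],
    min_target: float = 0.90,  # ">= 90%" threshold from the caption
    max_other: float = 0.10,   # "<= 10%" threshold from the caption
) -> List[int]:
    """Positions frequent in `target` haplotypes and rare in `other` haplotypes."""
    candidates: Set[int] = set()
    for hap in target:
        candidates |= hap
    return sorted(
        pos
        for pos in candidates
        if carrier_fraction(target, pos) >= min_target
        and carrier_fraction(other, pos) <= max_other
    )


# Hypothetical toy data: 20 haplotypes per gene copy, positions are made up.
smn2_haps = [{100, 200} if i < 10 else {100} for i in range(19)] + [set()]
smn1_haps = [{300} for _ in range(19)] + [{100, 300}]

smn2_specific = copy_specific_positions(smn2_haps, smn1_haps)
smn1_specific = copy_specific_positions(smn1_haps, smn2_haps)
```

In this toy input, position 100 is carried by 19 of 20 SMN2 haplotypes and 1 of 20 SMN1 haplotypes, so it passes both thresholds, whereas position 200 is carried by only half of the SMN2 haplotypes and is rejected.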
Fig. 3
Markers of the SMN1 environment are abundant and highly variable in SMN2 haplotypes of SMA patients. A IGV overview of SMN1 and SMN2 haplotypes (divided based on PSV13 (c.840C > T) in exon 7 (see the “Methods” section)) from SMA patients, mapped to the masked T2T-CHM13 reference genome. Each “read” represents one haplotype from one sample. SMN2-specific variant positions (present in ≥ 90% of SMN2 haplotypes and ≤ 10% of SMN1 haplotypes) and SMN1-specific variant positions (present in ≥ 90% of SMN1 haplotypes and ≤ 10% of SMN2 haplotypes) as determined in Fig. 2A, are indicated by blue lines above the genes, including PSVs and downstream SMN1/2 environment SNVs. *SMN2 haplotypes with downstream SMN1 environment SNVs. **SMN2 haplotypes with an incompletely resolved downstream SMN1/2 environment. B Schematic representation of PSVs, SMN1/2 environment SNVs, and presence of the (pseudo)NAIP gene per haplotype. Of non-hybrid SMN2 haplotypes, only haplotypes with complete phasing between PSV13 (c.840C > T) and (pseudo)NAIP are shown. In the two lower left panels, haplotypes with hybrid SMN2 genes are shown, of which five hybrid structures are novel: PSV1–4, 6–9, and 16 (a); PSV1–7 and 14–16 (b); PSV1–11 (c); PSV1 (d); PSV1–9 (e). In the right panel, downstream haplotype frequencies are shown schematically. Downstream environment other than the “expected” environment was called when 3 or more consecutive SMN1/2 environment SNVs were present. Full-length NAIP was characterized as SMN1 environment, whereas truncated NAIPPΔ1–5, NAIPPΔ1–9, or NAIPPΔ4–5 was characterized as SMN2 environment [20]. The percentage of downstream SMN1 environment in hybrid haplotypes (53.8%) was not significantly higher than in non-hybrid SMN2 haplotypes (38.5%; Fisher’s exact test, p = 0.353). *PSV8 (5 bp insertion at position chr5:71,407,825) is currently not considered a PSV, but a common variant [4]. 
IGV, Integrative Genomics Viewer; NAIPP, pseudoNAIP; PSV, paralogous sequence variant; SNV, single nucleotide variant
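The captions of Fig. 2 and Fig. 3 state that a downstream environment other than the "expected" one was called when 3 or more consecutive SMN1/2 environment SNVs were present. A minimal Python sketch of that rule follows. It is an illustration under stated assumptions, not the authors' code: the per-marker labels, the function names, and the choice to let a missing call (`None`) break a run are hypothetical, and the NAIP/pseudoNAIP evidence described in the captions is not modeled.

```python
from typing import List, Optional


def longest_run(calls: List[Optional[str]], label: str) -> int:
    """Length of the longest stretch of consecutive markers equal to `label`."""
    best = current = 0
    for call in calls:
        current = current + 1 if call == label else 0
        best = max(best, current)
    return best


def downstream_environment(
    gene_copy: str,
    calls: List[Optional[str]],
    min_consecutive: int = 3,  # "3 or more consecutive" from the captions
) -> str:
    """Return the called downstream environment for one haplotype.

    `gene_copy` is "SMN1" or "SMN2" (assigned from PSV13 in the paper).
    `calls` lists, in genomic order, which environment each downstream
    marker SNV matches; None marks an unresolved marker (assumed to
    break a run).
    """
    if gene_copy not in ("SMN1", "SMN2"):
        raise ValueError("gene_copy must be 'SMN1' or 'SMN2'")
    other = "SMN1" if gene_copy == "SMN2" else "SMN2"
    if longest_run(calls, other) >= min_consecutive:
        return other
    return gene_copy  # the "expected" environment


# Hypothetical SMN2 haplotype with three consecutive SMN1-environment markers.
converted = downstream_environment(
    "SMN2", ["SMN2", "SMN1", "SMN1", "SMN1", "SMN2"]
)
# Hypothetical SMN2 haplotype whose SMN1-environment markers are not consecutive.
unconverted = downstream_environment(
    "SMN2", ["SMN1", "SMN1", "SMN2", "SMN1"]
)
```

Requiring a run of consecutive markers, rather than a simple count, keeps isolated discordant SNVs from being read as a switch of environment.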

References

    1. Schmutz J, Martin J, Terry A, Couronne O, Grimwood J, Lowry S, et al. The DNA sequence and comparative analysis of human chromosome 5. Nature. 2004;431:268–74.
    2. Lefebvre S, Bürglen L, Reboullet S, Clermont O, Burlet P, Viollet L, et al. Identification and characterization of a spinal muscular atrophy-determining gene. Cell. 1995;80:155–65.
    3. Blasco-Pérez L, Paramonov I, Leno J, Bernal S, Alias L, Fuentes-Prior P, et al. Beyond copy number: A new, rapid, and versatile method for sequencing the entire SMN2 gene in SMA patients. Hum Mutat. 2021;42:787–95.
    4. Costa-Roger M, Blasco-Pérez L, Gerin L, Codina-Solà M, Leno-Colorado J, Gómez-García De la Banda M, et al. Complex SMN Hybrids Detected in a Cohort of 31 Patients With Spinal Muscular Atrophy. Neurol Genet. 2024;10:e200175.
    5. Monani UR, Lorson CL, Parsons DW, Prior TW, Androphy EJ, Burghes AHM, et al. A Single Nucleotide Difference That Alters Splicing Patterns Distinguishes the SMA Gene SMN1 From the Copy Gene SMN2. Hum Mol Genet. 1999;8:1177–83.
