[Preprint]. 2025 Jul 22:2025.07.21.25331932.
doi: 10.1101/2025.07.21.25331932.

Pangenome discovery of missing autism variants


Yang Sui et al. medRxiv.

Abstract

Autism spectrum disorders (ASDs) are genetically and phenotypically heterogeneous, and the majority of cases remain genetically unresolved. To better understand large-effect pathogenic variation, we generated long-read sequencing data to construct phased and near-complete genome assemblies (average contig N50=43 Mbp, QV=56) for 189 individuals from 51 families with unsolved cases of autism. We applied read- and assembly-based strategies to facilitate comprehensive characterization of de novo mutations (DNMs), structural variants (SVs), and DNA methylation profiles. Merging common SVs obtained from long-read pangenome controls, we efficiently filtered >97% of common SVs exclusive to 87 offspring. We find no evidence of increased autosomal SV burden for probands when compared to unaffected siblings yet note a trend for an increase of SV burden on the X chromosome among affected females. We establish a workflow to prioritize potential pathogenic variants by integrating autism risk genes and putative noncoding regulatory elements defined from ATAC-seq and CUT&Tag data from the developing cortex. In total, we identified three pathogenic variants in TBL1XR1, MECP2, and SYNGAP1, as well as nine candidate de novo and biparental homozygous SVs, most of which were missed by short-read sequencing. Our work highlights the potential of phased genomes to discover more complex pathogenic mutations and the power of the pangenome to restrict the focus to an increasingly smaller number of SVs for clinical evaluation.

Keywords: autism; long-read sequencing; pangenomes; pathogenic variants; rare structural variants.


Conflict of interest statement

E.E.E. is a scientific advisory board (SAB) member of Variant Bio, Inc. D.E.M. is on SABs at Oxford Nanopore Technologies (ONT) and Basis Genetics, is engaged in research agreements with ONT and PacBio, has received research and travel support from ONT and PacBio, holds stock options in MyOme and Basis Genetics, and is a consultant for MyOme. J.A.G. has received travel support from ONT. H.Y.Z. is a member of the Regeneron Board of Directors and an advisory board member to The Column Group, Cajal Therapeutics (also co-founder), and Lyterian. All other authors declare no competing interests.

Figures

Figure 1. Long-read analysis sequencing and assembly.
a) Schematic workflow of LRS data generation and SV discovery with pedigree structures of the 51 unsolved ASD families (F=female; M=male). LRS data (PacBio HiFi and ONT) were generated and phased genomes were constructed using hifiasm; SVs were discovered by PAV and validated via Truvari with read-based callers, PBSV and Sniffles (analysis tools indicated in oval boxes). Validated SVs were filtered using a pangenome of 108 control genomes from the HPRC and HGSVC to define a rare SV callset private to the autism families (Dataset S1). b) HiFi read N50 and genomic coverage per sample (members of the same family are color coded). c) Sequence accuracy (QV) and contig N50 length for each HiFi-phased genome assembly. Solid lines represent mean values, while dashed lines indicate median values.
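The control-based filtering step in panel a can be illustrated with a minimal sketch. It is not the authors' pipeline and not Truvari's matching logic: the `SV` record, the `filter_rare` helper, the 50% reciprocal-overlap threshold, and the toy coordinates are all assumptions made for illustration. Insertions, which are better compared by position and length than by span, are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    """Hypothetical minimal SV record (not a real library type)."""
    chrom: str
    start: int   # 0-based start
    end: int     # exclusive end
    svtype: str  # e.g. "DEL" or "DUP"


def reciprocal_overlap(a: SV, b: SV) -> float:
    """Return the smaller of the two overlap fractions (0.0 if no match)."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a.end - a.start), overlap / (b.end - b.start))


def filter_rare(family_svs, control_svs, min_ro=0.5):
    """Keep family SVs matching no control SV at >= min_ro (assumed threshold)."""
    return [
        sv for sv in family_svs
        if not any(reciprocal_overlap(sv, c) >= min_ro for c in control_svs)
    ]


# Toy coordinates, invented purely for illustration.
controls = [SV("chr1", 1000, 2000, "DEL"), SV("chr2", 500, 900, "DEL")]
family = [
    SV("chr1", 1050, 2050, "DEL"),  # 95% reciprocal overlap with a control: filtered
    SV("chr1", 5000, 5300, "DEL"),  # no overlapping control: kept as rare
    SV("chr2", 500, 900, "DUP"),    # same span, different type: kept as rare
]
rare = filter_rare(family, controls)
```

This all-pairs comparison is quadratic in the number of calls; a real callset would use an interval index and a dedicated comparison tool.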
Figure 2. SV discovery, filtering and burden in ASD families.
a) SV discovery in probands (dark) and unaffected siblings (light) before (top) and after (bottom) pangenome filtering for 51 families with idiopathic autism. Proband sex versus unaffected sibling sex is shown in parentheses after family IDs. b) High- (HC) and low-confidence (LC) SVs by genotype class for autosomes and sex chromosomes. Het: heterozygous SVs. Hom: homozygous SVs. HC: high-confidence SVs confirmed by Mendelian inheritance of parental SV calls. LC: low-confidence SVs that initially deviated from Mendelian inheritance patterns in the collapsed table but were subsequently curated through further evaluation. Histograms compare c) the autosomal private SV burden and d) the X chromosome burden (females only) between probands (pink) and unaffected siblings (gray) for 51 probands (41 females and 10 males) and 36 unaffected siblings (15 females and 21 males). Different functional categories of SV classes are considered: protein-coding and UTR (Exon), intergenic (Inter), intronic (Intron), deletions (DEL), insertions (INS), paternally inherited (Pat), maternally inherited (Mat), those overlapping neurodevelopmental genes (NDD), brain-derived regulatory regions (brainREG) or regulatory regions more generally (REG). No significant differences (χ² test p-values exceeding 0.05) in the number of SVs between probands and siblings were observed across these categories, although there was a trend on the X chromosome toward enrichment of SVs in affected females compared to unaffected sisters.
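A per-category burden comparison of this kind can be sketched as a Pearson χ² test on a 2×2 table. The caption does not say how the tables were built, so the layout (SVs inside versus outside a category, for probands versus siblings), the helper name `chi2_2x2`, and the counts are all hypothetical. The p-value uses the identity that a χ² statistic with one degree of freedom is a squared standard normal.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value) with one degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts, NOT taken from the study:
# probands: 120 SVs in the category, 2880 outside it
# siblings:  80 SVs in the category, 2020 outside it
chi2, p_value = chi2_2x2(120, 2880, 80, 2020)
significant = p_value < 0.05
```

With these invented counts the proportions are nearly equal (4.0% versus 3.8%), so the test does not reach the 0.05 threshold. For small counts, Fisher's exact test would be a safer choice.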
Figure 3. Sex chromosome assembly, transmission and X chromosome inactivation skewing.
a) Stacked barplot showing X chromosome assembly continuity and mappability relative to the T2T-CHM13v2.0 reference across haplotypes. Each horizontal line represents one haplotype. The assembled contigs in each haplotype traverse the 1 Mbp window of the reference (no more than 3) and have ≥95% sequence overlap. Colored segments indicate SDs (yellow), centromeres (red), and gaps (black) on the reference cytogenetic band. b) Continuity and mappability of Y chromosome assemblies relative to the T2T-CHM13v2.0 reference (Yq12 heterochromatic region was masked). c) Transmitted X assemblies from father to two daughters in family 12832, with sequence identity visualized using gradient colors. d) Transmitted Y assemblies from father to sons in family 14317. Pseudoautosomal region (PAR), centromere (Cen), satellite (Sat), and X-transposed region (XTR) annotations were derived from Rhie et al. (2023). e) Haplotype-resolved methylation at CpGIs on the X chromosome in nine female-female quads. Mean methylation levels were calculated for each haplotype across 889 CpGIs and their ±5 kbp flanking regions on the X chromosome for 18 female individuals. Red denotes the maternal haplotype, while blue represents the paternal haplotype.
Figure 4. Pathogenic and candidate variants missed by short-read WGS.
Long-read sequencing solved cases a) (12237_p1) involving a stop-gain de novo mutation in SYNGAP1 and b) (HYZ207_p1) involving a de novo deletion in the last exon of MECP2. c) A de novo candidate mutation in the promoter of DDX3X in 14133_p1. d) A de novo candidate mutation in the promoter of POGZ in 12456_p1. e) A 71 bp de novo tandem insertion in 11201_p1, predicted to interrupt the HNRNPK TF binding cluster in the intron of CNTN3. f) A 135 bp homozygous tandem repeat (TR) contraction in the 3’ UTR of CLN8 in 11616_p1, predicted to disrupt the transcription of CLN8.
Figure 5. Reduction of the rare SV pool with increasing control samples.
a) Cumulative discovery curve of SVs identified in different control cohorts of 108, 285 and 572 individuals, compared to 87 children (both affected and unaffected) from autism families. Control samples and discovery curves were computed for both African and non-African controls. b) The inclusion of additional population controls refined the rare SV candidate pool, reducing the number of rare SVs (black) from 663 to 202 and thus raising the fraction of SVs filtered out of consideration from 97% to 99% (red).
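The two percentages in panel b can be checked against the two counts with a short calculation. The caption does not report the total number of SVs before filtering, so the sketch below backs an implied total out of the rounded 97% figure. That total is an estimate derived here, not a number from the study.

```python
rare_before = 663        # rare SVs with the initial controls (from the caption)
rare_after = 202         # rare SVs with the expanded controls (from the caption)
filtered_before = 0.97   # assumption: the caption's rounded 97% treated as exact

# Implied total SV count before filtering (derived estimate, not reported).
implied_total = rare_before / (1 - filtered_before)

# Fraction filtered once only 202 rare SVs remain.
filtered_after = 1 - rare_after / implied_total

print(f"implied total: about {implied_total:.0f} SVs")
print(f"filtered after expansion: {filtered_after:.1%}")
```

Under that assumption the filtered fraction after expansion rounds to 99%, in line with the caption.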

References

    1. Abyzov Alexej, Urban Alexander E., Snyder Michael, and Gerstein Mark. 2011. “CNVnator: An Approach to Discover, Genotype, and Characterize Typical and Atypical CNVs from Family and Population Genome Sequencing.” Genome Research 21 (6): 974–84. doi: 10.1101/gr.114876.110.
    2. Alonge Michael, Lebeigle Ludivine, Kirsche Melanie, et al. 2022. “Automated Assembly Scaffolding Using RagTag Elevates a New Tomato System for High-Throughput Genome Editing.” Genome Biology 23 (1): 258. doi: 10.1186/s13059-022-02823-7.
    3. Audano Peter A., Sulovari Arvis, Graves-Lindsay Tina A., et al. 2019. “Characterizing the Major Structural Variant Alleles of the Human Genome.” Cell 176 (3): 663–675.e19. doi: 10.1016/j.cell.2018.12.019.
    4. Birtele Marcella, Del Dosso Ashley, Xu Tiantian, et al. 2023. “Non-Synaptic Function of the Autism Spectrum Disorder-Associated Gene SYNGAP1 in Cortical Neurogenesis.” Nature Neuroscience 26 (12): 2090–103. doi: 10.1038/s41593-023-01477-3.
    5. Carvalho Claudia M. B., and Lupski James R. 2016. “Mechanisms Underlying Structural Variant Formation in Genomic Disorders.” Nature Reviews Genetics 17 (4): 224–38. doi: 10.1038/nrg.2015.25.
