Cell. 2024 Nov 14;187(23):6687-6706.e25.
doi: 10.1016/j.cell.2024.09.014. Epub 2024 Sep 30.

Detection and analysis of complex structural variation in human genomes across populations and in brains of donors with psychiatric disorders

Bo Zhou et al. Cell.

Abstract

Complex structural variations (cxSVs) are often overlooked in genome analyses due to detection challenges. We developed ARC-SV, a probabilistic and machine-learning-based method that enables accurate detection and reconstruction of cxSVs from standard datasets. By applying ARC-SV across 4,262 genomes representing all continental populations, we identified cxSVs as a significant source of natural human genetic variation. Rare cxSVs have a propensity to occur in neural genes and loci that underwent rapid human-specific evolution, including those regulating corticogenesis. By performing single-nucleus multiomics in postmortem brains, we discovered cxSVs associated with differential gene expression and chromatin accessibility across various brain regions and cell types. Additionally, cxSVs detected in brains of psychiatric cases are enriched for linkage with psychiatric GWAS risk alleles detected in the same brains. Furthermore, our analysis revealed significantly decreased brain-region- and cell-type-specific expression of cxSV genes, specifically for psychiatric cases, implicating cxSVs in the molecular etiology of major neuropsychiatric disorders.

Keywords: ARC-SV; GTEx; PsychENCODE; complex structural variation; cxSVs; human evolution; population genetics; psychiatric genetics; single-cell multiomics; structural variation.

Conflict of interest statement

Declaration of interests: M.P.S. is a co-founder and on the advisory boards of Personalis, Qbio, January AI, SensOmics, Filtricine, Protos, Mirvie, Onza, Marble Therapeutics, Iollo, and NextThought AI. He is also on the advisory boards of Jupiter, Applied Cognition, Neuvivo, Mitrix, and Enovone. W.J.G. is a consultant for 10x Genomics, Guardant Health, Quantapore, and Ultima Genomics, a co-founder of Protillion Biosciences, and is named on ATAC-seq patents. A.K. is a consulting fellow with Illumina; a member of the SABs of OpenTargets (GSK), PatchBio, and SerImmune; and a co-founder of RavelBio. S.B.M. is an advisor for BioMarin, MyOme, and Tenaya Therapeutics.

Figures

Figure 1. Overview of ARC-SV.
(A) Candidate breakpoints are identified from discordant read pairs (1, 2), split reads (3), and soft-clipped alignments. Novel adjacencies are proposed, such as CB (blue) from read pair 1 and BE (purple) from read pairs 2 and 3. ARC-SV scores diploid genotypes using a likelihood model, with rearrangement ABCBE explaining the discordant and split reads. (B) ARC-SV calls from 31 individuals were assembled and compared against their Human Pangenome diploid assemblies (“ground truth”). (C) Calls were labeled as “validated” or “not validated” and used to train machine learning models for high-confidence SV detection in unseen genomes. (D) Examples of cxSVs (not exhaustive) called by ARC-SV and validated in pangenomes. (E) Confusion matrices and F1 scores for the optimal machine learning model, validated against T2T and other genome assemblies. (F) Comparison of validated and non-validated cxSV calls in HG00733 between SVision (using PacBio HiFi long-read WGS) and ARC-SV (using short-read WGS). (G) For somatic cxSV detection, ARC-SV was applied to WGS of clonal expansions from single NPCs isolated from fetal brain regions. The three cxSVs detected were exactly the ones experimentally validated in Sekar et al.
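The notion of a "novel adjacency" in Figure 1A can be illustrated with a minimal sketch. It is not ARC-SV's implementation: it assumes the reference is the block sequence ABCDE (the caption does not state this), ignores block orientation, and uses hypothetical function names. It only shows that the rearrangement ABCBE introduces the adjacencies CB and BE, which the reference block order lacks.

```python
def adjacencies(path):
    """Return consecutive block pairs of a block path, e.g. 'ABC' -> ['AB', 'BC']."""
    return [a + b for a, b in zip(path, path[1:])]


def novel_adjacencies(reference, rearranged):
    """List adjacencies present in the rearranged path but absent from the reference.

    Order of first appearance is kept and duplicates are dropped.
    Orientation of blocks is ignored (a simplifying assumption).
    """
    reference_pairs = set(adjacencies(reference))
    novel = []
    for pair in adjacencies(rearranged):
        if pair not in reference_pairs and pair not in novel:
            novel.append(pair)
    return novel


reference = "ABCDE"   # assumed reference block order, for illustration only
rearranged = "ABCBE"  # rearrangement named in the Figure 1A caption
print(novel_adjacencies(reference, rearranged))  # ['CB', 'BE']
```

In this toy model, read evidence supporting CB or BE would be what distinguishes the rearranged allele from the reference; the likelihood-based genotype scoring described in the caption is not modeled here.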
Figure 2. Examples of cxSVs validated in Human Pangenome assemblies.
(A–D) Plots mapping unique k-mers (31 bp) between hg38 (x-axis) and the respective pangenome assemblies (y-axis) at loci with cxSVs detected by ARC-SV. ENCODE candidate cis-regulatory elements with distal enhancer-like signatures are indicated by “enhD.” GeneHancer annotations are indicated by “GH.” Bar plots: superpopulation allele frequencies (%).
Figure 3. Characteristics of cxSVs across human populations.
(A) A total of 4,262 human genomes, representing world populations, and 287 from GTEx are included in this study. (B) Size (kb) distributions of 12 cxSV subclasses. (C) Total number of unique cxSVs detected by class from the 4,262 genomes. (D) Population pairwise FST values for cxSVs and simple SVs. (E) Boxplot of the number of cxSVs per genome by superpopulation. (F) Heatmap of common cxSVs (hierarchical clustering) across superpopulations, with allele frequency z-scores. (G) Scatterplot of the top two principal components for common cxSVs.
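As a reminder of what a pairwise FST value (Figure 3D) measures, the following sketch computes Wright's FST for a single biallelic variant from the allele frequencies in two populations. The estimator actually used in the study is not stated here, so this formula, the equal weighting of the two populations, and the function name are assumptions made for illustration.

```python
def wright_fst(p1, p2):
    """Wright's FST for one biallelic variant in two equally weighted populations.

    p1, p2: frequency of the same allele in population 1 and population 2.
    FST = (HT - HS) / HT, where HS is the mean within-population expected
    heterozygosity and HT is the expected heterozygosity of the pooled population.
    """
    hs = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    p_mean = (p1 + p2) / 2
    ht = 2 * p_mean * (1 - p_mean)
    if ht == 0:
        # Variant is absent or fixed in both populations: no differentiation to measure.
        return 0.0
    return (ht - hs) / ht


print(wright_fst(0.3, 0.3))  # identical frequencies give 0.0
print(wright_fst(0.0, 1.0))  # fixed differences give 1.0
```

Values near 0 indicate that a variant has similar frequencies in the two populations, and values near 1 indicate strong differentiation.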
Figure 4. Properties of cxSVs across the human genome.
(A) Yellow bars mark cxSV locations along the chromosome, with high-density regions in red. Also shown is the fold enrichment of cxSVs, simple SVs, short indels (<50 bp), and PTVs in DNA double-strand break and de novo mutation (DNM) hotspots. P-values and significance (stars) are indicated. Enrichment of common (blue) and rare (yellow) cxSVs in (B) HARs, HAQERs, (D) “human-gained” enhancers/promoters, epigenetic gain-enriched hotspots, (E) protein-coding genes within topological domains of these hotspots, and (F,G) GO Cellular Component terms (q-values). (C) Example of a rare cxSV affecting HARs that are enhancers. (H) Association z-scores for bipolar disorder and schizophrenia GWAS SNP risk alleles, with cxSVs versus controls (p=6.31e-23). Simple SV analysis shown below (p=6.23e-5). All SVs were detected from WGS of brain samples. (I) Manhattan plot of combined GWAS loci for bipolar disorder and schizophrenia, highlighting significant peaks co-localizing with cxSVs. Example cxSV within STAG1 linked to GWAS risk alleles rs6764567 and rs4038578, which lie 61 bp and 1,369 bp downstream, respectively, of the distal end of the rearranged “D” block.
Figure 5. Integrative analysis of PsychENCODE brain cxSVs with sn-multiomics (snRNA-seq and snATAC-seq from the same nucleus).
WNN UMAP grouped by (A) phenotype, (B) neurons versus glia, and (C) 20 cell types. (D–F) Differential cxSV-gene expression (AC008014.1) between cases and controls in neurons across three brain regions from 40 HBCC brains based on pseudobulk snRNA-seq. (G,H) Significant differential chromatin openness for cxSV-associated snATAC-seq peaks within EXT1 in DLPFC neurons and glia. (I) Illustration of sn-multiome analysis (snRNA-seq and snATAC-seq) performed for DLPFC, dACC, and hippocampus in each HBCC brain. Gray circles indicate GTEx brain regions used for cxSV-eQTL mapping.
Figure 6. Global association of cxSVs with functional genomic changes in neurons and glia across brain regions in schizophrenia or bipolar disorder patients versus controls.
(A) Z-score distributions (one-sided Wilcoxon rank-sum test) of gene expression (snRNA-seq) in hippocampal, dACC, and DLPFC neurons and glia from 40 HBCC brains for cxSV case-carriers (schizophrenia or bipolar disorder) versus cxSV control-carriers. (B) Left: Same analysis with an additional 79 CMC brains (DLPFC snRNA-seq). Right: Z-score distributions of chromatin openness (snATAC-seq) in hippocampal neurons and dACC glia from 40 HBCC brains for cxSV case-carriers versus controls. (C) Same analysis as in (A) but for simple SVs. (D) Z-score distributions of gene expression in hippocampal, dACC, and DLPFC neurons and glia for cxSV versus simple SV case-carriers in 40 HBCC brains.
Figure 7. Examples of cxSVs at loci that diverged with other primates and of those shared with archaic humans.
Fixed loci in humans that diverged with bonobos and chimpanzees via (A) a cxSV on chr5:29094936–29109104 (hg38) within LINC02109 and affecting HAR ANC302 and (B) a cxSV (chr1:90794594–90794971, hg38) within LINC02609 containing a human-gained brain enhancer (chr1:90794843–90796293, hg38) and GWAS SNP rs4561025. (C) A fixed locus in humans that diverged with bonobos, chimpanzees, and gorillas via a cxSV (chr11:92193737–92196296, hg38), affecting HAQER2045 and containing GWAS SNP rs1125472. (D,E) cxSVs within PRMT2 and SAMMSON identified in the Neanderthal genome, shared with modern humans. Bar plot: superpopulation allele frequencies (y-axis, %). Proximal enhancer-like element (E2148159) indicated as enhP.
