[Preprint]. 2024 Mar 19:2024.03.18.585480.
doi: 10.1101/2024.03.18.585480.

Transcriptional Determinism and Stochasticity Contribute to the Complexity of Autism Associated SHANK Family Genes

Xiaona Lu et al. bioRxiv. 2024.

Abstract

Precision of transcription is critical because transcriptional dysregulation is disease-causing. Traditional methods of transcriptional profiling are inadequate to elucidate the full spectrum of the transcriptome, particularly for longer and less abundant mRNAs. SHANK3 is one of the most common autism causative genes. Twenty-four Shank3 mutant animal lines have been developed for autism modeling. However, their preclinical validity has been questioned due to incomplete Shank3 transcript structure. We applied an integrative approach combining cDNA-capture and long-read sequencing to profile the SHANK3 transcriptome in humans and mice. We unexpectedly discovered an extremely complex SHANK3 transcriptome. Specific SHANK3 transcripts were altered in Shank3 mutant mice and postmortem brain tissues from individuals with ASD. The enhanced SHANK3 transcriptome significantly improved the detection rate for potentially deleterious variants from genomics studies of neuropsychiatric disorders. Our findings suggest the stochastic transcription of the genome associated with SHANK family genes.

Conflict of interest statement

Competing interests: YHJ is a scientific co-founder of Couragene, Inc., but this study is unrelated to his role. The project was initially supported by a sponsored research project from Taysha Gene Therapies. Taysha Gene Therapies did not have any direct role in the conceptualization, design, data collection, analysis, decision to publish, or preparation of the manuscript.

Figures

Fig. 1. Genome-wide transcript diversity and abundance in brains detected by SIS.
A. Experimental design of SIS and CIS of human and mouse tissues. B. Schematic of the experimental procedure of RNA capture and long-read sequencing. C. Number of unique transcripts (transcript diversity) for individual genes (blue) and the number of sequence reads (abundance) (red) for an individual transcript detected in human cerebral cortex by SIS, with projected chromosome coordinates and ideograms. D. Transcript diversity was significantly correlated with the sequence reads (abundance) of the transcripts. E. Number of transcripts per gene genome-wide from SIS in human and mouse brains. F. Number of unique transcripts (Trans_Div) and abundance (Gene_FL) for 213 ASD risk genes, showing an average of 56 transcripts per gene and a median of 35. G-H. Human SIS data showed heightened transcript diversity in genes associated with brain disorders, especially ASD and NDD, compared to other diseases. We observed a strong correlation between transcript diversity and abundance in all gene clusters except for those related to dementia/Alzheimer’s.
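The two quantities in this caption, transcript diversity and abundance, can be illustrated with a minimal sketch. It assumes a table of (gene, transcript, read count) records; the records, gene names, and the `summarize` helper are hypothetical and are not data or code from the study, and summing reads per gene is only one possible reading of the Gene_FL abundance measure.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical (gene, transcript_id, read_count) records; not data from the study.
records = [
    ("GeneA", "A.1", 10),
    ("GeneA", "A.2", 5),
    ("GeneA", "A.3", 1),
    ("GeneB", "B.1", 7),
    ("GeneC", "C.1", 2),
    ("GeneC", "C.2", 2),
]

def summarize(records):
    """Return {gene: (number of unique transcripts, summed read count)}."""
    transcripts = defaultdict(set)
    reads = defaultdict(int)
    for gene, transcript_id, count in records:
        transcripts[gene].add(transcript_id)
        reads[gene] += count
    return {gene: (len(transcripts[gene]), reads[gene]) for gene in transcripts}

summary = summarize(records)
diversity = [n_transcripts for n_transcripts, _ in summary.values()]

# Mean and median transcripts per gene, the same kind of summary as in panel F.
mean_diversity = mean(diversity)
median_diversity = median(diversity)
```

A mean well above the median, as reported for the 213 ASD risk genes (56 versus 35), would indicate a right-skewed distribution in which a few genes carry very many transcripts.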
Fig. 2. Novel Shank3 transcriptome in mouse striatum (ST) by CIS.
A. CIS revealed a refined Shank3 gene structure and splicing patterns in WT mouse striatum. The established Shank3 structure (NM_001034115, mm39) is expanded with newly detected exons shared between striatum and PFC, depicted in purple. Unique splicing events, represented by grey lines with thickness indicating read quantity, include novel striatum-specific exons in dark blue and alternative splices in light blue. Fusion transcript exons near Gm41381 and Acr, shown in green and orange, respectively, feature unique splicing with newly identified red exons (T1-T3) exclusive to Shank3. New exon U3 is shared between striatum and PFC. U4 is linked to Gm41381 and is striatum specific. 21e is a new in-frame exon and 21c is a new exon harboring a stop codon. B. 142 unique transcripts started with the canonical exon 1 of annotated Shank3 (NM_001034115) in ST and terminated at different positions. Pink bar plots on the left show the abundance (log2 counts). Arrows describe the features of given transcripts. C. Example of transcripts with similar structures in panorama but different at the sequence level, with predicted ORFs and ATG codons. The transcripts PB.13560.548, PB.13560.628, and PB.13560.547 are similar, but the predicted ORFs show different ATG codons and protein domains. D. Details of the split exon 1. There is a cryptic splicing of 127 bp (non-capitalized sequence in black) within the annotated exon 1 of transcript PB.106071.171, which resulted in a predicted upstream ATG codon and an additional 134 amino acids. Other transcripts have transcriptional starting sites (TSS) in exon 1 but a predicted ATG codon in exon 2. Variability in TSS and intron 1 retention, as seen in transcripts PB.13554.484, PB.13554.580, and PB.13554.668, leads to ORFs of 304 aa, 106 aa, and 1,290 aa, respectively. E. Validation of new transcripts from paired mouse PFC and ST samples. Pair 1, novel exon U1; Pair 2, fusion transcript between Shank3 exon 21 and Acr exon 2; Pair 3, splicing event between Shank3 exon 9 and exon 19; Pair 4, splicing event between Shank3 exon 5 and exon 21; Pair 5, novel exon 9b of Shank3; Pair 6, Shank3 exon 11 extension/intron 11 retention. The red arrows are the novel products confirmed by Sanger sequencing. Other bands are products from known transcripts. F. Sanger sequencing confirmation of a fusion transcript between Shank3 exon 21 and Acr exon 2 in mouse brain (pair 2 of E). G. Fusion transcripts in other tissues. Forward and reverse primers were from exon 20 of Shank3 and exon 5 of mouse Acr, respectively. Lane 1, liver in P21 mouse; lane 2, thymus in P21 mouse; lane 3, ovary in P21 mouse; lane 4, ovary in 3-month-old mouse; lane 5, testis in P21 mouse; lane 6, testis in 3-month-old mouse. The red arrows are the novel products confirmed by Sanger sequencing as indicated. Other bands are known products. H. Sanger sequencing of Shank3 exon 11 extension/intron 11 retention in mouse brain (lane 6 of G).
Fig. 3. Novel Shank3 transcriptome in mouse PFC by CIS and predicted domain structures of ORFs.
A. New Shank3 transcript structure and conch plot of splicing events discovered in WT mouse PFC by CIS. The color code is the same as in Fig. 2A. The novel exon 9a (chr15:89394416–89394465, mm39) is shared between PFC and ST. Other novel exons such as exon 12e (chr15:89414330–89414640, mm39) were unique to PFC. Novel exons 21a, 21b and 21c are predicted to result in an early stop codon and shorter ORFs (chr15: 89394416–89394465, chr15: 89408698–89408784, chr15: 89418571–89418609, mm39). B. Structure of 59 transcripts with different TSSs but terminating at annotated exon 22 of Shank3. The pink bar plot represents the abundance (log2 counts) of each transcript. C-D. Comparison of transcripts and predicted ORFs between mouse ST and PFC. E-F. The pattern of deduced TSS and predicted starting sites of the coding sequence (CDS) for all Shank3 transcripts, including new 5′ and 3′ fusion transcripts, from CIS in mouse ST (E) and PFC (F). Each filament represents an individual transcript in different classes of GM41381(U1-U2)-Shank3, Shank3-T1–3, Shank3, Shank3-Acr (first column), deduced TSS (middle column), and predicted starting sites of CDS (third column). G. A total of 125 unique ORFs are predicted from 142 transcripts starting with exon 1 in ST. The pattern of the combination of 6 protein domains is shown in the outermost ring of the windmill plot. The middle layer shows the abundance of each RNA transcript and the p value of its expression level compared to other transcripts. Only 4 ORFs of transcripts contained all 6 protein domains. H-K. Four windmill plots showing 270 predicted ORFs from all 345 transcripts detected in PFC, classified by the combination of functional domains. L. Spiral plot showing an aggregated functional domain coverage of the transcripts captured by the Shank1–3 joint probe panel by CIS of mouse PFC and ST. Each dot represents a unique transcript. Each color represents a unique combination of functional domains. The dots are ordered from the longest to the shortest transcript, while the colors are arranged from the SAM to the UBL domain.
Fig. 4. The summary and illustration of altered Shank3 transcripts in Shank3Δe4−9, Shank3Δe21 and Shank3Δe4−22 mutant mice from CIS.
A. Current annotated mouse Shank3 and Acr (NM_013455, mm39) gene structure. The annotations of genetically targeted mutations in mice, rats, monkeys, and dogs are shown (KO: exonic deletions; KI: knock-in mutation). B. The gene structure of Shank3Δe4−9 mutant mice is in grey and representative mRNA transcripts from Shank3Δe4−9−/− mice are in pink. No transcript using the annotated exon 1 was detected. Instead, the first exon, presumably from a cryptic TSS (arrow), was detected in intron 1. The exon 4–9 deleted transcripts lacked exons 11, 12, and 22 but showed fusion between Shank3 and Acr. The transcripts starting at intron 16/exon 17 (arrows) as the first exon were most abundant. Extensive fusion transcripts between Shank3 exon 21 and Acr exon 2 were observed. The last coding exon 22 was not detected in any transcripts. C. The gene structure of Shank3Δe21 mutant mice and the Acr gene is in grey and representative mRNA transcripts from Shank3Δe21−/− mice are in blue. Splicing between exon 4 of Shank3 and exons of Acr, which resulted in fusion transcripts, was observed. The transcripts starting at intron 16/exon 17 (arrows) as the first exon and fusions between Shank3 and Acr were most common. The coding exon 22 was not detected in any transcript. D. The gene structure of Shank3Δe4−22 mutant mice and the Acr gene is in grey and representative mRNA transcripts are in purple. The number of fusion transcripts between Shank3 and Acr is significantly increased in Shank3Δe4−22−/− mutant mice. E-F. Increased expression of the Acr transcript in Shank3Δe4−22−/− mutant mice by RT-qPCR. The expression of the Acr gene was significantly increased, by >100-fold, in both striatum and hippocampus. G-J. Compensatory expression of the functional domains of SHANK family proteins in the striatum of Shank3Δe4−22 mutant mice. The bulk RNA-seq data of Shank3Δe4−22 were analyzed for the compensatory expression of other functional domains of the Shank1 and Shank2 genes. The deficiency of the ANKRY and SH3 domains of SHANK3 was compensated by SHANK1, but the deficiency of the PDZ and SAM domains was compensated by both SHANK1 and SHANK2. The deficiency of the SAM and SH3 domains was fully compensated, but the deficiency of the ANKRY and PDZ domains was partially compensated.
Fig. 5. The novel transcripts of human SHANK3 genes detected by CIS and predicted ORFs.
A. New SHANK3 transcript structure and conch plot of SHANK3 transcripts discovered by CIS in normal human cortex. The black backbone is the annotated SHANK3 transcript NM_001372044 (hg38). Blue rectangles represent novel exons of SHANK3. The exons of ACR are in orange rectangles. The new and uncharacterized exons distal to ACR are in red rectangles. The grey line connects adjacent exons while the light blue line illustrates alternative splicing events. The number of sequence reads for the splicing event is shown in the middle of the connecting lines and reflected in their thickness. B. Zoomed view of the splicing events between exons 10 and 20 in the human cortex. Exons 16 and 20 of SHANK3 in humans correspond to exons 17 and 21 of Shank3 in mice. C. Structure and abundance of the fusion transcripts between SHANK3 and ACR in the human cortex. The majority of fusion transcripts are initiated after exon 10, mainly from introns 16, 17, and exon 21. The fusion transcripts notably skip exon 20 (the largest exon) of SHANK3 and exon 1 of ACR. D. Validation of novel SHANK3 transcripts in human brain tissue by RT-PCR and Sanger sequencing. A diagram of the primer design for L1 is shown. RT-PCR gel: L1, fusion transcript between SHANK3 exon 20 and ACR exon 2; L2, fusion transcript between SHANK3 exon 20 and ACR exon 4; L3, fusion transcript between SHANK3 exon 19 and ACR exon 2; L4, novel exon U3; L5, intron 14 retention; L6, intron 15 retention. M, DNA marker. Sanger sequence of the RT-PCR product of the SHANK3 exon 20 and ACR exon 2 fusion from L1. E. Three new exons upstream of the annotated exon 1 of SHANK3 mRNA (NM_001372044) (U1, chr22:50672853–50672979; U2, chr22:50674076–50674097; U3, chr22:50674642–50674705, hg38). A new ATG codon is in U2. F. Dandelion plot showing functional domain combinations of the SHANK1, SHANK2, and SHANK3 transcripts from CIS. Each dot represents a unique transcript, and each color is a unique combination of functional domains. There are 17 combinations of functional domains of human SHANK family genes. The PDZ domain was significantly more present (~70%) in predicted ORFs. G-H. Significant enrichment of fusion transcripts in transcriptome data of ASD and schizophrenia. Gene Ontology enrichment analysis was performed with Enrichr (ref. 95) in 41 disease-related datasets. The fusion transcripts were significantly enriched in ASD and schizophrenia in the Disease Perturbations from GEO dataset (G) and the ClinVar2019 dataset (H). I-J. Distribution of GERP (I) and PhyloP (J) scores across human SHANK3 genomic regions of known coding exons, novel exons from CIS, and non-transcribed regions in cerebral cortex. I. The GERP score for novel exons from CIS in cerebral cortex is significantly higher than that of non-transcribed regions (D=0.097; p<0.001) but significantly lower than that of SHANK3 known exons (D=0.299; p<0.001). J. The PhyloP score for novel exons from CIS in cerebral cortex is significantly higher than that of non-transcribed regions (D=0.133, p<0.001) but significantly lower than that of SHANK3 known coding exons (D=0.296, p<0.001). K-L. Distribution of GERP and PhyloP scores across mouse Shank3 genomic regions of known coding exons, novel exons from CIS, and non-transcribed regions in PFC and ST. K. The GERP score for novel exons from CIS in PFC and ST is significantly higher than that of non-transcribed regions (PFC: D=0.548, p<0.001; ST: D=0.602, p<0.001) but significantly lower than that of known Shank3 coding exons (PFC: D=0.15, p<0.001; ST: D=0.0960, p<0.001). L. The PhyloP score for novel exons from CIS in PFC and ST is significantly higher than that of non-transcribed regions (PFC: D=0.385, p<0.001; ST: D=0.439, p<0.001) but significantly lower than that of known Shank3 coding exons (PFC: D=0.184, p<0.001; ST: D=0.128, p<0.001).
Fig. 6. Developmental, cell type, cell compartment specific, and spatial transcriptome of Shank3 in mouse brains.
A. Development-specific Shank3 transcripts in mouse cerebral cortex. B. Cell type-specific Shank3 transcripts in mouse brains. The scRNA-seq data of anterior cingulate cortex (ACA) were aligned to Shank3 transcripts detected by CIS. Glutamatergic neurons, especially the L2/3, L4/5, and L6 CTX, have more diverse Shank3 transcripts compared to GABAergic neurons and non-neuronal cells. Certain transcripts were cell type specific. The Shank3 transcript (PB.10607.933) including exon 18 was only detected in endothelial cells. C-F. Mouse Shank3 transcripts in the Visium spatial transcriptome. C. Visium spatial anatomy (CA: Cornu Ammonis, DG: Dentate Gyrus, TH: Thalamus, PIR: Piriform cortex, MEA: Medial Amygdala, CP: choroid plexus, CTX: Cortex, HPF: Hippocampal Formation, HY: Hypothalamus). G. Cellular compartment-specific changes of Shank3 exon usage in the hippocampus of an Alzheimer’s disease (AD) mouse model, from scRNA-seq data from different cellular compartments. The nucleus, compared to synapses, expressed significantly fewer of splicing events 32 and 33, which correspond to exon 21, the largest exon of mouse Shank3. H. Different patterns of Shank3-Acr fusion transcripts in nucleus and synapse between WT and AD mice.
Fig. 7. Improved transcriptome analysis of ASD transcriptome and sequence variant annotations of genome sequence data using SHANK3 transcript structure from CIS.
A-D. The pattern of human SHANK3 transcripts from CIS changed at different ages and brain regions. Bulk RNA-seq data of normal controls were aligned to SHANK3 transcripts detected using CIS (BA, Brodmann area; CBL, cerebellum). E-I. PCA of human SHANK3 transcripts from CIS and bulk RNA-seq data of 2,474 individuals from PsychENCODE, comprising cases with ASD, BPD, MDD, or SCZ and normal controls (only data from prefrontal cortex are included). The clusters of MDD and BPD overlapped but are separate from ASD and SCZ. The volcano plots show individual disorders, ASD (n=68), MDD (n=87), BPD (n=297), and SCZ (n=736), compared to controls (n=1,286). J. PCA of SHANK3 transcripts in different brain regions and ages (BA, Brodmann area; CBL, cerebellum). K-L. Brain region-specific change of SHANK3 transcripts in ASD brains. Bulk RNA-seq data of subregions of the brain from ASD and controls were aligned to SHANK3 transcripts from CIS. K. Exons 11, 15, 20, and 22 of SHANK3 transcripts were significantly more represented in the BA7 region of ASD brains. L. Exon 10 of SHANK3 transcripts is significantly more represented in BA38 of ASD brains. M. Utilizing the updated SHANK3 transcript structure from CIS enhanced PTV detection in ASD, SCZ, and BPD exome and genome sequencing data. From 55,000 cases, we identified 1,530 new PTVs, a significant increase from previous annotations using the SHANK3 transcript NM_001372044.2 in hg38. Of these, 192 variants were likely deleterious, including 27 stop-loss, 60 stop-gain, 52 frameshift, and 53 splice variants, compared to the earlier finding of 22 such variants. N. The discovery rate of PTVs for SHANK3 is increased from 1.3% using NM_001372044.2/hg38 as a reference to 12.5% using the transcript structure from CIS in this study.
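The counts quoted in this caption can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the caption; reading the 12.5% discovery rate as the share of likely deleterious variants among the 1,530 new PTVs is an assumption, since the caption does not define the rate, and the variable names are illustrative.

```python
# Counts quoted in the Fig. 7 caption (panels E-I, M, and N).
cohort = {"ASD": 68, "MDD": 87, "BPD": 297, "SCZ": 736, "controls": 1286}
likely_deleterious = {"stop-loss": 27, "stop-gain": 60, "frameshift": 52, "splice": 53}
new_ptvs = 1530

# Cases plus controls in the PsychENCODE analysis.
total_samples = sum(cohort.values())

# Sum of the four likely deleterious variant classes.
total_deleterious = sum(likely_deleterious.values())

# Assumption: share of likely deleterious variants among the new PTVs.
deleterious_share = total_deleterious / new_ptvs

# Ratio of the two reported discovery rates (12.5% versus 1.3%).
fold_increase = 12.5 / 1.3

print(total_samples)               # 2474
print(total_deleterious)           # 192
print(f"{deleterious_share:.1%}")  # 12.5%
print(f"{fold_increase:.1f}")      # 9.6
```

Under that reading, the four variant classes sum to the reported 192, and 192 of 1,530 matches the reported 12.5%, which is roughly a 9.6-fold increase over 1.3%.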

