F1000Res. 2018 Dec 7;7:1908.
doi: 10.12688/f1000research.17204.3. eCollection 2018.

Pan-cancer repository of validated natural and cryptic mRNA splicing mutations

Ben C Shirley et al. F1000Res. 2018.

Abstract

We present a major public resource of mRNA splicing mutations validated according to multiple lines of evidence of abnormal gene expression. Likely mutations present in all tumor types reported in The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC) were identified based on the comparative strengths of splice sites in tumor versus normal genomes. They were then validated by comparing, respectively, counts of splice junction-spanning reads and abundance of transcript reads in RNA-Seq data from matched tissues and tumors lacking these mutations. The comprehensive resource features 341,486 of these validated mutations, the majority of which (69.9%) are not present in the Single Nucleotide Polymorphism Database (dbSNP 150). There are 131,347 unique mutations that weaken or abolish natural splice sites and 222,071 mutations that strengthen cryptic splice sites (11,932 affect both simultaneously). A total of 28,812 novel or rare flagged variants (with <1% population frequency in dbSNP) were observed in multiple tumor tissue types. An algorithm was developed to classify variants into splicing molecular phenotypes; it integrates germline heterozygosity, degree of information change, and impact on expression. The classification thresholds were calibrated against the phenotypic assignments in the ClinVar clinical database. Variants are partitioned into allele-specific alternative splicing, likely aberrant, and aberrant splicing phenotypes. Single variants or chromosome ranges can be queried using a Global Alliance for Genomics and Health (GA4GH)-compliant, web-based Beacon, "Validated Splicing Mutations", either separately or in aggregate alongside other Beacons through the public Beacon Network, as well as through our website. The website provides additional information, such as a visual representation of supporting RNA-Seq results, gene expression in the corresponding normal tissues, and splicing molecular phenotypes.
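The three mutation counts in the abstract are consistent with one another once the overlap is taken into account: mutations that affect both a natural and a cryptic site appear in both groups, so they must be subtracted once to obtain the number of unique mutations. A minimal check of that arithmetic, using only the figures quoted above (the variable names are illustrative):

```python
# Counts quoted in the abstract.
natural_site_mutations = 131_347  # weaken or abolish natural splice sites
cryptic_site_mutations = 222_071  # strengthen cryptic splice sites
affect_both = 11_932              # counted in both groups above

# Inclusion-exclusion: subtract the overlap so shared mutations count once.
unique_total = natural_site_mutations + cryptic_site_mutations - affect_both

print(unique_total)  # 341486, the total number of validated mutations
```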

Keywords: Chromosomes; Genome; Information Theory; Mutation; Neoplasms; Next Generation Sequencing; RNA Splice Sites; Single Nucleotide Polymorphism; Validation.

Conflict of interest statement

Competing interests: PKR cofounded and BCS is an employee of CytoGnomix Inc., which hosts the interactive webpage described in this study. CytoGnomix markets subscriptions to and services based on the software that generated the ValidSpliceMut database. EJM has no conflict of interest.

Figures

Figure 1. Screenshot of ATM:g.108214098G>T Results Provided by the ValidSpliceMut Website.
(A) At the top of the page, the predicted molecular phenotype of the mutation for all cases is presented (the prediction algorithm is shown in Figure 2). Then, the ‘Variant Position’ heading displays the variant of interest in g. notation and provides a link that queries the Mutalyzer API to obtain the variant coordinate in a gene-centric c. mutation format. Variant-specific and splice site-specific tabular results are presented under the headings “Splice Site Information” and “Variant Data”. Results are then organized by the TCGA and ICGC sample IDs (‘cases’) harboring the mutation, within a series of expandable panels. A link is provided to patient tumor metadata on the GDC data portal. Each panel consists of the molecular phenotype classification for that particular sample, and the read counts and p-values for each Veridical evidence type. Significant p-values (≤ 0.05) are highlighted in bold. Evidence types deemed “strongly corroborating” (Viner et al. 2014) are color coded and correspond to the dynamically generated text appearing above the table. (B) An Integrative Genomics Viewer (IGV) image showing alignment of expressed sequence reads. IGV screenshots are provided only for mutations that are present in <1% of the population (in dbSNP 150), have ≥ 5 junction-spanning reads, and are highly significant (p < 0.01) for cryptic splicing, exon skipping, and/or intron inclusion with the mutation. A specific IGV screenshot for this sample captures the region surrounding the mutation. Here, several RNA-Seq reads show skipping of the affected exon. (C) A dynamically generated histogram presents expression levels of all genes for a selected normal tissue type. Genes are grouped into bins based on expression level, denoted on the x-axis. The number of genes present in each bin is shown on the y-axis (log10 scale). The histogram key indicates the expression range that contains the variant-containing gene (purple). Tissue type can be changed via a drop-down list.
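The display rules in panels A and B can be read as simple threshold tests. The sketch below is an illustration only: the `SplicingEvidence` record and its field names are hypothetical and are not taken from the ValidSpliceMut schema. Only the thresholds (population frequency below 1%, at least 5 junction-spanning reads, p < 0.01 for at least one of the three evidence types, and p ≤ 0.05 for bold highlighting) come from the caption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SplicingEvidence:
    """Hypothetical per-sample record; field names are illustrative."""
    population_frequency: float          # frequency in dbSNP 150 (0.0 if absent)
    junction_spanning_reads: int
    p_cryptic_splicing: Optional[float] = None
    p_exon_skipping: Optional[float] = None
    p_intron_inclusion: Optional[float] = None


def is_highlighted(p_value: float) -> bool:
    """Panel A: p-values of 0.05 or less are shown in bold."""
    return p_value <= 0.05


def qualifies_for_igv_screenshot(ev: SplicingEvidence) -> bool:
    """Panel B: all three caption criteria must hold."""
    p_values = [
        p
        for p in (ev.p_cryptic_splicing, ev.p_exon_skipping, ev.p_intron_inclusion)
        if p is not None
    ]
    return (
        ev.population_frequency < 0.01
        and ev.junction_spanning_reads >= 5
        and any(p < 0.01 for p in p_values)
    )


# Made-up example values, not data from the resource.
rare_strong = SplicingEvidence(0.0, 12, p_exon_skipping=0.002)
too_few_reads = SplicingEvidence(0.0, 3, p_exon_skipping=0.002)
print(qualifies_for_igv_screenshot(rare_strong))    # True
print(qualifies_for_igv_screenshot(too_few_reads))  # False
```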
Figure 2. Evidence-Based Case Classification Flowchart.
Flowchart depicting the steps taken to classify the molecular phenotype of each case. Cryptic site changes (left) and natural site changes (right) have differing categorization criteria, which combine information theory-based predictions with Veridical evidence. SNPs common in dbSNP 150 (>0.01 average heterozygosity) are immediately classified as allele-specific alternative splicing.
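Only the first decision in the flowchart is fully specified in this caption, so the sketch below covers that step alone. It assumes a hypothetical function name and input; the later cryptic-site and natural-site branches are deliberately left out because their thresholds are not given here, and the function returns `None` to signal that those branches would still need to be evaluated.

```python
from typing import Optional

# Threshold stated in the caption: average heterozygosity above 0.01 in dbSNP 150.
COMMON_SNP_HETEROZYGOSITY = 0.01


def classify_common_snp(average_heterozygosity: Optional[float]) -> Optional[str]:
    """Apply only the common-SNP shortcut from the flowchart.

    Returns the phenotype label for common SNPs, or None when the variant
    must go on to the evidence-based branches (not modeled in this sketch).
    """
    if (
        average_heterozygosity is not None
        and average_heterozygosity > COMMON_SNP_HETEROZYGOSITY
    ):
        return "allele-specific alternative splicing"
    return None


print(classify_common_snp(0.25))  # allele-specific alternative splicing
print(classify_common_snp(None))  # None
```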
Figure 3. Census of Recurrent Splicing Mutations Present in Multiple ICGC and TCGA Patients.
Predicted splicing mutations that cause splicing abnormalities and are present in multiple tumors from the same dataset were analyzed to determine validation rates, since such variants are less subject to technical artifacts such as sequencing errors. Violin plots indicate the distributions of the fraction of predicted and validated splicing mutations present in multiple patients relative to the total number of tumors carrying those mutations in the TCGA and ICGC datasets. To achieve statistical significance (95% C.I.), distributions of 1,379 validated variants shared by both datasets and present in at least 9 ICGC (left) and 24 TCGA (right) patients were compared. A higher overall proportion of mutations is validated in the ICGC dataset (average of 38.6% for ICGC and 27.8% for TCGA). The dashed lines in each plot indicate the median (middle line) and the upper and lower quartiles of the mutation fractions.
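One plausible reading of the quantity plotted here is, for each recurrent variant, the number of tumors in which it was validated divided by the number of tumors carrying it, restricted to variants that meet the dataset's minimum patient count. The sketch below follows that reading as an assumption; the function names and the example counts are made up for illustration and are not data from the resource.

```python
from statistics import mean
from typing import List, Optional, Tuple

# Minimum numbers of patients per variant stated in the caption.
MIN_PATIENTS = {"ICGC": 9, "TCGA": 24}


def validated_fraction(validated_tumors: int, carrier_tumors: int) -> float:
    """Fraction of tumors carrying a variant in which it was validated."""
    if carrier_tumors <= 0:
        raise ValueError("carrier_tumors must be positive")
    return validated_tumors / carrier_tumors


def mean_validated_fraction(
    variants: List[Tuple[int, int]], dataset: str
) -> Optional[float]:
    """Average fraction over variants meeting the dataset's patient minimum.

    `variants` holds (validated_tumors, carrier_tumors) pairs.
    Returns None when no variant qualifies.
    """
    minimum = MIN_PATIENTS[dataset]
    fractions = [
        validated_fraction(validated, carriers)
        for validated, carriers in variants
        if carriers >= minimum
    ]
    return mean(fractions) if fractions else None


# Made-up counts: the third variant has too few carriers for either dataset.
example = [(5, 10), (3, 12), (1, 4)]
print(mean_validated_fraction(example, "ICGC"))  # 0.375
print(mean_validated_fraction(example, "TCGA"))  # None
```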

References

    1. Foley SB, Rios JJ, Mgbemena VE, et al.: Use of Whole Genome Sequencing for Diagnosis and Discovery in the Cancer Genetics Clinic. EBioMedicine. 2015;2(1):74–81. doi: 10.1016/j.ebiom.2014.12.003
    2. Richards S, Aziz N, Bale S, et al.: Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med. 2015;17(5):405–424. doi: 10.1038/gim.2015.30
    3. Caminsky N, Mucaki EJ, Rogan PK: Interpretation of mRNA splicing mutations in genetic disease: review of the literature and guidelines for information-theoretical analysis [version 1; referees: 2 approved]. F1000Res. 2014;3:282. doi: 10.12688/f1000research.5654.1
    4. Viner C, Dorman SN, Shirley BC, et al.: Validation of predicted mRNA splicing mutations using high-throughput transcriptome data [version 2; referees: 4 approved]. F1000Res. 2014;3:8. doi: 10.12688/f1000research.3-8.v2
    5. Mucaki EJ, Ainsworth P, Rogan PK: Comprehensive prediction of mRNA splicing effects of BRCA1 and BRCA2 variants. Hum Mutat. 2011;32(7):735–742. doi: 10.1002/humu.21513
