Genome Res. 2014 Feb;24(2):300-9.
doi: 10.1101/gr.156224.113. Epub 2013 Nov 8.

Discovery of recurrent structural variants in nasopharyngeal carcinoma

Anton Valouev et al. Genome Res. 2014 Feb.

Abstract

We present the discovery of genes recurrently involved in structural variation in nasopharyngeal carcinoma (NPC) and the identification of a novel type of somatic structural variant. We identified the variants with high complexity mate-pair libraries and a novel computational algorithm specifically designed for tumor-normal comparisons, SMASH. SMASH combines signals from split reads and mate-pair discordance to detect somatic structural variants. We demonstrate a >90% validation rate and a breakpoint reconstruction accuracy of 3 bp by Sanger sequencing. Our approach identified three in-frame gene fusions (YAP1-MAML2, PTPLB-RSRC1, and SP3-PTK2) that had strong levels of expression in corresponding NPC tissues. We found two cases of a novel type of structural variant, which we call "coupled inversion," one of which produced the YAP1-MAML2 fusion. To investigate whether the identified fusion genes are recurrent, we performed fluorescent in situ hybridization (FISH) to screen 196 independent NPC cases. We observed recurrent rearrangements of MAML2 (three cases), PTK2 (six cases), and SP3 (two cases), corresponding to a combined rate of structural variation recurrence of 6% among tested NPC tissues.
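The combined recurrence rate follows from the FISH counts reported above. A minimal arithmetic check, assuming the three gene-level counts refer to distinct cases (the abstract's combined figure implies this, but it is an assumption here; the variable names are illustrative):

```python
# FISH-positive cases per gene, as reported in the abstract.
rearranged_cases = {"MAML2": 3, "PTK2": 6, "SP3": 2}
screened_cases = 196  # independent NPC cases screened by FISH

# Assumption: no case is counted under more than one gene.
total_rearranged = sum(rearranged_cases.values())
recurrence_rate = total_rearranged / screened_cases

print(f"{total_rearranged}/{screened_cases} = {recurrence_rate:.1%}")
```

Under that assumption, 11 of 196 cases carry a rearrangement, which rounds to the 6% quoted in the abstract.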

Figures

Figure 1.
(A) SMASH workflow illustrated by comparing a hypothetical somatic structural variant (left portion of the figure) and a hypothetical germline structural variant (right portion of the figure). The region that is deleted is pointed out by a black cross. Arrows connected by dashed lines represent read pairs, in which reads of concordant pairs have the same color and those of discordant pairs have a different color. Different colors also represent different genomic regions that have been juxtaposed by a structural variant. (B) Step 1: SMASH eliminates concordant pairs and retains discordant pairs, and groups discordant read pairs from the tumor sample into bundles (contoured by gray lines) based on proximity of underlying read coordinates and consistency of orientations. (C) Step 2: Approximate coordinates of breakpoints are derived from read bundles; then each normal read pair is compared to tumor breakpoints and all breakpoints that have normal read pairs supporting them are eliminated. (D) Step 3: Sequencing reads are split and ends are mapped independently; discordant split read coordinates are used to further refine breakpoints.
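Steps 1 and 2 of the workflow described in this caption can be sketched in a few lines. This is a simplified illustration and not the SMASH implementation: the `Pair` record, the `window` and `min_support` parameters, and the greedy bundling rule are all assumptions made for the example. The input is assumed to contain discordant pairs only, and the split-read refinement of Step 3 is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    """One discordant read pair (hypothetical record layout)."""
    chrom1: str
    pos1: int
    strand1: str
    chrom2: str
    pos2: int
    strand2: str


def same_event(a, b, window):
    """True if two pairs agree in chromosomes, orientations, and
    lie within `window` bp of each other at both ends."""
    return (a.chrom1 == b.chrom1 and a.chrom2 == b.chrom2
            and a.strand1 == b.strand1 and a.strand2 == b.strand2
            and abs(a.pos1 - b.pos1) <= window
            and abs(a.pos2 - b.pos2) <= window)


def bundle(pairs, window):
    """Step 1 (sketch): greedily group discordant pairs by proximity
    and orientation, comparing each pair to a bundle's latest member."""
    bundles = []
    for p in sorted(pairs, key=lambda q: (q.chrom1, q.pos1)):
        for b in bundles:
            if same_event(b[-1], p, window):
                b.append(p)
                break
        else:
            bundles.append([p])
    return bundles


def somatic_bundles(tumor, normal, window, min_support):
    """Step 2 (sketch): keep tumor bundles with enough support and
    no supporting discordant pair in the matched normal sample."""
    kept = []
    for b in bundle(tumor, window):
        if len(b) < min_support:
            continue
        if any(same_event(t, n, window) for t in b for n in normal):
            continue  # also seen in normal: treated as germline
        kept.append(b)
    return kept


# Toy data: one somatic event (tumor only) and one germline event.
tumor = [
    Pair("chr1", 1000, "+", "chr8", 5000, "-"),
    Pair("chr1", 1010, "+", "chr8", 5015, "-"),
    Pair("chr1", 1020, "+", "chr8", 4990, "-"),
    Pair("chr2", 2000, "+", "chr2", 9000, "-"),
    Pair("chr2", 2010, "+", "chr2", 9020, "-"),
    Pair("chr2", 2020, "+", "chr2", 9005, "-"),
]
normal = [Pair("chr2", 2005, "+", "chr2", 9010, "-")]

calls = somatic_bundles(tumor, normal, window=500, min_support=3)
```

In the toy data, the chr2 bundle is discarded because a normal pair supports it, so only the chr1/chr8 bundle remains in `calls`. A production caller would derive `window` from the library's insert-size distribution rather than fixing it by hand.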
Figure 2.
Examples of structural variants detected within NPC-5989. (A) Schematic representation of chromosomal rearrangements affecting Chr1 (blue) and Chr8 (green). Coverage plots demonstrate an increase in copy number of Chr1q and a reduction of copy number of Chr8q. Each data point represents the log ratio of tumor read counts to normal read counts in 5-kb bins across the chromosome. Red lines represent mean coverage corresponding to regions with different copy numbers. Based on the ratio levels, we estimate that tumor content is 56% (based on amplification of Chr1q) and 64% (based on deletion of Chr8q). The magnified regions contain two breakpoints representing duplicative translocation and deletion (shown as linked red arrows), supported by 204 and 145 read pairs, respectively. Coordinates of breakpoints match closely with positions of copy number changes both on Chr1 and Chr8, supporting the same rearrangement event. Deletion breakpoint coordinates also match coordinates of the region on Chr1 where coverage visibly drops. The duplicative translocation results in a region of LOH 18 Mb in size at the end of Chr8, and in the amplification of much of Chr1q. (B,C) Results of the array CGH analysis of NPC-5989 on Chr1 and Chr8. The x-axis represents genomic coordinates, whereas the y-axis represents probe saturation which is converted into copy number calls. (D) Coupled inversion involving 6-Mb and 0.1-Mb regions on Chr11, which results in the YAP1-MAML2 gene fusion product. The coupled inversion is represented by three breakpoints (linked red arrows). (E) Coupled inversion involving 0.7-Mb and 4.9-Mb regions on Chr1.
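The caption does not state how tumor content was derived from the coverage ratios. One common approach, shown here as an assumption rather than the authors' method, treats the affected region as a single-copy gain or loss in an otherwise diploid genome, so that the tumor/normal ratio is 1 + c·p/2 for copy change c and tumor fraction p. The ratio values below are hypothetical inputs chosen for illustration; they are not taken from the paper.

```python
def tumor_fraction(ratio, copy_change):
    """Estimate tumor fraction p from a linear tumor/normal coverage ratio.

    Assumes a diploid baseline normalized to ratio 1.0 and a clonal
    change of `copy_change` copies in the tumor, so that
        ratio = (2 * (1 - p) + (2 + copy_change) * p) / 2
              = 1 + copy_change * p / 2
    """
    if copy_change == 0:
        raise ValueError("copy_change must be nonzero")
    return 2.0 * (ratio - 1.0) / copy_change


# Hypothetical ratios, NOT values reported in the paper.
gain_estimate = tumor_fraction(1.28, +1)  # single-copy gain
loss_estimate = tumor_fraction(0.68, -1)  # single-copy loss

print(f"gain-based: {gain_estimate:.0%}, loss-based: {loss_estimate:.0%}")
```

If the plotted values are log ratios, they would need to be converted back to linear ratios before applying this formula. Estimates from gained and lost regions need not agree exactly, as the 56% and 64% figures in the caption illustrate.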
Figure 3.
Validation of somatic structural breakpoints by PCR (images were inverted for better presentation). (A) Agarose gel of PCR products amplified from genomic DNA, targeting breakpoints detected by SMASH in NPC-5989. Because the breakpoints are somatic, specific PCR bands only occur in the tumor sample (pointed out by red arrows). (T) tumor sample; (N) matched normal sample from blood; (W) no-DNA control; (L) 1 kb plus ladder. (B) Agarose gel of PCR products amplified by RT-PCR on tumor RNA corresponding to somatic gene fusions in NPC-5989 and NPC-5421. RT primers were ∼200 bp downstream from the fusion points. PCR primers were within 150 bp upstream of and downstream from the fusion points to specifically amplify across it. (T) tumor; (C) control sample (different tumor); (L) 1 kb plus ladder. Specific products within the expected size range are pointed out by red arrows.
Figure 4.
YAP1-MAML2 gene fusion protein domains. (A) Structural domains of the YAP1 gene include the TEAD1-interaction domain (amino acids 50–171) and two WW1 protein interaction domains with unknown function. MAML2 contains a Notch-interaction domain (somewhere within amino acids 1–172), which is responsible for MAML2 transactivation function in the presence of Notch signaling. The transactivation domain of MAML2 is located somewhere within amino acids 172–1156. The two genes are fused at amino acid 191 of YAP1 and amino acid 172 of MAML2. The resulting gene (B) contains the TEAD1-interaction domain and a truncated WW1 domain from YAP1 and the transactivation domain from MAML2. (C) Under the proposed model, the fusion protein is recruited via TEAD1 binding to target genes of TEAD1, many of which are important in embryonic stem cells (ESCs). Because the MAML2 Notch-interaction domain is absent, transactivation of ESC target genes may occur constitutively, possibly leading to dedifferentiation or proliferation.
Figure 5.
Recurrent rearrangements detected by fluorescent in situ hybridization of tissue microarrays. (A) Rearrangement of MAML2 in case NPC-22525. Probes were selected to flank centromeric (green probe) and telomeric (red probe) ends of the MAML2 gene. FISH images show cells with abnormal alleles. Approximate tumor cell boundaries are outlined with yellow dashed lines. Non-rearranged MAML2 alleles show colocalization of green and red probes (yellow arrows). Isolated green (or red) probes represent rearranged alleles and are marked by green (or red) arrows. (B) Two cases showing rearrangements of PTK2. (C) Two cases with a rearranged SP3 gene. NPC-22480 represents a core of a sequenced sample NPC-5421, which contains a rearranged SP3 allele, corresponding to the isolated centromeric probe (green).

