Nucleic Acids Res. 2015 Sep 30;43(17):8123-34.
doi: 10.1093/nar/gkv803. Epub 2015 Aug 24.

LARVA: an integrative framework for large-scale analysis of recurrent variants in noncoding annotations

Lucas Lochovsky et al. Nucleic Acids Res. 2015.

Abstract

In cancer research, background models for mutation rates have been extensively calibrated in coding regions, leading to the identification of many driver genes, recurrently mutated more than expected. Noncoding regions are also associated with disease; however, background models for them have not been investigated in as much detail. This is partially due to limited noncoding functional annotation. Also, great mutation heterogeneity and potential correlations between neighboring sites give rise to substantial overdispersion in mutation count, resulting in problematic background rate estimation. Here, we address these issues with a new computational framework called LARVA. It integrates variants with a comprehensive set of noncoding functional elements, modeling the mutation counts of the elements with a β-binomial distribution to handle overdispersion. LARVA, moreover, uses regional genomic features such as replication timing to better estimate local mutation rates and mutational hotspots. We demonstrate LARVA's effectiveness on 760 whole-genome tumor sequences, showing that it identifies well-known noncoding drivers, such as mutations in the TERT promoter. Furthermore, LARVA highlights several novel highly mutated regulatory sites that could potentially be noncoding drivers. We make LARVA available as a software tool and release our highly mutated annotations as an online resource (larva.gersteinlab.org).
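To illustrate why overdispersion matters for the significance test described above, here is a minimal sketch comparing upper-tail P-values under a binomial and a β-binomial background with the same mean rate. It is not LARVA's implementation: the function names, the element size, the observed count and the parameters `alpha` and `beta` are all hypothetical values chosen for illustration.

```python
import math

def log_choose(n, k):
    # log of the binomial coefficient C(n, k)
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def log_beta(a, b):
    # log of the Beta function B(a, b)
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def binom_pmf(k, n, p):
    # P(X = k) for X ~ Binomial(n, p), with 0 < p < 1
    return math.exp(log_choose(n, k) + k * math.log(p) + (n - k) * math.log1p(-p))

def betabinom_pmf(k, n, a, b):
    # P(X = k) when the per-site rate itself varies as Beta(a, b)
    return math.exp(log_choose(n, k) + log_beta(k + a, n - k + b) - log_beta(a, b))

def upper_tail(pmf, k_obs, n, *params):
    # P(X >= k_obs): chance of a count at least as large as the observed one
    return min(1.0, sum(pmf(k, n, *params) for k in range(k_obs, n + 1)))

# Hypothetical element: 1000 mutable positions, 20 observed mutations.
n_sites = 1000
k_observed = 20
mean_rate = 0.005            # assumed background rate
alpha, beta = 0.5, 99.5      # assumed shape; alpha / (alpha + beta) == mean_rate

p_binom = upper_tail(binom_pmf, k_observed, n_sites, mean_rate)
p_betabinom = upper_tail(betabinom_pmf, k_observed, n_sites, alpha, beta)

print(f"binomial P-value:      {p_binom:.3g}")
print(f"beta-binomial P-value: {p_betabinom:.3g}")
```

Both models expect the same average number of mutations, but the β-binomial lets the rate vary between elements, so a high count is less surprising and its P-value is far less extreme than the binomial one.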

Figures

Figure 1.
(A) A pie chart representing the distribution of samples in our dataset of collected whole genome sequenced (WGS) cancers. (B) A flowchart of LARVA's procedure for identifying significant highly mutated noncoding elements. Cancer variants in VCF format are passed through quality control filters and then intersected with our noncoding annotation corpus. After factoring in regional mutation rate corrections, a β-binomial distribution is fitted to the observed data, which allows the identification of elements with a significant mutational burden.
Figure 2.
Mutational heterogeneity between different types of cancer within several prominent classes of noncoding annotations. The percentage of mutations varies widely between noncoding element types, cancer types and samples of the same cancer type.
Figure 3.
(A) Between samples of the same cancer type, there is huge mutation rate heterogeneity. For most cancers, the mutation rate spans several orders of magnitude. (B) Variation in the mutation rate in fifty 1 Mbp regions across chromosome 1 in lung cancer (top) and prostate cancer (bottom).
Figure 4.
(A) The β-binomial distribution (pink line) provides better fitting to the observed mutation counts at 10 kb resolution (black line) of 760 cancer genomes, especially at the right tail as compared to the binomial distribution (turquoise line). (B) A comparison of the cumulative distribution function (CDF) of the binomial distribution and the β-binomial distribution from part (A). (C) Boxplots of the Kolmogorov–Smirnov (KS) statistics.
Figure 5.
The 1 kb genome bins representing the top 10% and bottom 10% of the DNA replication timing were used to derive an observed distribution of mutation counts, demonstrating the influence of replication timing. The fitted binomial and β-binomial distributions are plotted as bar plots (A). P-values at different mutation counts were given by the observed, β-binomial and binomial distribution.
Figure 6.
(A) The number of significant P-values implied by the β-binomial distribution and the binomial distribution (with and without DNA replication timing correction). (B) A sorted P-value plot of the top significant TSSs derived from the LARVA analysis.
Figure 7.
Manhattan plot of the P-values from 5000 randomly sampled 10 kb bins from the β-binomial distribution (A) and the binomial distribution (B). The binomial distribution may give heavily inflated P-values because it fails to capture the extensive overdispersion of the mutation count data.
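The goodness-of-fit comparison summarized in Figure 4 can be sketched as follows: fit both distributions to a set of per-bin mutation counts and compute the Kolmogorov–Smirnov (KS) statistic of each fit. This is an illustrative sketch, not the paper's procedure. The toy counts, the number of trials per bin, the method-of-moments fit and all function names are assumptions made for this example.

```python
import math

def log_choose(n, k):
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def binom_pmf(k, n, p):
    return math.exp(log_choose(n, k) + k * math.log(p) + (n - k) * math.log1p(-p))

def betabinom_pmf(k, n, a, b):
    return math.exp(log_choose(n, k) + log_beta(k + a, n - k + b) - log_beta(a, b))

def fit_betabinom_moments(counts, n):
    # Method-of-moments estimates of (a, b); only valid for overdispersed
    # data, where the denominator below is positive.
    m1 = sum(counts) / len(counts)
    m2 = sum(c * c for c in counts) / len(counts)
    denom = n * (m2 / m1 - m1 - 1) + m1
    a = (n * m1 - m2) / denom
    b = (n - m1) * (n - m2 / m1) / denom
    return a, b

def ks_statistic(counts, n, pmf, *params):
    # Largest absolute gap between the empirical CDF and the model CDF,
    # evaluated at every possible count 0..n.
    total = len(counts)
    model_cdf = 0.0
    largest_gap = 0.0
    for k in range(n + 1):
        model_cdf += pmf(k, n, *params)
        empirical_cdf = sum(1 for c in counts if c <= k) / total
        largest_gap = max(largest_gap, abs(empirical_cdf - model_cdf))
    return largest_gap

# Toy, hypothetical data: mutation counts in 20 bins of 100 trials each.
n_trials = 100
counts = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 5, 7, 10, 18]

p_hat = sum(counts) / (len(counts) * n_trials)        # binomial fit
a_hat, b_hat = fit_betabinom_moments(counts, n_trials)  # beta-binomial fit

ks_binom = ks_statistic(counts, n_trials, binom_pmf, p_hat)
ks_betabinom = ks_statistic(counts, n_trials, betabinom_pmf, a_hat, b_hat)

print(f"KS, binomial fit:      {ks_binom:.3f}")
print(f"KS, beta-binomial fit: {ks_betabinom:.3f}")
```

A smaller KS statistic means the fitted CDF tracks the observed counts more closely. On these toy counts, which have many empty bins and a few heavily mutated ones, the β-binomial fit gives the smaller value.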
