Int J Mol Sci. 2024 Jul 24;25(15):8044.
doi: 10.3390/ijms25158044.

A Novel Affordable and Reliable Framework for Accurate Detection and Comprehensive Analysis of Somatic Mutations in Cancer


Rossano Atzeni et al. Int J Mol Sci. 2024.

Abstract

Accurate detection and analysis of somatic variants in cancer involve multiple third-party tools with complex dependencies and configurations, leading to laborious, error-prone, and time-consuming data conversions. This approach lacks accuracy, reproducibility, and portability, limiting clinical application. Musta was developed to address these issues as an end-to-end pipeline for detecting, classifying, and interpreting cancer mutations. Musta is based on a Python command-line tool designed to manage tumor-normal samples for precise somatic mutation analysis. The core is a Snakemake-based workflow that covers all key cancer genomics steps, including variant calling, mutational signature deconvolution, variant annotation, driver gene detection, pathway analysis, and tumor heterogeneity estimation. Musta is easy to install on any system via Docker, with a Makefile handling installation, configuration, and execution, allowing for full or partial pipeline runs. Musta has been validated at the CRS4-NGS Core facility and tested on large datasets from The Cancer Genome Atlas and the Beijing Institute of Genomics. Musta has proven robust and flexible for somatic variant analysis in cancer. It is user-friendly, requiring no specialized programming skills, and enables data processing with a single command line. Its reproducibility ensures consistent results across users following the same protocol.

Keywords: cancer; machine learning; mutational patterns; mutational signatures; precision medicine; somatic mutations; somatic variant detection.


Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
Performance evaluation of Musta’s detection module on HCC datasets. This figure compares the variant calling results from the Musta Detection module with those of the original study [9]. (a,b) display the execution times for each variant caller across different samples: (a) highlights the runtime of each variant caller for each sample, while (b) shows the average runtime for a single sample. (c,d) illustrate the total counts of PASS variants reported by each variant caller: (c) shows the number of variants called by each variant caller for each sample, while (d) shows the average number of called variants in a single sample per variant caller.
Figure 2
Musta compared with the original study data (Ling [9]). This figure compares the number of somatic variants validated by Ling with the results from Musta. (a) Heatmap of the somatic variants shared between variant callers and their contribution to the Musta results. (b) Root mean square error (RMSE) for all variant callers against the Ling results, demonstrating Musta’s precision. (c) Venn diagram of the concordance between the Musta and Ling analyses, showing nearly 90% overlap, with Musta identifying 137 unique variants and Ling 30. (d) Nearly 99% of somatic mutations were identified by at least four out of six variant callers, underscoring the robustness, high quality, and comprehensive coverage of Musta’s ensemble approach.
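The quantities named in this caption (caller support, set concordance, and RMSE) can be illustrated with a minimal Python sketch. It is not Musta's implementation: the caller names, the toy variants, the `min_callers` threshold, and the choice to compute RMSE over paired per-sample variant counts are all assumptions made for illustration.

```python
from math import sqrt

# Hypothetical call sets; each variant is a (chrom, pos, ref, alt) tuple.
# Caller names and variants are made up for illustration only.
calls_by_caller = {
    "caller_a": {("chr1", 100, "A", "T"), ("chr1", 200, "G", "C"), ("chr2", 50, "T", "A")},
    "caller_b": {("chr1", 100, "A", "T"), ("chr1", 200, "G", "C")},
    "caller_c": {("chr1", 100, "A", "T"), ("chr2", 50, "T", "A"), ("chr3", 10, "C", "G")},
}
# Hypothetical reference set standing in for an externally validated list.
reference_set = {("chr1", 100, "A", "T"), ("chr1", 200, "G", "C"), ("chr4", 5, "G", "A")}


def support_counts(calls):
    """Count how many callers reported each variant."""
    counts = {}
    for variants in calls.values():
        for variant in variants:
            counts[variant] = counts.get(variant, 0) + 1
    return counts


def consensus(calls, min_callers):
    """Keep variants reported by at least `min_callers` callers."""
    return {v for v, n in support_counts(calls).items() if n >= min_callers}


def concordance(a, b):
    """Venn-style summary of two variant sets."""
    return {"shared": len(a & b), "only_a": len(a - b), "only_b": len(b - a)}


def rmse(observed, expected):
    """Root mean square error between two equal-length sequences of counts."""
    if len(observed) != len(expected) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - e) ** 2 for o, e in zip(observed, expected)]
    return sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    kept = consensus(calls_by_caller, min_callers=2)
    print(concordance(kept, reference_set))
```

Keying each variant on chromosome, position, reference allele, and alternate allele keeps the comparison exact; a real comparison would also need consistent normalization of indels across callers.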
Figure 3
Performance evaluation of Musta’s classification module on HCC datasets. This figure summarizes the results and efficiency of the Ensembl Variant Effect Predictor (VEP) and GATK’s Funcotator as used in the Musta framework. (a,b): Runtime efficiency comparison. VEP annotates a sample in 15 min, while Funcotator takes over 7 h. (c,d): Quantitative outcomes. Funcotator identifies slightly fewer genes than VEP, but offers more detailed classifications per gene.
Figure 4
Comparison of frequently mutated genes (FLAG) classified by VEP and Funcotator. Both tools classify the same set of genes as FLAG, indicating high agreement in identifying genes relevant to hepatocellular carcinoma (HCC).
Figure 5
Performance evaluation of Musta’s interpretation module on HCC and LIHC datasets: frequently mutated genes. (a) shows the top 10 mutated genes in the TCGA-LIHC dataset, while (b) compares the top 10 mutated genes in the TCGA-LIHC and HCC (Ling) datasets. TTN is identified as the most mutated gene in both datasets, in line with the literature on hepatocellular carcinoma. The HCC dataset shows mutations in all samples, while the LIHC dataset exhibits variable numbers of samples with mutations, reflecting greater diversity.
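As a rough illustration of how a "top mutated genes" ranking can be derived from MAF-style records, the Python sketch below counts, for each gene, the number of distinct samples carrying at least one mutation. It is an assumption-laden sketch rather than the method used by Musta: the column names `Hugo_Symbol` and `Tumor_Sample_Barcode` follow the common MAF convention, and the records and the names `GENE_B`, `GENE_C`, and `S1` to `S3` are toy placeholders, not data from the study.

```python
from collections import defaultdict


def top_mutated_genes(records, n=10):
    """Rank genes by the number of distinct samples with at least one mutation.

    `records` is an iterable of dicts with MAF-style keys
    'Hugo_Symbol' and 'Tumor_Sample_Barcode' (assumed column names).
    Ties are broken alphabetically so the ranking is deterministic.
    """
    samples_by_gene = defaultdict(set)
    for rec in records:
        samples_by_gene[rec["Hugo_Symbol"]].add(rec["Tumor_Sample_Barcode"])
    ranked = sorted(samples_by_gene.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(gene, len(samples)) for gene, samples in ranked[:n]]


# Toy records for illustration only; not results from any dataset.
toy_records = [
    {"Hugo_Symbol": "TTN", "Tumor_Sample_Barcode": "S1"},
    {"Hugo_Symbol": "TTN", "Tumor_Sample_Barcode": "S1"},  # second hit, same sample
    {"Hugo_Symbol": "TTN", "Tumor_Sample_Barcode": "S2"},
    {"Hugo_Symbol": "TTN", "Tumor_Sample_Barcode": "S3"},
    {"Hugo_Symbol": "GENE_B", "Tumor_Sample_Barcode": "S1"},
    {"Hugo_Symbol": "GENE_B", "Tumor_Sample_Barcode": "S2"},
    {"Hugo_Symbol": "GENE_C", "Tumor_Sample_Barcode": "S3"},
]

if __name__ == "__main__":
    for gene, n_samples in top_mutated_genes(toy_records, n=3):
        print(gene, n_samples)
```

Counting distinct samples rather than raw mutation records prevents a single heavily mutated sample from dominating the ranking.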
Figure 6
Performance evaluation of Musta’s interpretation module on HCC and LIHC datasets: mutational signatures from the SBS databases. Both datasets feature SBS22, linked to hepatocellular carcinoma, as the first signature.
Figure 7
Performance evaluation of Musta’s interpretation module on HCC and LIHC datasets: transition and transversion ratios. Transition and transversion ratios are more uniformly distributed in the LIHC dataset. In contrast, the HCC dataset shows a predominance of T>A mutations.
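For readers unfamiliar with these terms, the following Python sketch shows one common way to classify single-nucleotide variants as transitions or transversions and to collapse them into the six pyrimidine-referenced classes (such as T>A). The function names are hypothetical, and this is a sketch of the standard definitions, not the code used by the framework.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def _check(ref, alt):
    """Normalize case and reject anything that is not a simple SNV."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-nucleotide substitution: {ref}>{alt}")
    return ref, alt


def classify_snv(ref, alt):
    """Transition = purine<->purine or pyrimidine<->pyrimidine; otherwise transversion."""
    ref, alt = _check(ref, alt)
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def six_class(ref, alt):
    """Collapse a substitution onto the pyrimidine reference base (C or T)."""
    ref, alt = _check(ref, alt)
    if ref in PURINES:
        # Report the change as seen on the complementary strand.
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def titv_ratio(snvs):
    """Ratio of transitions to transversions for a list of (ref, alt) pairs."""
    snvs = list(snvs)
    ti = sum(1 for ref, alt in snvs if classify_snv(ref, alt) == "transition")
    tv = len(snvs) - ti
    return ti / tv if tv else float("inf")
```

Under this convention, an A>T change on one strand and a T>A change on the other are counted in the same class, which is why only six substitution classes are reported.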
Figure 8
Overview of the Musta framework for cancer sample analysis. This figure illustrates the workflow of the Musta framework, which efficiently organizes cancer sample processing tools into three distinct analysis modules: detection, classification, and interpretation. The process begins with input BAM files, which are detailed in the samples.yaml file. Each paired normal and tumor BAM file is sent to one of the six variant callers in the detection module. The VCF files generated by each variant caller are then processed by the Ensemble step (SomaticSeq), which produces a consensus VCF. This consensus VCF is subsequently passed to the Classification module, where two Variant Annotators (VEP and Funcotator) generate an annotated MAF file. This annotated MAF file serves as the input for the final step, the Interpretation module, which generates plots and tables to facilitate data analysis and visualization.
Figure 9
Layered architecture of the Musta framework. The Musta framework is built with a layered architecture, ensuring efficient and organized processing of cancer samples. At its core is a Snakemake-based workflow (version 7.15), encapsulated within a Python framework (version 3.8) and executed in a Docker container (version 20 and later). The Snakemake rules instantiate the necessary Conda (version 4.12) environments to run the individual tools, ensuring compatibility and reproducibility. Users interact with the system through a user-friendly Command-Line Interface (CLI), enabling easy command execution and data input.


References

    1. Mardis E.R. A decade’s perspective on DNA sequencing technology. Nature. 2011;470:1483–1489. doi: 10.1038/nature09796.
    2. Martincorena I., Campbell P.J. Somatic mutation in cancer and normal cells. Science. 2015;349:198–203. doi: 10.1126/science.aab4082.
    3. Vogelstein B., Papadopoulos N., Velculescu V.E., Zhou S., Diaz L.A., Jr., Kinzler K.W. Cancer genome landscapes. Science. 2013;339:1546–1558. doi: 10.1126/science.1235122.
    4. Garraway L.A. Genomics-driven oncology: Framework for an emerging paradigm. J. Clin. Oncol. 2013;31:1806–1814. doi: 10.1200/JCO.2012.46.8934.
    5. Lawrence M.S., Stojanov P., Polak P., Kryukov G.V., Cibulskis K., Sivachenko A., Getz G. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature. 2013;499:214–218. doi: 10.1038/nature12213.
