Int J Mol Sci. 2024 Jul 24;25(15):8044.

doi: 10.3390/ijms25158044.

A Novel Affordable and Reliable Framework for Accurate Detection and Comprehensive Analysis of Somatic Mutations in Cancer

Rossano Atzeni¹, Matteo Massidda², Enrico Pieroni¹, Vincenzo Rallo³, Massimo Pisu¹, Andrea Angius³

Affiliations

¹ Center for Advanced Studies, Research and Development in Sardinia (CRS4), 09050 Pula, Italy.
² Department of Medical, Surgical and Experimental Sciences, University of Sassari, 07100 Sassari, Italy.
³ Istituto di Ricerca Genetica e Biomedica (IRGB), Consiglio Nazionale delle Ricerche (CNR), Cittadella Universitaria di Cagliari, 09042 Monserrato, Italy.

PMID: 39125613
PMCID: PMC11311285
DOI: 10.3390/ijms25158044

Rossano Atzeni et al. Int J Mol Sci. 2024.

Abstract

Accurate detection and analysis of somatic variants in cancer involve multiple third-party tools with complex dependencies and configurations, leading to laborious, error-prone, and time-consuming data conversions. Such an approach lacks accuracy, reproducibility, and portability, which limits its clinical application. Musta was developed to address these issues as an end-to-end pipeline for detecting, classifying, and interpreting cancer mutations. Musta is built around a Python command-line tool designed to manage tumor-normal sample pairs for precise somatic mutation analysis. Its core is a Snakemake-based workflow that covers all key cancer genomics steps, including variant calling, mutational signature deconvolution, variant annotation, driver gene detection, pathway analysis, and tumor heterogeneity estimation. Musta is easy to install on any system via Docker, with a Makefile that handles installation, configuration, and execution and allows full or partial pipeline runs. Musta has been validated at the CRS4-NGS Core facility and tested on large datasets from The Cancer Genome Atlas and the Beijing Institute of Genomics. Musta has proven robust and flexible for somatic variant analysis in cancer. It is user-friendly, requires no specialized programming skills, and processes data with a single command. Its reproducibility ensures consistent results across users following the same protocol.

Keywords: cancer; machine learning; mutational patterns; mutational signatures; precision medicine; somatic mutations; somatic variant detection.


Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

**Figure 1**
Performance evaluation of *Musta*’s detection module on HCC datasets. This figure compares the variant calling results of the *Musta* detection module with those of the original study by [9]. (a,b) Execution times of each variant caller across samples: (a) highlights the runtime of each variant caller for each sample, while (b) shows the average runtime for a single sample. (c,d) Total counts of *pass* variants reported by each variant caller: (c) shows the number of variants called by each variant caller for each sample, while (d) shows the average number of variants called per sample by each variant caller.
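Panels (c) and (d) rest on counting *pass* variants per caller and averaging them over samples. The sketch below shows one way to do that from plain VCF text, assuming "pass" means the FILTER column equals `PASS`; the helper names, caller labels, and toy records are hypothetical and are not taken from *Musta*.

```python
from typing import Dict, Iterable, List


def count_pass_variants(vcf_lines: Iterable[str]) -> int:
    """Count VCF data records whose FILTER column is exactly 'PASS'."""
    count = 0
    for line in vcf_lines:
        # Skip blank lines, '##' meta-information lines, and the '#CHROM' header.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # FILTER is the 7th fixed VCF column (index 6).
        if len(fields) > 6 and fields[6] == "PASS":
            count += 1
    return count


def average_per_caller(counts: Dict[str, List[int]]) -> Dict[str, float]:
    """Average per-sample PASS counts for each caller, skipping callers with no samples."""
    return {caller: sum(values) / len(values) for caller, values in counts.items() if values}


# Toy input (hypothetical): two PASS records and one filtered record.
toy_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tC\tT\t50\tPASS\t.",
    "chr1\t200\t.\tG\tA\t12\tLowQual\t.",
    "chr2\t300\t.\tT\tA\t60\tPASS\t.",
]

print(count_pass_variants(toy_vcf))  # 2
print(average_per_caller({"caller_A": [2, 4], "caller_B": [3]}))  # {'caller_A': 3.0, 'caller_B': 3.0}
```

Real VCFs are usually gzip-compressed and are better read with a dedicated parser; the plain-text loop only illustrates the counting logic.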

**Figure 3**
Performance evaluation of *Musta*’s classification module on HCC datasets. This figure summarizes the results and efficiency of the Ensembl Variant Effect Predictor (VEP) and GATK’s Funcotator as used in the *Musta* framework. (a,b) Runtime efficiency comparison: VEP annotates a sample in 15 min, while Funcotator takes over 7 h. (c,d) Quantitative outcomes: Funcotator identifies slightly fewer genes than VEP but offers more detailed classifications per gene.

**Figure 4**
Comparison of frequently mutated genes (FLAG) classified by VEP and Funcotator. Both tools assign the FLAG label to the same set of genes, indicating high agreement in identifying genes relevant to hepatocellular carcinoma (HCC).
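The agreement described above can be quantified as the overlap between the two annotators' gene sets. The sketch below reports the shared and tool-specific genes together with the Jaccard index (shared genes divided by all genes). The function name and the placeholder gene labels are hypothetical; the caption does not state which agreement metric the authors used.

```python
from typing import Dict, Iterable, List, Union


def compare_gene_sets(
    vep_genes: Iterable[str], funcotator_genes: Iterable[str]
) -> Dict[str, Union[List[str], float]]:
    """Summarize the overlap between two gene lists; Jaccard = |shared| / |union|."""
    vep, func = set(vep_genes), set(funcotator_genes)
    shared = vep & func
    union = vep | func
    # Two empty lists are treated as perfect agreement.
    jaccard = len(shared) / len(union) if union else 1.0
    return {
        "shared": sorted(shared),
        "vep_only": sorted(vep - func),
        "funcotator_only": sorted(func - vep),
        "jaccard": jaccard,
    }


# Hypothetical gene labels, not results from the study.
result = compare_gene_sets(["GENE_A", "GENE_B", "GENE_C"], ["GENE_B", "GENE_C", "GENE_D"])
print(result["shared"])  # ['GENE_B', 'GENE_C']
print(result["jaccard"])  # 0.5
```

Identical gene sets, as reported for the FLAG genes, would give a Jaccard index of 1.0 with empty `vep_only` and `funcotator_only` lists.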

**Figure 5**
Performance evaluation of *Musta*’s interpretation module on HCC and LIHC datasets: frequently mutated genes. (a) Top 10 mutated genes in the TCGA-LIHC dataset; (b) comparison of the top 10 mutated genes in the TCGA-LIHC and HCC (Ling) datasets. The *TTN* gene is identified as the most frequently mutated gene in both datasets, consistent with the literature on hepatocellular carcinoma. The HCC dataset shows mutations in all samples, whereas the LIHC dataset exhibits a variable number of samples with mutations, reflecting greater diversity.
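A top-10 gene ranking like the one in this figure can be derived from an annotated MAF file by counting, for each gene, the number of distinct tumor samples that carry at least one mutation. The sketch below assumes the standard MAF columns `Hugo_Symbol` and `Tumor_Sample_Barcode` and uses a hypothetical toy table; *Musta*'s own ranking rules (e.g., which variant classes are counted) may differ.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def top_mutated_genes(maf_text: str, n: int = 10) -> List[Tuple[str, int]]:
    """Rank genes by the number of distinct tumor samples with at least one mutation."""
    # MAF files may start with '#version' comment lines before the column header.
    lines = (line for line in io.StringIO(maf_text) if not line.startswith("#"))
    reader = csv.DictReader(lines, delimiter="\t")
    samples_per_gene: Dict[str, Set[str]] = defaultdict(set)
    for row in reader:
        samples_per_gene[row["Hugo_Symbol"]].add(row["Tumor_Sample_Barcode"])
    # Sort by sample count (descending) and break ties alphabetically.
    ranked = sorted(samples_per_gene.items(), key=lambda item: (-len(item[1]), item[0]))
    return [(gene, len(samples)) for gene, samples in ranked[:n]]


# Hypothetical toy MAF: GENE_A is mutated twice in S2, but S2 is counted once.
toy_maf = (
    "#version 2.4\n"
    "Hugo_Symbol\tTumor_Sample_Barcode\n"
    "GENE_A\tS1\n"
    "GENE_A\tS2\n"
    "GENE_A\tS2\n"
    "GENE_B\tS1\n"
)
print(top_mutated_genes(toy_maf))  # [('GENE_A', 2), ('GENE_B', 1)]
```

Counting samples rather than raw mutations keeps a single hypermutated tumor from dominating the ranking, which matters when comparing cohorts that differ in sample-level diversity.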

**Figure 6**
Performance evaluation of *Musta*’s interpretation module on HCC and LIHC datasets: mutational signatures from the SBS databases. In both datasets, the first signature is SBS22, which is linked to hepatocellular carcinoma.
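Signature deconvolution starts from a catalogue that counts each single-base substitution in its trinucleotide context, using the 96 SBS channels in which the mutated base is written as a pyrimidine (C or T). The sketch below maps one substitution to its channel label and tallies a toy catalogue. The function name and inputs are hypothetical, and the deconvolution step that assigns signatures such as SBS22 is not shown.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def sbs96_channel(context: str, alt: str) -> str:
    """Map a substitution at the centre of a reference trinucleotide to its SBS-96 label."""
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or len(alt) != 1 or set(context + alt) - set("ACGT") or context[1] == alt:
        raise ValueError(f"not a valid SNV: {context} > {alt}")
    # SBS-96 labels use the pyrimidine (C or T) as the reference base, so
    # purine-reference substitutions are reported on the opposite strand.
    if context[1] in "AG":
        context = context.translate(_COMPLEMENT)[::-1]  # reverse complement
        alt = alt.translate(_COMPLEMENT)
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


# Toy (hypothetical) substitutions: (reference trinucleotide, alternate base).
catalogue = Counter(
    sbs96_channel(ctx, alt) for ctx, alt in [("ACG", "T"), ("CAG", "T"), ("TGA", "A")]
)
print(catalogue)
```

For example, an A>T change in the context CAG is reported as `C[T>A]G`, because on the opposite strand the reference base is T.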

**Figure 7**
Performance evaluation of *Musta*’s interpretation module on HCC and LIHC datasets: transition and transversion ratios. Transitions and transversions are more uniformly distributed in the LIHC dataset, whereas the HCC dataset shows a predominance of T>A mutations.
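The distributions in this figure come from folding each SNV into one of six pyrimidine-based classes (C>A, C>G, C>T, T>A, T>C, T>G) and separating transitions (C>T, T>C) from transversions. The sketch below does this for a hypothetical list of (REF, ALT) pairs and returns the transition/transversion ratio; the function names are illustrative and not part of *Musta*.

```python
from collections import Counter
from typing import Iterable, Tuple

_PAIR = {"A": "T", "C": "G", "G": "C", "T": "A"}
# Transitions keep the ring type (pyrimidine <-> pyrimidine); their purine-strand
# equivalents G>A and A>G are folded into these classes by snv_class.
_TRANSITIONS = {"C>T", "T>C"}


def snv_class(ref: str, alt: str) -> str:
    """Fold an SNV into one of the six pyrimidine-reference classes."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in _PAIR or alt not in _PAIR or ref == alt:
        raise ValueError(f"not a valid SNV: {ref}>{alt}")
    if ref in "AG":
        ref, alt = _PAIR[ref], _PAIR[alt]
    return f"{ref}>{alt}"


def ti_tv_summary(snvs: Iterable[Tuple[str, str]]) -> Tuple[Counter, int, int, float]:
    """Return class counts, transition count, transversion count, and Ti/Tv ratio."""
    classes = Counter(snv_class(ref, alt) for ref, alt in snvs)
    ti = sum(n for cls, n in classes.items() if cls in _TRANSITIONS)
    tv = sum(classes.values()) - ti
    ratio = ti / tv if tv else float("inf")
    return classes, ti, tv, ratio


# Hypothetical SNVs: G>A folds into C>T and A>T folds into T>A.
classes, ti, tv, ratio = ti_tv_summary([("C", "T"), ("G", "A"), ("T", "A"), ("A", "T"), ("A", "G")])
print(ti, tv, ratio)  # 3 2 1.5
```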

**Figure 8**
Overview of the *Musta* framework for cancer sample analysis. This figure illustrates the workflow of the *Musta* framework, which organizes cancer sample processing tools into three distinct analysis modules: detection, classification, and interpretation. The process begins with the input BAM files, which are listed in the samples.yaml file. Each paired normal and tumor BAM file is processed by each of the six variant callers in the detection module. The VCF files generated by the variant callers are then processed by the ensemble step (SomaticSeq), which produces a consensus VCF. This consensus VCF is passed to the classification module, where two variant annotators (VEP and Funcotator) generate an annotated MAF file. The annotated MAF file serves as the input for the final step, the interpretation module, which generates plots and tables to facilitate data analysis and visualization.
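To illustrate what an ensemble step does with the per-caller VCFs, the sketch below keeps a variant only if at least `min_callers` callers report it. This is a simple majority-vote consensus chosen for illustration; SomaticSeq itself combines caller outputs with its own, more elaborate logic, which is not reproduced here. The caller labels, threshold, and variants are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def consensus_calls(calls_by_caller: Dict[str, Iterable[Variant]], min_callers: int) -> List[Variant]:
    """Keep variants reported by at least `min_callers` distinct callers."""
    support: Counter = Counter()
    for variants in calls_by_caller.values():
        support.update(set(variants))  # count each caller at most once per variant
    return sorted(v for v, n in support.items() if n >= min_callers)


# Hypothetical per-caller calls for one tumor-normal pair.
calls = {
    "caller_1": [("chr1", 100, "C", "T"), ("chr2", 5, "G", "A")],
    "caller_2": [("chr1", 100, "C", "T")],
    "caller_3": [("chr1", 100, "C", "T"), ("chr2", 5, "G", "A"), ("chr3", 7, "T", "G")],
}
print(consensus_calls(calls, min_callers=2))  # [('chr1', 100, 'C', 'T'), ('chr2', 5, 'G', 'A')]
```

Raising `min_callers` trades sensitivity for precision; in practice, variant representations (e.g., left-normalized indels) must be harmonized across callers before keys like these can be compared.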

**Figure 9**
Layered architecture of the *Musta* framework. The *Musta* framework is built with a layered architecture that keeps the processing of cancer samples efficient and organized. At its core is a Snakemake-based workflow (Snakemake version 7.15), encapsulated within a Python framework (Python version 3.8) and executed in a Docker container (Docker version 20 or later). The Snakemake rules instantiate the Conda environments (Conda version 4.12) needed to run the individual tools, ensuring compatibility and reproducibility. Users interact with the system through a user-friendly command-line interface (CLI) that allows easy command execution and data input.
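The CLI layer sits on top of the Snakemake workflow. The sketch below shows how a thin Python wrapper could translate a user's choice of module into a Snakemake invocation, including a partial run of a single module. The program name, the target names `detection`, `classification`, and `interpretation`, and the file names `Snakefile` and `config.yaml` are hypothetical and do not reproduce *Musta*'s actual CLI or Makefile; the Snakemake options used (`--snakefile`, `--configfile`, `--use-conda`, `--cores`) are standard.

```python
import argparse
from typing import List, Optional

# Hypothetical target names mirroring the three analysis modules.
MODULES = ("detection", "classification", "interpretation")


def build_snakemake_command(argv: Optional[List[str]] = None) -> List[str]:
    """Parse CLI arguments and return (without running) a Snakemake command line."""
    parser = argparse.ArgumentParser(
        prog="musta-sketch", description="Toy CLI wrapper around a Snakemake workflow."
    )
    parser.add_argument("module", nargs="?", default="all", choices=MODULES + ("all",),
                        help="module to run (default: all)")
    parser.add_argument("--configfile", default="config.yaml", help="workflow configuration file")
    parser.add_argument("--cores", type=int, default=4, help="number of CPU cores for Snakemake")
    args = parser.parse_args(argv)

    cmd = [
        "snakemake",
        "--snakefile", "Snakefile",
        "--configfile", args.configfile,
        "--use-conda",  # let each rule create its own Conda environment
        "--cores", str(args.cores),
    ]
    if args.module != "all":
        cmd.append(args.module)  # request a single target rule for a partial run
    return cmd


print(build_snakemake_command(["classification", "--cores", "8"]))
```

Returning the command as a list, rather than executing it, keeps the wrapper easy to test; a real wrapper would pass the list to `subprocess.run` inside the Docker container. Omitting a target lets Snakemake fall back to the workflow's default (first) rule.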


References

1. Mardis E.R. A decade’s perspective on DNA sequencing technology. Nature. 2011;470:198–203. doi: 10.1038/nature09796.
2. Martincorena I., Campbell P.J. Somatic mutation in cancer and normal cells. Science. 2015;349:1483–1489. doi: 10.1126/science.aab4082.
3. Vogelstein B., Papadopoulos N., Velculescu V.E., Zhou S., Diaz L.A., Jr., Kinzler K.W. Cancer genome landscapes. Science. 2013;339:1546–1558. doi: 10.1126/science.1235122.
4. Garraway L.A. Genomics-driven oncology: Framework for an emerging paradigm. J. Clin. Oncol. 2013;31:1806–1814. doi: 10.1200/JCO.2012.46.8934.
5. Lawrence M.S., Stojanov P., Polak P., Kryukov G.V., Cibulskis K., Sivachenko A., Getz G. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature. 2013;499:214–218. doi: 10.1038/nature12213.
