Sci Rep. 2023 Jun 27;13(1):10424.
doi: 10.1038/s41598-023-37409-1.

Machine learning-based somatic variant calling in cell-free DNA of metastatic breast cancer patients using large NGS panels

Elisabeth M Jongbloed et al. Sci Rep.

Abstract

Next-generation sequencing of cell-free DNA (cfDNA) is a promising method for treatment monitoring and therapy selection in metastatic breast cancer (MBC). However, distinguishing tumor-specific variants from sequencing artefacts and germline variation with a low false discovery rate is challenging when using large targeted sequencing panels covering many tumor suppressor genes. To address this, we built a machine learning model to remove false positive variant calls and augmented it with additional filters to ensure selection of tumor-derived variants. We used cfDNA of 70 MBC patients profiled with both the small targeted Oncomine breast panel (Thermofisher) and the much larger Qiaseq Human Breast Cancer Panel (Qiagen). The model was trained on the panels' common regions using Oncomine hotspot mutations as ground truth. Applied to Qiaseq data, it achieved 35% sensitivity and 36% precision, outperforming basic filtering. For 20 patients we used germline DNA to filter for somatic variants and obtained 245 variants in total, while our model found seven variants, of which six were also detected using the germline strategy. In ten tumor-free individuals, our method detected one (potentially germline) variant in total, in contrast to 521 variants detected without our model. These results indicate that our model largely detects somatic variants.
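
The sensitivity and precision figures above compare one panel's calls against a ground-truth call set in the shared regions, and the germline strategy removes variants also seen in a patient's germline DNA. The following is a minimal sketch of both computations, not the authors' code. The `Variant` tuple, the function names, and the toy coordinates are all hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical variant key: chromosome, position, reference and alternate allele."""
    chrom: str
    pos: int
    ref: str
    alt: str


def sensitivity_precision(truth: set[Variant], calls: set[Variant]) -> tuple[float, float]:
    """Return (sensitivity, precision) of `calls` measured against `truth`."""
    true_positives = len(truth & calls)
    # Sensitivity: fraction of ground-truth variants that were recovered.
    sensitivity = true_positives / len(truth) if truth else 0.0
    # Precision: fraction of reported variants that are in the ground truth.
    precision = true_positives / len(calls) if calls else 0.0
    return sensitivity, precision


def subtract_germline(plasma_calls: set[Variant], germline_calls: set[Variant]) -> set[Variant]:
    """Keep only plasma variants that are absent from the matched germline sample."""
    return plasma_calls - germline_calls


# Toy data (made up, not from the study).
a = Variant("chr1", 100, "A", "G")
b = Variant("chr1", 200, "C", "T")
c = Variant("chr2", 300, "G", "A")
d = Variant("chr3", 400, "T", "C")
e = Variant("chr4", 500, "A", "T")

truth = {a, b, c, d}   # stand-in for ground-truth hotspot calls
calls = {a, e}         # stand-in for filtered calls from the larger panel
sens, prec = sensitivity_precision(truth, calls)
somatic = subtract_germline({a, b, e}, {b})
```

In this toy case one of four ground-truth variants is recovered and one of two reported calls is correct. Matching on exact (chrom, pos, ref, alt) keys is an assumption; a real pipeline would first need to normalize variant representation across the two panels.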

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Overview of the experiments and comparison between the two mutation panels. Plasma cfDNA of 70 MBC patients was profiled using both the Oncomine (top) and Qiaseq (bottom) panels. We used the Oncomine calls in the regions shared by both panels to train an SVM classifier that predicts whether a variant called by Qiaseq would also have been called by Oncomine. Inputs to the model are features extracted from the variant caller output as well as from the sequence context. We then applied post-processing filters on both panels to keep only the variants that are most likely to be tumor-specific.
Figure 2
Number of COSMIC variants (x-axis) detected by Oncomine and Qiaseq in the common regions between the two panels when we applied (A) no filtering, (B) our rule-based filtering strategy, and (C) our proposed SVM model to filter somatic mutations. Results are shown for each of the 70 patients (y-axis), with the remaining 69 patients used to tune our filters. Blue bars, on the left of the x-axis, show the number of Oncomine mutations for each patient. Orange bars, on the right, show how many of those mutations were also detected by Qiaseq. Green bars denote the number of additional mutations found by the Qiaseq panel for that sample. The variant counts shown are before applying the post-processing filters.
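
The per-patient evaluation in this caption, where each patient is scored with filters tuned on the other 69, corresponds to a leave-one-patient-out scheme. A minimal sketch of how such splits could be generated follows. The function name and the patient identifiers are hypothetical, and the tuning and scoring steps themselves are omitted.

```python
from collections.abc import Iterator, Sequence


def leave_one_patient_out(patient_ids: Sequence[str]) -> Iterator[tuple[list[str], str]]:
    """Yield (tuning_patients, held_out_patient) for every patient in turn."""
    for held_out in patient_ids:
        # All other patients are used to tune the filters for this split.
        tuning = [p for p in patient_ids if p != held_out]
        yield tuning, held_out


# Hypothetical identifiers for a cohort of 70 patients.
patients = [f"P{i:02d}" for i in range(1, 71)]
splits = list(leave_one_patient_out(patients))
```

Splitting by patient rather than by variant keeps all variants from the held-out patient out of the tuning data, so the reported counts for that patient are not influenced by their own calls.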
Figure 3
Number of mutations (x-axis) detected per patient (y-axis) using Oncomine (red), down-sampled Oncomine (gray-black), and Qiaseq (green). All regions covered by each panel are included.
Figure 4
Incidence of mutations per gene detected in our cohort by the Qiaseq and Oncomine panels compared to the incidence of mutations per gene in the TCGA and BASIS cohorts.
