A comprehensive performance evaluation, comparison, and integration of computational methods for detecting and estimating cross-contamination of human samples in cancer next-generation sequencing analysis
- PMID: 38479675
- DOI: 10.1016/j.jbi.2024.104625
Abstract
Cross-sample contamination is one of the major issues in next-generation sequencing (NGS)-based molecular assays. This type of contamination, even at very low levels, can significantly impact the results of an analysis, especially the detection of somatic alterations in tumor samples. Several contamination identification tools have been developed and implemented as a crucial quality-control step in routine NGS bioinformatics pipelines. However, no study has yet comprehensively and systematically investigated, evaluated, and compared these computational methods in cancer NGS analysis. In this study, we investigated nine state-of-the-art computational methods for detecting cross-sample contamination. To explore their application in cancer NGS analysis, we further compared the performance of five representative tools through qualitative and quantitative analyses using in silico and simulated experimental NGS data. The results showed that Conpair achieved the best performance for identifying contamination and predicting the level of contamination in solid tumor NGS analysis. Moreover, based on Conpair, we developed a Python script, Contamination Source Predictor (ConSPr), to identify the source of contamination. We anticipate that this survey and the proposed tool for predicting the source of contamination will help researchers select appropriate cross-contamination detection tools for cancer NGS analysis and inspire the future development of computational methods for detecting sample cross-contamination and identifying its source.
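To make "predicting the level of contamination" concrete, here is a minimal sketch of one common idea behind such estimators: at SNP markers where the sample is homozygous, reads carrying the other ("foreign") allele should appear only through sequencing error unless foreign DNA is present. It uses a deliberately simplified binomial model with a grid-search maximum-likelihood estimate. This is not Conpair's or ConSPr's implementation. The function names, the `(depth, foreign_reads, foreign_allele_freq)` input layout, the fixed error rate `eps`, and the Hardy-Weinberg assumption for the unknown contaminant are all hypothetical choices made for illustration.

```python
import math
from typing import Iterable, List, Tuple

# Hypothetical input: one tuple per marker where the sample is homozygous
# (e.g., genotype taken from a matched normal):
# (total_depth, reads_with_foreign_allele, population_freq_of_foreign_allele)
Marker = Tuple[int, int, float]


def log_likelihood(c: float, markers: Iterable[Marker], eps: float) -> float:
    """Binomial log-likelihood (up to a constant) of contamination fraction c."""
    total = 0.0
    for depth, foreign, freq in markers:
        # A read shows the foreign allele either through a sequencing error on a
        # sample read, or because it comes from the contaminant, which carries the
        # foreign allele with probability `freq` (Hardy-Weinberg assumption).
        # Expanding (1-c)*eps + c*(freq*(1-eps) + (1-freq)*eps) gives:
        r = eps + c * freq * (1.0 - 2.0 * eps)
        total += foreign * math.log(r) + (depth - foreign) * math.log(1.0 - r)
    return total


def estimate_contamination(markers: Iterable[Marker], eps: float = 0.001,
                           max_c: float = 0.5, steps: int = 500) -> float:
    """Grid-search maximum-likelihood estimate of the contamination fraction."""
    if not 0.0 < eps < 0.5:
        raise ValueError("eps must be in (0, 0.5)")
    marker_list: List[Marker] = list(markers)
    grid = [max_c * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda c: log_likelihood(c, marker_list, eps))
```

For example, 200 markers each with depth 1000, 25 foreign-allele reads, and a foreign-allele population frequency of 0.5 yield an estimate of about 0.048, while markers with no foreign-allele reads yield 0.0. Real tools refine this with per-site genotype likelihoods, base and mapping qualities, and tumor-normal pairing, which the sketch omits.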
Keywords: Bioinformatics; Computational methods; Cross-contamination; Next-generation sequencing.
Copyright © 2024 Elsevier Inc. All rights reserved.
Conflict of interest statement
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Similar articles
- Machine learning random forest for predicting oncosomatic variant NGS analysis. Sci Rep. 2021 Nov 8;11(1):21820. doi: 10.1038/s41598-021-01253-y. PMID: 34750410.
- ClinQC: a tool for quality control and cleaning of Sanger and NGS data in clinical research. BMC Bioinformatics. 2016 Feb 2;17:56. doi: 10.1186/s12859-016-0915-y. PMID: 26830926.
- Assembling and Validating Bioinformatic Pipelines for Next-Generation Sequencing Clinical Assays. Arch Pathol Lab Med. 2020 Sep 1;144(9):1118-1130. doi: 10.5858/arpa.2019-0476-RA. PMID: 32045276. Review.
- NGS_SNPAnalyzer: a desktop software supporting genome projects by identifying and visualizing sequence variations from next-generation sequencing data. Genes Genomics. 2020 Nov;42(11):1311-1317. doi: 10.1007/s13258-020-00997-7. Epub 2020 Sep 26. PMID: 32980993.
- Principles and Validation of Bioinformatics Pipeline for Cancer Next-Generation Sequencing. Clin Lab Med. 2022 Sep;42(3):409-421. doi: 10.1016/j.cll.2022.05.006. Epub 2022 Aug 22. PMID: 36150820. Review.