Extend the benchmarking indel set by manual review using the individual cell line sequencing data from the Sequencing Quality Control 2 (SEQC2) project
- PMID: 38528062
- PMCID: PMC10963753
- DOI: 10.1038/s41598-024-57439-7
Abstract
Accurate indel calling plays an important role in precision medicine. A benchmarking indel set is essential for thoroughly evaluating the indel calling performance of bioinformatics pipelines. A reference sample with a set of known-positive variants was developed in the FDA-led Sequencing Quality Control Phase 2 (SEQC2) project, but the number of known indels in that known-positive set was limited. This project sought to provide an enriched set of known indels that would be more translationally relevant by focusing on additional cancer-related regions. A thorough manual review process completed by 42 reviewers, two advisors, and a judging panel of three researchers significantly enriched the known indel set with an additional 516 indels. The extended benchmarking indel set covers a large range of variant allele frequencies (VAFs), and 87% of its indels have a VAF below 20% in reference Sample A. Reference Sample A and the indel set can therefore be used for comprehensive benchmarking of indel calling across a wider range of VAFs, particularly low ones. Indel length also varied, but most indels were shorter than 10 base pairs (bp). Most of the indels were within coding regions, with the remainder in gene regulatory regions. Although high confidence can be derived from the robust study design and meticulous human review, this extended indel set has not undergone orthogonal validation. The extended benchmarking indel set, along with the indels in the previously published known-positive set, was the truth set used to benchmark indel calling pipelines in a community challenge hosted on the precisionFDA platform. This benchmarking indel set and the reference samples can be used for a comprehensive evaluation of indel calling pipelines. Additionally, the insights and solutions obtained during the manual review process can aid in improving the performance of these pipelines.
Keywords: Benchmarking; Bioinformatics; Indel; Precision medicine; Quality control.
© 2024. This is a U.S. Government work and not under copyright protection in the US; foreign copyright protection may apply.
Conflict of interest statement
N.F.S. and N.N. are employees of Agilent Technologies. Other authors declare no competing interest.
Similar articles
- Towards accurate indel calling for oncopanel sequencing through an international pipeline competition at precisionFDA. Sci Rep. 2024 Apr 8;14(1):8165. doi: 10.1038/s41598-024-58573-y. PMID: 38589653 Free PMC article.
- Comparative assessments of indel annotations in healthy and cancer genomes with next-generation sequencing data. BMC Med Genomics. 2020 Nov 10;13(1):170. doi: 10.1186/s12920-020-00818-6. PMID: 33167946 Free PMC article.
- Benchmarking workflows to assess performance and suitability of germline variant calling pipelines in clinical diagnostic assays. BMC Bioinformatics. 2021 Feb 24;22(1):85. doi: 10.1186/s12859-020-03934-3. PMID: 33627090 Free PMC article.
- Benchmarking of variant calling software for whole-exome sequencing using gold standard datasets. Sci Rep. 2025 Apr 21;15(1):13697. doi: 10.1038/s41598-025-97047-7. PMID: 40258889 Free PMC article.
- Genomic variant benchmark: if you cannot measure it, you cannot improve it. Genome Biol. 2023 Oct 5;24(1):221. doi: 10.1186/s13059-023-03061-1. PMID: 37798733 Free PMC article. Review.
Cited by
- 2023 White Paper on Recent Issues in Bioanalysis: Deuterated Drugs; LNP; Tumor/FFPE Biopsy; Targeted Proteomics; Small Molecule Covalent Inhibitors; Chiral Bioanalysis; Remote Regulatory Assessments; Sample Reconciliation/Chain of Custody (PART 1A - Recommendations on Mass Spectrometry, Chromatography, Sample Preparation Latest Developments, Challenges, and Solutions and BMV/Regulated Bioanalysis PART 1B - Regulatory Agencies' Inputs on Regulated Bioanalysis/BMV, Biomarkers/IVD/CDx/BAV, Immunogenicity, Gene & Cell Therapy and Vaccine). Bioanalysis. 2024;16(9):307-364. doi: 10.1080/17576180.2024.2347153. Epub 2024 May 27. PMID: 38913185 Free PMC article.
- Augmenting precision medicine via targeted RNA-Seq detection of expressed mutations. NPJ Precis Oncol. 2025 Jun 13;9(1):182. doi: 10.1038/s41698-025-00993-8. PMID: 40514442 Free PMC article.
- Towards accurate indel calling for oncopanel sequencing through an international pipeline competition at precisionFDA. Sci Rep. 2024 Apr 8;14(1):8165. doi: 10.1038/s41598-024-58573-y. PMID: 38589653 Free PMC article.
- Targeted DNA-seq and RNA-seq of Reference Samples with Short-read and Long-read Sequencing. Sci Data. 2024 Aug 16;11(1):892. doi: 10.1038/s41597-024-03741-y. PMID: 39152166 Free PMC article.