A framework for the estimation of the proportion of true discoveries in single nucleotide variant detection studies for human data
- PMID: 29694377
- PMCID: PMC5918994
- DOI: 10.1371/journal.pone.0196058
Abstract
Any single nucleotide variant detection study could benefit from a fast and cheap method of measuring the quality of a variant call list. It is advantageous to see how call list quality is affected by different variant filtering thresholds and other adjustments to the study parameters. Here we examine the possibility of estimating the proportion of true positives in a single nucleotide variant call list for human data. Using whole-exome and whole-genome gold standard data sets for training, we focus on building a generic model that relies only on information available from any variant caller. We assess and compare the performance of different candidate models based on their practical accuracy. We find that the generic model delivers decent accuracy most of the time. Further, we conclude that its performance could be improved substantially by leveraging the variant quality metrics that are specific to each variant calling tool.
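As a minimal sketch of the quantity being estimated (not the model described in the article), the proportion of true positives in a call list can be computed directly when a gold standard truth set is available, and recomputed at different filtering thresholds. The `Call` record, the `proportion_true` function, and the toy data below are hypothetical names and values chosen for illustration only.

```python
from typing import Iterable, NamedTuple, Optional, Set, Tuple

# Hypothetical record for one single nucleotide variant call.
class Call(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str
    qual: float  # caller-reported variant quality (assumed higher = better)

Key = Tuple[str, int, str, str]

def proportion_true(calls: Iterable[Call], truth: Set[Key],
                    min_qual: float) -> Optional[float]:
    """Fraction of calls with qual >= min_qual that appear in the truth set.

    Returns None when no call passes the threshold.
    """
    kept = [c for c in calls if c.qual >= min_qual]
    if not kept:
        return None
    hits = sum((c.chrom, c.pos, c.ref, c.alt) in truth for c in kept)
    return hits / len(kept)

# Toy data (illustrative only, not taken from the article).
calls = [
    Call("1", 100, "A", "G", 10.0),  # not in the truth set
    Call("1", 200, "C", "T", 20.0),
    Call("2", 300, "G", "A", 30.0),
    Call("2", 400, "T", "C", 40.0),
]
truth = {("1", 200, "C", "T"), ("2", 300, "G", "A"), ("2", 400, "T", "C")}

for threshold in (0.0, 20.0, 50.0):
    print(threshold, proportion_true(calls, truth, threshold))
```

In a study without a gold standard, this proportion cannot be computed this way, which is the gap the estimation framework is meant to fill.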