PLoS One. 2018 Apr 25;13(4):e0196058.
doi: 10.1371/journal.pone.0196058. eCollection 2018.

A framework for the estimation of the proportion of true discoveries in single nucleotide variant detection studies for human data

Nik Tuzov. PLoS One.

Abstract

Any single nucleotide variant detection study could benefit from a fast and cheap method of measuring the quality of a variant call list. It is advantageous to see how call list quality is affected by different variant filtering thresholds and other adjustments to the study parameters. Here we investigate the possibility of estimating the proportion of true positives in a single nucleotide variant call list for human data. Using whole-exome and whole-genome gold standard data sets for training, we focus on building a generic model that relies only on information available from any variant caller. We assess and compare the performance of candidate models based on their practical accuracy. We find that the generic model delivers decent accuracy most of the time. Further, we conclude that its performance could be improved substantially by leveraging the variant quality metrics that are specific to each variant calling tool.
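
The quantity being estimated, the proportion of true positives in a call list (referred to as PPV in the figure captions), can be computed directly whenever a gold standard truth set is available. The following is a minimal sketch of that direct computation, not the model from the paper; the `Variant` tuple layout, the `ppv` function name, and the toy data are assumptions made for illustration.

```python
from typing import Iterable, Set, Tuple

# Assumed representation of a single nucleotide variant:
# (chromosome, 1-based position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def ppv(calls: Iterable[Variant], truth: Set[Variant]) -> float:
    """Return the proportion of called variants that appear in the truth set."""
    call_set = set(calls)  # de-duplicate the call list
    if not call_set:
        raise ValueError("empty call list: PPV is undefined")
    true_positives = len(call_set & truth)
    return true_positives / len(call_set)


# Toy data (invented for illustration only).
truth = {
    ("chr1", 100, "A", "G"),
    ("chr1", 200, "C", "T"),
    ("chr2", 50, "G", "A"),
}
calls = [
    ("chr1", 100, "A", "G"),  # in the truth set
    ("chr1", 200, "C", "T"),  # in the truth set
    ("chr2", 75, "T", "C"),   # not in the truth set
    ("chr3", 10, "A", "C"),   # not in the truth set
]

print(ppv(calls, truth))  # 2 of 4 calls are true positives -> 0.5
```

In a study without a truth set this direct calculation is not possible, which is the situation the abstract's model is meant to address.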

Conflict of interest statement

Competing Interests: The author has read the journal’s policy and the author of this manuscript has the following competing interests: Nik Tuzov holds full time, paid employment at Partek Incorporated. The commercial software produced by Partek Incorporated (“Partek Flow” and “Partek Genomics Suite”) was used extensively to obtain the results presented in the manuscript. This affiliation does not alter the author’s adherence to PLOS ONE policies on sharing data and materials.

Figures

Fig 1. Weighted residuals for model (11, 6) from Table 1.
Fig 2. Hat values for model (11, 6) from Table 1.
Fig 3. Cook’s distances for model (11, 6) from Table 1.
Fig 4. Relationship between Ti/Tv and Het/Hom.
Red, blue and green dots denote Nextera, TruSeq, and WGS observations, respectively. One outlying blue point, obtained with TruSeq and FreeBayes, is visible.
Fig 5. Relationship between AIC and lambda for model (11, 6).
AIC values for model (11, 6) are plotted against the parameter lambda used in the variance link function in formula (5). A lambda value of zero corresponds to the log link function in formula (4).
Fig 6. Distribution of the length of 95% PI for PPV.
Because PPV is a proportion, the PI length is measured in %. A box plot of the length distribution is provided for each model from Table 1.
Fig 7. Relationship between PPV and Het/Hom.
Fig 8. Relationship between PPV and Ti/Tv.
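
The captions above refer to Ti/Tv (transition to transversion ratio) and Het/Hom (ratio of heterozygous to homozygous non-reference calls), two summary statistics that can be derived from the output of any variant caller. The sketch below shows one way to compute them under simplifying assumptions: the `(ref, alt, genotype)` record layout, the `"het"`/`"hom"` labels, and the function names are hypothetical, and the toy calls are invented.

```python
from typing import Iterable, Tuple

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

# Hypothetical record: (reference allele, alternate allele, genotype),
# where genotype is "het" (heterozygous) or "hom" (homozygous non-reference).
Call = Tuple[str, str, str]


def is_transition(ref: str, alt: str) -> bool:
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    if ref == alt:
        raise ValueError("ref and alt alleles must differ")
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def titv_and_hethom(calls: Iterable[Call]) -> Tuple[float, float]:
    """Return (Ti/Tv, Het/Hom) for a list of single nucleotide variant calls."""
    ti = tv = het = hom = 0
    for ref, alt, genotype in calls:
        if is_transition(ref, alt):
            ti += 1
        else:
            tv += 1
        if genotype == "het":
            het += 1
        elif genotype == "hom":
            hom += 1
        else:
            raise ValueError(f"unknown genotype label: {genotype!r}")
    if tv == 0 or hom == 0:
        raise ValueError("ratio undefined: no transversions or no homozygous calls")
    return ti / tv, het / hom


# Toy call list (invented): 4 transitions, 2 transversions; 4 het, 2 hom.
calls = [
    ("A", "G", "het"),
    ("C", "T", "het"),
    ("A", "C", "hom"),
    ("G", "T", "het"),
    ("T", "C", "hom"),
    ("G", "A", "het"),
]

print(titv_and_hethom(calls))  # (2.0, 2.0)
```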

Cited by

  • Whole-genome sequencing data of Kazakh individuals.
    Kairov U, Molkenov A, Rakhimova S, Kozhamkulov U, Sharip A, Karabayev D, Daniyarov A, H Lee J, D Terwilliger J, Akilzhanova A, Zhumadilov Z. BMC Res Notes. 2021 Feb 4;14(1):45. doi: 10.1186/s13104-021-05464-4. PMID: 33541395.
