Nat Biotechnol. 2019 May;37(5):555-560. doi: 10.1038/s41587-019-0054-x. Epub 2019 Mar 11.

Best practices for benchmarking germline small-variant calls in human genomes


Peter Krusche et al. Nat Biotechnol. 2019 May.


Abstract

Standardized benchmarking approaches are required to assess the accuracy of variants called from sequence data. Although variant-calling tools and the metrics used to assess their performance continue to improve, important challenges remain. Here, as part of the Global Alliance for Genomics and Health (GA4GH), we present a benchmarking framework for variant calling. We provide guidance on how to match variant calls with different representations, define standard performance metrics, and stratify performance by variant type and genome context. We describe limitations of high-confidence calls and regions that can be used as truth sets (for example, single-nucleotide variant concordance of two methods is 99.7% inside versus 76.5% outside high-confidence regions). Our web-based app enables comparison of variant calls against truth sets to obtain a standardized performance report. Our approach has been piloted in the PrecisionFDA variant-calling challenges to identify the best-in-class variant-calling methods within high-confidence regions. Finally, we recommend a set of best practices for using our tools and evaluating the results.
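To make the standard performance metrics concrete, the following is a minimal Python sketch of how recall, precision, and F1 are commonly derived from TP/FP/FN counts. The precise definitions adopted by the GA4GH tools are specified in the paper and its supplement, so this should be read as an illustration only, and the example counts are invented.

    def performance_metrics(tp: int, fp: int, fn: int):
        """Recall (sensitivity), precision, and F1 from benchmark counts.

        tp: truth variants matched by the query (true positives)
        fp: query variants absent from the truth set (false positives)
        fn: truth variants missed by the query (false negatives)
        """
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if (precision + recall) else 0.0)
        return {"recall": recall, "precision": precision, "f1": f1}

    # Hypothetical example: 99,000 matched truth variants, 500 false positives,
    # 1,000 missed truth variants.
    print(performance_metrics(99_000, 500, 1_000))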


Figures

Figure 1:
The GA4GH Benchmarking Team’s reference implementation of a comparison framework, annotated with text describing the Team’s innovations. The framework takes in a Truth VCF, a Query VCF, confident call regions for the Truth and/or Query, and, optionally, BED files to stratify performance by genome context. A standardized intermediate output (VCF-I) from the comparison engines allows the engines to be interchanged and TP, FP, and FN counts to be quantified in a standard way.
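To illustrate how a standardized intermediate output makes quantification and stratification mechanical, the Python sketch below tallies per-variant decision labels inside and outside a set of BED regions. The record layout and the label names ("TP", "FP", "FN") are illustrative assumptions, not hap.py's actual intermediate schema.

    from collections import Counter

    def in_regions(chrom, pos, bed_regions):
        # bed_regions: chrom -> list of (start, end) BED intervals
        # (0-based, half-open); pos is a 1-based VCF position.
        return any(start < pos <= end for start, end in bed_regions.get(chrom, ()))

    def stratified_counts(records, bed_regions):
        # records: iterable of (chrom, pos, decision), decision in {"TP", "FP", "FN"}.
        inside, outside = Counter(), Counter()
        for chrom, pos, decision in records:
            bucket = inside if in_regions(chrom, pos, bed_regions) else outside
            bucket[decision] += 1
        return inside, outside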
Figure 2:
Four examples of cases where variants can be represented in multiple forms in VCF format. (a) Three representations of a deletion in a homopolymer. (b) An insertion that can be represented as one 4-bp insertion or as two 2-bp insertions. (c) An MNP that can be represented as three SNVs or as one larger substitution. (d) Four different representations of a complex variant. Note that these representations include phasing information where it is necessary to describe the variant unambiguously. If phasing were not given for these variants, it would be impossible to normalize their representations, but sophisticated variant comparison tools can still determine that they could describe the same two haplotypes.
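The ambiguity in panel (a) is the kind that variant normalization (left-alignment and trimming) resolves for simple indels. The sketch below implements the usual left-align/trim procedure, assuming the chromosome's reference sequence is available in memory as a string; it is a sketch of the general technique only, not the comparison algorithm used by the GA4GH engines, which additionally reason over haplotypes.

    def left_align(pos, ref, alt, chrom_seq):
        # Normalize one REF/ALT pair: pos is 1-based; chrom_seq holds the
        # reference bases for the chromosome (0-based string).
        ref, alt = list(ref), list(alt)
        changed = True
        while changed:
            changed = False
            # Drop a shared trailing base.
            if ref and alt and ref[-1] == alt[-1]:
                ref.pop(); alt.pop()
                changed = True
            # If an allele emptied out, extend both alleles one base to the left.
            if not ref or not alt:
                left_base = chrom_seq[pos - 2]
                ref.insert(0, left_base); alt.insert(0, left_base)
                pos -= 1
                changed = True
        # Trim shared leading bases while both alleles keep at least one base.
        while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
            ref.pop(0); alt.pop(0)
            pos += 1
        return pos, "".join(ref), "".join(alt)

    # A 1-bp deletion in the homopolymer of GCAAAAT, written at its right edge,
    # normalizes to the left-aligned form anchored on the C.
    print(left_align(6, "AT", "T", "GCAAAAT"))   # -> (2, 'CA', 'C')

Complex cases like panels (b)-(d) cannot be resolved by normalization alone, which is why haplotype-aware comparison is needed.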
Figure 3:
Example standardized HTML report output from hap.py. (a) Tier 1: high-level metrics shown in the default view. (b) Tier 2: more detailed metrics and stratifications by variant type and genome context. (c) Precision-recall curve generated using the QUAL field, where the black point is all indels, the blue point is PASS indels only, the dotted blue line is the precision-recall curve for all indels, and the solid blue line is the precision-recall curve for PASS indels.
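A precision-recall curve like the one in panel (c) can be traced by sweeping a threshold over the QUAL field. A minimal Python sketch, assuming each query call has already been labeled as a true or false positive against the truth set:

    def precision_recall_curve(calls, total_truth):
        # calls: list of (qual, is_true_positive) for every query variant;
        # total_truth: number of truth variants (denominator for recall).
        curve = []
        tp = fp = 0
        for qual, is_tp in sorted(calls, key=lambda c: c[0], reverse=True):
            if is_tp:
                tp += 1
            else:
                fp += 1
            curve.append((qual, tp / (tp + fp), tp / total_truth))
        return curve  # list of (QUAL threshold, precision, recall)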
Figure 4:
Matching stringency can affect the relative performance of algorithms. The number of false positives for two PrecisionFDA Challenge submissions is shown at different matching stringencies: the fermikit submission has many more false positives when genotype errors are counted as FPs, but fewer FPs when matching only the allele or when performing distance-based matching. Note that this comparison is intended to illustrate the importance of matching stringency and is likely not indicative of the performance of these methods with optimized parameters or current versions.
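The three stringencies compared in the figure can be thought of as increasingly permissive match definitions. A hedged Python sketch of the idea, using hypothetical record dictionaries (chrom/pos/ref/alt/gt) and an arbitrary 30-bp window for distance-based matching; the real comparison engines match over haplotypes rather than single records:

    def match_level(truth, query, window=30):
        # Returns the strictest level at which the two records agree.
        if truth["chrom"] != query["chrom"]:
            return "no_match"
        same_allele = (
            (truth["pos"], truth["ref"], truth["alt"])
            == (query["pos"], query["ref"], query["alt"])
        )
        if same_allele and truth["gt"] == query["gt"]:
            return "genotype_match"   # strictest: genotype errors count as FP/FN
        if same_allele:
            return "allele_match"     # correct allele, wrong genotype
        if abs(truth["pos"] - query["pos"]) <= window:
            return "distance_match"   # most permissive: any nearby call
        return "no_match"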
