Brief Bioinform. 2021 Jul 20;22(4):bbaa366.
doi: 10.1093/bib/bbaa366.

Simulation of African and non-African low and high coverage whole genome sequence data to assess variant calling approaches

Shatha Alosaimi et al. Brief Bioinform.

Abstract

Current variant calling (VC) approaches have been designed to leverage populations of long-range haplotypes and were benchmarked using populations of European descent, whereas most genetic diversity is found in non-European populations, such as those of Africa. When applied to these genetically diverse populations, VC tools may produce false positive and false negative calls, which can lead to misleading conclusions in the prioritization of mutations and in assessing the clinical relevance and actionability of genes. The key question is which tool or pipeline achieves high sensitivity and precision when analysing African data with either low or high sequence coverage, given the high genetic diversity and heterogeneity of these data. Here, a total of 100 synthetic whole genome sequencing (WGS) samples, mimicking the genetic profiles of African and European subjects at different coverage levels (high/low), were generated to assess the performance of nine different VC tools on these contrasting datasets. The performance of these tools was assessed in terms of false positive and false negative call rates by comparing the simulated golden variants with the variants identified by each VC tool. Combining our results on sensitivity and positive predictive value (PPV), VarDict [PPV = 0.999 and Matthews correlation coefficient (MCC) = 0.832] and BCFtools (PPV = 0.999 and MCC = 0.813) perform best on African population data at both high and low coverage. Overall, current VC tools produce higher false positive and false negative rates when analysing African data than when analysing European data. This highlights the need to develop VC approaches with high sensitivity and precision tailored to populations characterized by high genetic variation and low linkage disequilibrium.

Keywords: DNA sequence; genomics; next-generation sequence; simulation; variant calling.
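
The evaluation described in the abstract, comparing called variants against a simulated golden set and summarizing the result as sensitivity, PPV and MCC, can be illustrated with a minimal sketch. This is not the authors' pipeline. The function names, the representation of a variant as a `(chrom, pos, ref, alt)` tuple, and the `n_sites` parameter (the number of evaluated sites, used to approximate true negatives) are all assumptions made for illustration.

```python
from math import sqrt

# A variant is represented here as a (chrom, pos, ref, alt) tuple.
# This representation is an assumption for the sketch, not the paper's format.

def confusion_counts(truth, called, n_sites):
    """Count TP, FP, FN and an approximate TN from two sets of variants.

    n_sites is a hypothetical parameter: the total number of evaluated
    sites. TN is approximated as every site that is neither a true nor a
    called variant, which ignores sites carrying more than one allele.
    """
    tp = len(truth & called)   # called and present in the golden set
    fp = len(called - truth)   # called but absent from the golden set
    fn = len(truth - called)   # in the golden set but not called
    tn = n_sites - tp - fp - fn
    return tp, fp, fn, tn

def metrics(tp, fp, fn, tn):
    """Return sensitivity, PPV and Matthews correlation coefficient."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "ppv": ppv, "mcc": mcc}

if __name__ == "__main__":
    # Toy data, invented purely to exercise the functions.
    golden = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
              ("chr1", 300, "G", "A"), ("chr1", 400, "T", "C")}
    calls = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
             ("chr1", 300, "G", "A"), ("chr1", 500, "A", "T")}
    print(metrics(*confusion_counts(golden, calls, n_sites=1000)))
```

Because non-variant sites vastly outnumber variant sites in a genome, the MCC computed this way depends strongly on how true negatives are defined, so the choice of `n_sites` should be stated alongside any reported value.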


Figures

Figure 1. Overview of the variant calling analysis pipeline.
Figure 2. Relationship between positive predictive value (PPV) and sensitivity of variant calling tools on African and European genomic data of different coverages. VarScan2 (pink), Samtools (sky blue), GATK-HaplotypeCaller (red), SNVer (dark blue), BCFtools (yellow), LoFreq (purple), Platypus (maroon) and VarDict (green).

