BMC Genomics. 2014 Jan 30;15:85.
doi: 10.1186/1471-2164-15-85.

Variant calling in low-coverage whole genome sequencing of a Native American population sample

Chris Bizon et al. BMC Genomics. 2014.

Abstract

Background: The falling cost of sequencing a human genome has led to genotype sampling strategies that impute and infer the presence of sequence variants, which can then be tested for association with traits of interest. Low-coverage whole genome sequencing (WGS) is a sampling strategy that overcomes some of the deficiencies of fixed-content SNP array studies. Linkage disequilibrium (LD)-aware variant callers, such as the program Thunder, may provide a calling rate and accuracy that make a low-coverage sequencing strategy viable.

Results: We examined the performance of an LD-aware variant calling strategy in 708 low-coverage whole genome sequences from a community sample of Native Americans. We assessed variant calling by comparing the sequencing results with genotypes measured in 641 of the same subjects on a fixed-content, first-generation exome array. The comparison was made using the GATK Unified Genotyper and the LD-aware variant caller Thunder. Thunder improved concordance in a coverage-dependent fashion, while correctly calling nearly all of the common variants as well as a high percentage of the rare variants present in the sample.

Conclusions: Low-coverage WGS appears to collect genetic information intermediate in scope between fixed-content genotyping arrays and deep-coverage WGS. Our data suggest that, for large-sample association analyses, low-coverage WGS is a viable strategy with a greater chance than fixed-content arrays of discovering novel variants and associations.

Figures

Figure 1
Sample depth of coverage. Histogram of the mean sequencing read depth per sample for 641 samples. 88% of the samples have mean depth less than 13, and 26% have depth less than 5.
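The percentages in this caption are simple threshold counts over per-sample mean depth. A minimal sketch of that arithmetic follows; the depth values are hypothetical placeholders, not the study's data, and `fraction_below` is an illustrative helper name.

```python
from typing import Sequence


def fraction_below(depths: Sequence[float], threshold: float) -> float:
    """Fraction of samples whose mean read depth is strictly below `threshold`."""
    if not depths:
        raise ValueError("need at least one sample depth")
    return sum(1 for d in depths if d < threshold) / len(depths)


# Hypothetical per-sample mean depths (illustration only, not the study's data).
depths = [3.2, 4.8, 6.1, 7.5, 9.0, 11.4, 12.9, 14.2, 20.3, 25.0]

for cutoff in (13, 5):
    print(f"mean depth < {cutoff}: {fraction_below(depths, cutoff):.0%}")
```

For these ten made-up samples this prints 70% below depth 13 and 20% below depth 5.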
Figure 2
Concordance with exome chip. Concordance between exome chip genotypes and genotypes from three variant callers (a), and false positive rate (b). One point (at depth = 30.4, concordances between 96.7% and 98%) has been omitted so that the remaining data fill the plot region. Concordance is calculated only at sites measured as non-monomorphic (polymorphic) in the exome chip genotypes.
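The restriction to chip-polymorphic sites can be sketched as follows. This is an illustrative reconstruction, not the authors' code: it assumes genotypes coded as alternate-allele counts (0, 1, 2) with `None` for missing calls, treats a site as non-monomorphic when the non-missing chip calls carry both alleles, and compares only genotypes called by both platforms. All function names and data are hypothetical.

```python
from typing import Dict, List, Optional

# Genotype as alternate-allele count (0, 1 or 2); None marks a missing call.
Genotype = Optional[int]


def is_polymorphic(calls: List[Genotype]) -> bool:
    """True if the non-missing calls carry both alleles, i.e. the site is non-monomorphic."""
    observed = [g for g in calls if g is not None]
    alt_alleles = sum(observed)
    return 0 < alt_alleles < 2 * len(observed)


def concordance(chip: Dict[str, List[Genotype]],
                seq: Dict[str, List[Genotype]]) -> float:
    """Fraction of matching genotypes, counted only at sites polymorphic on the chip."""
    matches = compared = 0
    for site, chip_calls in chip.items():
        if site not in seq or not is_polymorphic(chip_calls):
            continue  # skip sites absent from the sequence calls or monomorphic on the chip
        for chip_gt, seq_gt in zip(chip_calls, seq[site]):
            if chip_gt is None or seq_gt is None:
                continue  # only compare genotypes called by both platforms
            compared += 1
            if chip_gt == seq_gt:
                matches += 1
    if compared == 0:
        raise ValueError("no comparable genotypes")
    return matches / compared


# Hypothetical toy data: one entry per sample, same sample order in both dicts.
chip = {"site1": [0, 1, 2, 0], "site2": [0, 0, 0, 0], "site3": [1, 1, None, 0]}
seq = {"site1": [0, 1, 1, 0], "site2": [0, 1, 0, 0], "site3": [1, 0, 2, 0]}
print(f"concordance: {concordance(chip, seq):.3f}")
```

Here `site2` is skipped because it is monomorphic on the chip, leaving 5 matches out of 7 compared genotypes, so the script prints `concordance: 0.714`.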
Figure 3
Frequency dependence of site finding. The fraction of variant sites found depends on both the frequency range of the variant and the method used to call variants. The GATK Unified Genotyper in multisample mode finds more variants in every frequency range, but the disparity is most pronounced at the lowest frequencies, where the Unified Genotyper finds approximately 50% more variant sites than THUNDER. Single-sample Unified Genotyper calls follow a model that assumes a constant probability of finding any site in a single sample.
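One plausible reading of the "constant probability" model is sketched below; the caption does not give its exact form, so this is an assumption. If each carrier sample independently reveals a site with probability p, a site with k carriers is found with probability 1 - (1 - p)^k. Assuming Hardy-Weinberg proportions, each of n samples carries an allele of frequency f with probability q = 1 - (1 - f)^2, and averaging over a binomial number of carriers gives 1 - (1 - qp)^n. The value p = 0.3 is made up, and the function names are hypothetical.

```python
def prob_site_found(carriers: int, p: float) -> float:
    """P(site found) if each carrier sample independently reveals it with probability p."""
    if carriers < 0 or not 0.0 <= p <= 1.0:
        raise ValueError("need carriers >= 0 and 0 <= p <= 1")
    return 1.0 - (1.0 - p) ** carriers


def prob_found_at_frequency(freq: float, n_samples: int, p: float) -> float:
    """P(site found) for an allele of frequency `freq` among `n_samples` diploid samples.

    Assumes Hardy-Weinberg proportions, so each sample carries the allele with
    probability 1 - (1 - freq)**2, independently of the other samples.
    """
    carrier_prob = 1.0 - (1.0 - freq) ** 2
    # A sample reveals the site only if it is a carrier AND the site is detected in it;
    # this equals the average of prob_site_found over a Binomial(n, carrier_prob) carrier count.
    return 1.0 - (1.0 - carrier_prob * p) ** n_samples


# Illustrative parameters only: p = 0.3 is a made-up per-sample detection probability.
for freq in (0.001, 0.005, 0.01, 0.05):
    print(f"freq={freq:<6} P(found)={prob_found_at_frequency(freq, 708, 0.3):.3f}")
```

Under this model the chance of finding a site rises steeply with frequency, which is consistent with the largest caller differences appearing among the rarest variants.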
Figure 4
Empirical kinship coefficients. Histograms of empirical kinship coefficients calculated from THUNDER genotypes. Each row contains all pairwise values with the noted pedigree-defined kinship coefficient. Thus, the lowest histogram (φ_ped = 0.25) contains all full-sibling and parent–child relations, the next row up (φ_ped = 0.125) contains grandparent–grandchild, avuncular, and half-sibling relations, and so on.
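The row structure of this figure amounts to grouping empirical estimates by their pedigree-defined kinship coefficient. In an outbred pedigree, relatives of degree d have expected kinship 2^-(d+1): 0.25 for first-degree, 0.125 for second-degree, and 0.0625 for third-degree relatives. The sketch below shows that grouping; the relationship labels, helper names, and estimates are hypothetical, and it does not reproduce how THUNDER-based estimates were computed.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Standard kinship coefficients for common relationships in an outbred pedigree.
PEDIGREE_KINSHIP: Dict[str, float] = {
    "parent-child": 0.25,
    "full sibling": 0.25,
    "grandparent-grandchild": 0.125,
    "avuncular": 0.125,
    "half sibling": 0.125,
    "first cousin": 0.0625,
}


def kinship_for_degree(degree: int) -> float:
    """Expected kinship 2**-(degree + 1) for an outbred degree-d relative (1 = first degree)."""
    return 2.0 ** -(degree + 1)


def group_by_pedigree(pairs: Iterable[Tuple[str, float]]) -> Dict[float, List[float]]:
    """Group empirical kinship estimates by their pedigree-defined coefficient."""
    groups: Dict[float, List[float]] = defaultdict(list)
    for relationship, phi_hat in pairs:
        groups[PEDIGREE_KINSHIP[relationship]].append(phi_hat)
    return dict(groups)


# Hypothetical (relationship, empirical phi) pairs, not values from the study.
pairs = [("full sibling", 0.24), ("parent-child", 0.26),
         ("half sibling", 0.13), ("avuncular", 0.11)]
for phi_ped, estimates in sorted(group_by_pedigree(pairs).items(), reverse=True):
    print(f"phi_ped = {phi_ped}: {estimates}")
```

Sorting in descending order of φ_ped lists first-degree pairs first, matching the figure's description of the lowest histogram.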
Figure 5
Allele frequencies in the NA cohort. (a) A two-dimensional histogram comparing allele frequencies in the Native American cohort with those in European-ancestry samples from the 1000 Genomes Project. The variants shown are the union of the two sets. Color scales logarithmically with the number of variants, as shown in the colorbar above the image. (b) One-dimensional histogram of the difference in allele frequency for the same variants as in (a).
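Panel (b) can be sketched as a per-variant frequency difference over the union of the two variant sets, followed by fixed-width binning. The caption does not say how a variant absent from one set is handled; this sketch assumes it has frequency 0 there. Variant IDs, frequencies, and helper names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def frequency_differences(cohort: Dict[str, float],
                          reference: Dict[str, float]) -> Dict[str, float]:
    """Cohort minus reference allele frequency over the union of both variant sets.

    Assumption: a variant missing from one set is treated as frequency 0 there.
    """
    variants = cohort.keys() | reference.keys()
    return {v: cohort.get(v, 0.0) - reference.get(v, 0.0) for v in variants}


def histogram(values: Iterable[float], bin_width: float) -> Counter:
    """Count values into fixed-width bins keyed by bin index (lower edge = index * bin_width)."""
    return Counter(math.floor(x / bin_width) for x in values)


# Hypothetical variant IDs and frequencies, chosen only to illustrate the calculation.
cohort = {"var1": 0.5, "var2": 0.25, "var3": 0.125}
reference = {"var1": 0.25, "var2": 0.25, "var4": 0.5}
diffs = frequency_differences(cohort, reference)
for index, count in sorted(histogram(diffs.values(), 0.25).items()):
    print(f"[{index * 0.25:+.2f}, {(index + 1) * 0.25:+.2f}): {count}")
```

With these toy values, `var4` (present only in the reference set) lands in the bin [-0.50, -0.25), and the two variants with small positive or zero differences share the bin [+0.00, +0.25).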
