Sci Rep. 2020 Oct 29;10(1):18542.
doi: 10.1038/s41598-020-75387-w.

Evaluating genotype imputation pipeline for ultra-low coverage ancient genomes

Ruoyun Hui et al. Sci Rep. 2020.

Abstract

Although ancient DNA data have become increasingly important in studies of past populations, it is often not feasible or practical to obtain high-coverage genomes from poorly preserved samples. While accurate genotype imputation from > 1 × coverage data has recently become routine, a large proportion of ancient samples remain unusable for downstream analyses because of their low coverage. Here, we evaluate a two-step pipeline for the imputation of common variants in ancient genomes at 0.05-1 × coverage. We first use the genotype likelihood input mode in Beagle, then filter for confident genotypes and use these as the input to impute missing genotypes. This procedure, when tested on ancient genomes, outperforms a single-step imputation from genotype likelihoods, suggesting that current genotype callers do not fully account for errors in ancient sequences and that additional quality controls can be beneficial. We compare the effects of various genotype likelihood calling methods; post-calling, pre-imputation and post-imputation filters; different reference panels; and different imputation tools. In a Neolithic Hungarian genome, we obtain ~ 90% imputation accuracy for heterozygous common variants at coverage 0.05 × and > 97% accuracy at coverage 0.5 ×. We show that imputation can mitigate, though not eliminate, reference bias in ultra-low coverage ancient genomes.

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Schematic representation of the imputation pipeline. The input and output of the starting down-sampling step are both alignment files in BAM format. The output of each step of the pipeline (genotype calling, genotype probability update and genotype imputation) is a VCF file. In the output boxes, data fields that are updated and necessary in the following step of the pipeline are highlighted in green (1KG: 1000 Genomes).
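
As a rough illustration of the down-sampling step, the fraction of reads to retain can be approximated as the ratio of target coverage to original coverage. The sketch below assumes reads are sampled uniformly at random and that coverage scales linearly with the number of retained reads. The function name `subsample_fraction` is hypothetical, and the 20 × starting coverage is the one quoted for the evaluation genome in the figure captions.

```python
def subsample_fraction(target_coverage: float, original_coverage: float) -> float:
    """Approximate fraction of reads to keep when down-sampling an alignment.

    Assumption (not from the paper): coverage is proportional to the number
    of reads kept, so the fraction is simply target / original.
    """
    if target_coverage <= 0 or original_coverage <= 0:
        raise ValueError("coverages must be positive")
    if target_coverage > original_coverage:
        raise ValueError("target coverage cannot exceed original coverage")
    return target_coverage / original_coverage


# Fractions for a few target coverages, starting from a 20x genome.
for target in (0.05, 0.1, 0.5, 1.0):
    print(f"{target}x -> keep fraction {subsample_fraction(target, 20.0):.4f}")
```

Under these assumptions, reaching 0.05 × from a 20 × genome means keeping roughly one read in 400.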
Figure 2
Imputation accuracy of heterozygous sites following the default pipeline evaluated by down-sampling NE1 chr20. The main figure shows the accuracy across coverages (on a log scale, X-axis), with and without the post-calling deamination filter. The inset on the top-right corner shows the proportion of heterozygous sites called in the original 20 × genome that are correctly imputed (i.e. not imputed as homozygous or failing the post-imputation filter).
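
The two quantities in this caption can be sketched in Python. The precise definition of accuracy is not given in this excerpt, so the code makes a labeled assumption: accuracy is the share of truly heterozygous sites that are imputed as heterozygous among those that pass the post-imputation filter, while the inset proportion uses all heterozygous sites from the high-coverage genome as the denominator. The function name `het_metrics`, the genotype coding (0, 1, 2) and the use of `None` for filtered sites are illustrative choices.

```python
from typing import Dict, Optional, Tuple


def het_metrics(
    truth: Dict[str, int],
    imputed: Dict[str, Optional[int]],
) -> Tuple[float, float]:
    """Return (accuracy, proportion_recovered) for heterozygous sites.

    truth:   site -> genotype from the high-coverage genome (0, 1 or 2,
             counting alternate alleles).
    imputed: site -> imputed genotype, or None if the site failed the
             post-imputation filter or was not imputed.
    Both definitions are assumptions made for illustration.
    """
    het_sites = [site for site, gt in truth.items() if gt == 1]
    # Sites that survive the post-imputation filter.
    retained = [site for site in het_sites if imputed.get(site) is not None]
    # Sites that are still called heterozygous after imputation.
    correct = [site for site in retained if imputed[site] == 1]

    accuracy = len(correct) / len(retained) if retained else float("nan")
    proportion = len(correct) / len(het_sites) if het_sites else float("nan")
    return accuracy, proportion


# Toy data: four true heterozygous sites, one lost to the filter,
# one wrongly imputed as homozygous.
truth = {"s1": 1, "s2": 1, "s3": 1, "s4": 0, "s5": 1}
imputed = {"s1": 1, "s2": 0, "s3": None, "s4": 0, "s5": 1}
accuracy, proportion = het_metrics(truth, imputed)
print(accuracy, proportion)
```

Separating the two denominators shows why a stricter filter can raise accuracy while lowering the proportion of heterozygous sites recovered.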
Figure 3
Comparing performance between one-step and two-step imputation pipelines. Two-step pipelines have a pre-imputation filter applied: max(GP) ≥ 0.99 for Beagle 4.0 + Beagle 5 and Beagle 4.1 + Beagle 5; max(GP) ≥ 0.9 for GLIMPSE + Beagle 5. In the lower panel, post-imputation GP filters are max(GP) ≥ 0.9 for GLIMPSE and max(GP) ≥ 0.99 for all the others. We used a more relaxed cutoff for GPs generated by GLIMPSE because these appear more conservative than GPs generated by Beagle 4 and 5 (Table S3).
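
The max(GP) filter described in this caption can be illustrated with a minimal Python sketch. It assumes each site carries a genotype probability (GP) triple ordered as (homozygous reference, heterozygous, homozygous alternate). The function name `confident_genotype` and the use of `None` to mark a genotype as missing are illustrative assumptions, not part of the published pipeline.

```python
from typing import Optional, Sequence


def confident_genotype(gp: Sequence[float], threshold: float = 0.99) -> Optional[int]:
    """Keep the most probable genotype only if max(GP) >= threshold.

    Returns 0, 1 or 2 (index of the most probable genotype), or None when
    the call is not confident enough and should be treated as missing.
    """
    if len(gp) != 3:
        raise ValueError("expected three genotype probabilities for a biallelic site")
    best = max(range(3), key=lambda i: gp[i])
    return best if gp[best] >= threshold else None


# Hypothetical GP triples for three sites.
sites = {
    "site_a": (0.995, 0.004, 0.001),  # confident homozygous reference
    "site_b": (0.50, 0.45, 0.05),     # ambiguous, set to missing
    "site_c": (0.02, 0.93, 0.05),     # passes only the relaxed 0.9 cutoff
}
for name, gp in sites.items():
    print(name, confident_genotype(gp, 0.99), confident_genotype(gp, 0.9))
```

Sites returned as `None` would be left for the imputation step to fill in. The `threshold` argument mirrors the two cutoffs quoted in the caption (0.99 and 0.9).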
Figure 4
Effect of different settings on imputation accuracy evaluated by down-sampling NE1. (A) Performance using different genotype callers in a 0.05 × coverage genome; (B) Effect of pre-imputation filters in a 0.05 × coverage genome; (C) Effect of post-imputation filters in a 0.05 × coverage genome; (D) Performance using different reference panels during the genotype probability update and imputation steps in a 0.05 × coverage genome. The inset on the top-right corner shows the proportion of heterozygous sites called in the original 20 × genome that are correctly imputed (i.e. not imputed as homozygous or failing the post-imputation filter). tv: transversion.
