Genome Biol. 2020 Sep 17;21(1):250.
doi: 10.1186/s13059-020-02160-7.

Removing reference bias and improving indel calling in ancient DNA data analysis by mapping to a sequence variation graph

Rui Martiniano et al. Genome Biol. 2020.

Abstract

Background: During the last decade, the analysis of ancient DNA (aDNA) sequence has become a powerful tool for the study of past human populations. However, the degraded nature of aDNA means that aDNA molecules are short and frequently mutated by post-mortem chemical modifications. These features decrease read mapping accuracy and increase reference bias, in which reads containing non-reference alleles are less likely to be mapped than those containing reference alleles. Alternative approaches have been developed to replace the linear reference with a variation graph which includes known alternative variants at each genetic locus. Here, we evaluate the use of variation graph software vg to avoid reference bias for aDNA and compare with existing methods.

Results: We use vg to align simulated and real aDNA samples to a variation graph containing 1000 Genome Project variants and compare with the same data aligned with bwa to the human linear reference genome. Using vg leads to a balanced allelic representation at polymorphic sites, effectively removing reference bias, and more sensitive variant detection in comparison with bwa, especially for insertions and deletions (indels). Alternative approaches that use relaxed bwa parameter settings or filter bwa alignments can also reduce bias but can have lower sensitivity than vg, particularly for indels.

Conclusions: Our findings demonstrate that aligning aDNA sequences to variation graphs effectively mitigates the impact of reference bias when analyzing aDNA, while retaining mapping sensitivity and allowing detection of variation, in particular indel variation, that was previously missed.

Keywords: Ancient DNA; Reference bias; Sequence alignment; Variation graph.
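As an illustration of the "balanced allelic representation" described in the Results, the following minimal sketch computes the alternate allele fraction at heterozygous sites and its mean with an approximate 95% confidence interval. It is not the authors' pipeline: the function names, the input format (pairs of reference and alternate read counts), the normal-approximation interval, and the example counts are all assumptions made for illustration. Without reference bias, the mean fraction at truly heterozygous sites is expected to be close to 0.5; values below 0.5 suggest that reads carrying the alternate allele are being lost.

```python
from math import sqrt
from statistics import mean, stdev


def alt_allele_fraction(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads at one site that carry the alternate allele."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("site has no covering reads")
    return alt_reads / total


def mean_alt_fraction(sites):
    """Mean alternate allele fraction across sites, with an approximate
    95% confidence interval (normal approximation, an assumption here).

    `sites` is an iterable of (ref_reads, alt_reads) pairs; sites with no
    coverage are skipped.
    """
    fractions = [alt_allele_fraction(r, a) for r, a in sites if r + a > 0]
    if len(fractions) < 2:
        raise ValueError("need at least two covered sites")
    m = mean(fractions)
    half_width = 1.96 * stdev(fractions) / sqrt(len(fractions))
    return m, (m - half_width, m + half_width)


# Hypothetical read counts at four heterozygous sites: (ref, alt).
example_sites = [(6, 4), (5, 5), (7, 3), (4, 4)]
example_mean, example_ci = mean_alt_fraction(example_sites)
```

In this toy input the mean falls below 0.5, which is the pattern one would expect from an aligner that preferentially maps reference-allele reads.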

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Sequence tube maps. Sequence tube maps [19] of a small region of the human genome with aDNA reads from the Yamnaya individual aligned with (a) bwa aln to a linear reference sequence and (b) vg map to a graph containing 1000 Genomes variants. The individual is heterozygous for both an indel (GTTTGAG/-) and a SNP (A/C) in this region, with the insertion and the alternate allele on the same haplotype. The two underlying haplotypes in this region are colored in gray, and red and blue lines indicate forward and reverse reads, respectively. None of the 6 reads across the insertion and only 2 of 12 reads across the SNP were mapped by bwa. Reads were locally realigned with vg map to the graph for the purpose of visualization.
Fig. 2
Comparing vg graph, bwa aln, and bwa mem using simulated ancient DNA. Comparing bwa aln and vg map performance when aligning reads simulated from chromosome 11 of the Human Origins panel. Lines represent ordinary least squares (OLS) regression results for the allele/aligner conditions corresponding to their colors. a Comparison between vg graph and bwa aln -n 0.02. b Comparison between vg graph and bwa aln -n 0.01 -o 2. c Comparison of the mean percentage (and 95% CI) of mapped reads in simulated data by vg graph, bwa aln, and bwa mem using different alignment parameters and minimum mapping quality filtering thresholds. d Mean alternate allele fraction (and 95% CI) of simulated reads after alignment with the different methods and minimum mapping quality filtering thresholds. We also show results obtained after processing simulated data with two previously published workflows for addressing reference bias: modified reads (“modreads”) [10] and modified reference genome (“altref genome”) [15]
Fig. 3
Downsampling a high-coverage aDNA sample. The comparative effect of downsampling on heterozygous variant calling following bwa aln and vg map alignment of reads from the ancient Yamnaya sample [26] with different parameters and mapping quality filtering thresholds, and including post-processing of bwa aln with the modified read filter [10]. a SNPs. b Indels (the modified read filter does not apply in this case)
Fig. 4
Comparison between vg and bwa aln for indel detection. a Alternate allele observations at indels. b Comparison between vg graph and bwa aln in the detection of the CCR5 delta 32 deletion associated with HIV-1 resistance. Reads containing the deletion were mapped with vg in four ancient samples, but not with bwa

References

    1. Dabney J, Meyer M, Pääbo S. Ancient DNA damage. Cold Spring Harbor Perspect Biol. 2013;5(7):a012567. doi: 10.1101/cshperspect.a012567.
    2. Rasmussen M, Li Y, Lindgreen S, Pedersen JS, Albrechtsen A, Moltke I, Metspalu M, Metspalu E, Kivisild T, Gupta R, et al. Ancient human genome sequence of an extinct Palaeo-Eskimo. Nature. 2010;463(7282):757. doi: 10.1038/nature08835.
    3. Brunson K, Reich D. The promise of paleogenomics beyond our own species. Trends Genet. 2019. doi: 10.1016/j.tig.2019.02.006.
    4. Günther T, Jakobsson M. Genes mirror migrations and cultures in prehistoric Europe—a population genomic perspective. Curr Opin Genet Dev. 2016;41:115–23. doi: 10.1016/j.gde.2016.09.004.
    5. Skoglund P, Mathieson I. Ancient genomics of modern humans: the first decade. Ann Rev Genom Hum Genet. 2018;19:381–404. doi: 10.1146/annurev-genom-083117-021749.
