BMC Genomics. 2020 Nov 30;21(1):844.
doi: 10.1186/s12864-020-07229-y.

Competitive mapping allows for the identification and exclusion of human DNA contamination in ancient faunal genomic datasets

Tatiana R Feuerborn et al. BMC Genomics.

Abstract

Background: After over a decade of developments in field collection, laboratory methods and advances in high-throughput sequencing, contamination remains a key issue in ancient DNA research. Currently, human and microbial contaminant DNA still impose challenges on cost-effective sequencing and accurate interpretation of ancient DNA data.

Results: Here we investigate whether human contaminating DNA can be found in ancient faunal sequencing datasets. We identify variable levels of human contamination, which persists even after the sequence reads have been mapped to the faunal reference genomes. This contamination has the potential to affect a range of downstream analyses.

Conclusions: We propose a fast and simple method, based on competitive mapping, which allows identifying and removing human contamination from ancient faunal DNA datasets with limited losses of true ancient data. This method could represent an important tool for the ancient DNA field.

Keywords: Ancient DNA; Competitive mapping; DNA contamination removal; Palaeogenomics.
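
The competitive mapping idea summarized in the abstract can be illustrated with a minimal sketch. It assumes reads were already aligned to a concatenated faunal-plus-human reference and exported as SAM text, and that human contigs were renamed with a `human_` prefix when the reference was built. The function names, the prefix convention, and the `min_mapq` parameter are illustrative assumptions, not the authors' published pipeline.

```python
from typing import Iterable, List, Tuple


def partition_sam_records(
    sam_lines: Iterable[str],
    human_prefix: str = "human_",  # assumed contig-naming convention (hypothetical)
    min_mapq: int = 0,             # illustrative threshold, not taken from the paper
) -> Tuple[List[str], List[str]]:
    """Split SAM alignment lines by the part of a concatenated reference they hit.

    Returns (target_records, human_records). Header lines, unmapped reads and
    reads below min_mapq are skipped.
    """
    target: List[str] = []
    human: List[str] = []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # blank line or SAM header
        fields = line.rstrip("\n").split("\t")
        flag, rname, mapq = int(fields[1]), fields[2], int(fields[4])
        if flag & 0x4 or rname == "*" or mapq < min_mapq:
            continue  # unmapped, or mapped too ambiguously to assign
        if rname.startswith(human_prefix):
            human.append(line)   # candidate contamination: discard downstream
        else:
            target.append(line)  # retained for downstream analyses
    return target, human


def human_fraction(target: List[str], human: List[str]) -> float:
    """Proportion of mapped reads assigned to the human part of the reference."""
    total = len(target) + len(human)
    return len(human) / total if total else 0.0
```

Only the records in the first returned list would be carried forward; the second list gives a per-sample estimate of how much human contamination survived mapping to the faunal reference.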

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Mapping statistics for target, non-target and human references. a Right panel, percentage of reads from each sample mapping to each of the three reference genomes. Left panel, same as before but zoomed to percentages below 1.2%. b Proportion of reads from the faunal BAM file that mapped to the human part of the concatenated reference genome
Fig. 2
Schematic view of the competitive mapping analyses. FASTQ files represent ‘raw’ sequencing files and BAM files represent alignments to a reference genome. Colored boxes indicate different types of data: blue, files that need further processing; red, discarded data; and green, data for downstream analyses. a Schematic view of the analyses performed in this manuscript. An example using a mammoth sample is shown. First, normal mapping to the elephant, human and dog references to check for endogenous content as well as non-target and human contamination in the sequencing files. Second, competitive mapping to a concatenated reference of elephant and human to detect human contamination in the alignments. Third, normal mapping of human data to the elephant reference to check that the human contaminant sequences map preferentially to conserved regions of the genome. b Schematic view of a typical competitive mapping pipeline using a mammoth sample as an example. After competitive mapping, only the sequences mapping to the elephant part of the concatenated reference will be used for downstream analyses
Fig. 3
Characterization of endogenous and human contaminant reads in faunal BAM files. a Comparisons of PMDR and mRL for all mammoth samples. b mRL for mammoth sequences mapping to the elephant or the human parts of the concatenated reference (Wilcoxon rank-sum test, W = 313.5, p-value = 0.00223). c PMDR for mammoth sequences mapping to the elephant or the human parts of the concatenated reference (Wilcoxon rank-sum test, W = 397, p-value = 1.016e-10). d Comparisons of PMDR and mRL for all ancient dog samples. e mRL for dog sequences mapping to the dog or the human parts of the concatenated reference (Wilcoxon rank-sum test, W = 1929, p-value = 1.251e-08). f PMDR for dog sequences mapping to the dog or the human parts of the concatenated reference (Wilcoxon rank-sum test, W = 1743, p-value = 1.511e-05). In all cases, **: p-value < 0.01 and ****: p-value < 0.0001
Fig. 4
Data lost per sample after competitive mapping. Fraction of data lost in each sample at genome-wide level and only in conserved regions. Colors indicate different species
Fig. 5
Proportions of sequences mapping to human, target and non-target references from the FASTQ and BAM files. a Correlation between the proportion of reads mapping to human and to the non-target species in the raw FASTQ sequencing files (r2 = 0.81, F = 303.8, p-value < 2.2e-16). b No correlation between the proportion of reads mapping to human in the raw FASTQ sequencing files and the proportion of reads mapping to human from the faunal BAM file (r2 = 0.01, F = 1.67, p-value = 0.2). c Correlation between the number of reads mapping to human in the raw FASTQ sequencing files and the number of reads mapping to human from the faunal BAM file (r2 = 0.15, F = 13.5, p-value < 2e-16)
