Three assays for in-solution enrichment of ancient human DNA at more than a million SNPs

Nadin Rohland^#^{1,2}, Swapan Mallick^#^{1,2,3}, Matthew Mah^{1,2,3}, Robert Maier^{1,2,4}, Nick Patterson^{2,4}, David Reich^{1,2,3,4}

Affiliations

¹ Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115, USA.
² Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA.
³ Howard Hughes Medical Institute, Boston, Massachusetts 02115, USA.
⁴ Department of Human Evolutionary Biology, Harvard University, Cambridge, Massachusetts 02138, USA.

^# Contributed equally.

PMID: 36517229
PMCID: PMC9808625
DOI: 10.1101/gr.276728.122

Nadin Rohland et al. Genome Res. 2022 Nov-Dec;32(11-12):2068-2078. doi: 10.1101/gr.276728.122. Epub 2022 Dec 14.

Abstract

The strategy of in-solution enrichment for hundreds of thousands of single-nucleotide polymorphisms (SNPs) has been used to analyze >70% of individuals with genome-scale ancient DNA published to date. This approach makes it economical to study ancient samples with low proportions of human DNA and increases the rate of conversion of sampled remains into interpretable data. So far, nearly all such data have been generated using a set of bait sequences targeting about 1.24 million SNPs (the "1240k reagent"), but synthesis of the reagent has been cost-effective for only a few laboratories. In 2021, two companies, Daicel Arbor Biosciences and Twist Bioscience, made available assays that target the same core set of SNPs along with supplementary content. We test all three assays on a common set of 27 ancient DNA libraries and show that all three are effective at enriching many hundreds of thousands of SNPs. For all assays, one round of enrichment produces data that are as useful as two. In our testing, the "Twist Ancient DNA" assay produces the highest coverages, greatest uniformity on targeted positions, and almost no bias toward enriching one allele more than another relative to shotgun sequencing. We also identify hundreds of thousands of targeted SNPs for which there is minimal allelic bias when comparing 1240k data to either shotgun or Twist data. This facilitates coanalysis of the large data sets that have been generated using 1240k and Twist capture, as well as shotgun sequencing approaches.

Figures

**Figure 1.**
Characterization of enrichment. (A) Degree of enrichment as a function of distance from 1,150,639 targeted autosomal SNPs (position 0) for the 15 high-coverage libraries at the bottom of Table 1; enrichment at the SNP relative to positions 100 bp away is shown in the legend. (B) Variation in coverage across SNP targets for the same libraries. (C) Proportion of nucleotides that are guanine or cytosine (GC) has a downward bias relative to the unenriched library for Arbor, upward for 1240k, and little bias for Twist Ancient DNA; this analysis uses data from the first 10 libraries in Table 1 with full results from both rounds of capture. (D) All assays preferentially enrich longer molecules, with the least length effect for Twist Ancient DNA (medians in legend, 10 libraries of data). All plots reflect data before removal of duplicated sequences as our goal is to study effectiveness of enrichment on a per-molecule basis.

**Figure 2.**
Performance of the three assays over a range of sequencing depths. For 10 libraries (five double-stranded [DS], and five single-stranded [SS] libraries) with varying percentages of human sequences before enrichment (0.1%–86.7%), we show the number of unique SNPs at different levels of sequencing depth (based on down-sampling). For a typical amount of sequencing of a capture experiment (25 million merged sequences), and after removal of duplicated sequences, the Twist Ancient DNA assay always enriches for more SNPs than the other two assays. For most experiments, more SNPs are retrieved after one round of enrichment than after two. We did not perform the two-enrichment-round Twist Ancient DNA experiment for the two libraries with the highest endogenous content (S1633.E1.L1 and S10871.E1.L6).
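The down-sampling behind a curve like this can be sketched as follows. This is only an illustrative sketch, not the authors' pipeline: it assumes each sequence has already been deduplicated and reduced to the ID of the targeted SNP it overlaps (or `None` if it overlaps no target), and the function name `snps_covered_at_depths` is hypothetical.

```python
import random

def snps_covered_at_depths(snp_hits, depths, seed=0):
    """Count distinct targeted SNPs covered at several sequencing depths.

    snp_hits: list with one entry per sequence, holding the ID of the
              targeted SNP that sequence overlaps, or None (assumed input).
    depths:   iterable of sequence counts to down-sample to.
    Returns a dict mapping each depth to the number of distinct SNPs hit.
    """
    rng = random.Random(seed)
    shuffled = list(snp_hits)
    rng.shuffle(shuffled)  # one shuffle; each prefix is a random subsample
    covered_by_depth = {}
    for depth in sorted(depths):
        n = min(depth, len(shuffled))
        covered = {snp for snp in shuffled[:n] if snp is not None}
        covered_by_depth[depth] = len(covered)
    return covered_by_depth

# Toy data: eight sequences, three distinct SNPs, two off-target sequences.
hits = [1, 1, 2, None, 3, 3, 3, None]
curve = snps_covered_at_depths(hits, depths=[2, 4, 8])
```

Taking prefixes of a single shuffle makes the subsamples nested, so the resulting curve cannot decrease as depth grows.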

**Figure 3.**
Population genetic effects of enrichment and an effective filter for reducing bias. (A) Projection of data from 15 libraries in the last rows of Table 1 onto a PCA of modern West Eurasians (gray squares) shows nearly identical positions regardless of data source. (B) We compute symmetry statistics of the form f₄(library 1 − reagent 1, library 1 − reagent 2; library 2 − reagent 1, library 2 − reagent 2) and plot Z-scores for all 105 = 15 × 14/2 pairwise comparisons of the libraries (box-and-whisker plots show range, 25th and 75th percentiles, and mean). The statistics involving Arbor Complete are shown in green; remaining comparisons involving 1240k are shown in red; and the Twist–shotgun comparison is in blue. We show results both for all SNP targets (*left*) and after applying the bias filter retaining a subset of 42% of autosomal SNPs (*right*). Results for this figure reflect data after removal of duplicated sequences.
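For readers unfamiliar with the statistic in panel B, the sketch below shows the standard f₄ definition, the mean over SNPs of (p_A − p_B)(p_C − p_D), together with the enumeration of the 105 library pairs. It is a minimal illustration under stated assumptions: allele frequencies are given as plain lists (0 or 1 for pseudo-haploid calls, `None` for missing data), the library names are placeholders, and the Z-scores, which are commonly obtained with a block jackknife, are not computed here.

```python
from itertools import combinations

def f4(p_a, p_b, p_c, p_d):
    """Standard f4 statistic: mean over SNPs of (pA - pB) * (pC - pD).

    Each argument is a list of per-SNP allele frequencies for one data set;
    SNPs with a missing value (None) in any of the four are skipped.
    """
    terms = [
        (a - b) * (c - d)
        for a, b, c, d in zip(p_a, p_b, p_c, p_d)
        if None not in (a, b, c, d)
    ]
    if not terms:
        raise ValueError("no SNPs with data in all four data sets")
    return sum(terms) / len(terms)

# Placeholder names for the 15 libraries (hypothetical labels).
libraries = [f"lib{i}" for i in range(1, 16)]
pairs = list(combinations(libraries, 2))  # 15 * 14 / 2 unordered pairs

# Toy check: if both reagents give identical genotypes for a library,
# the first difference is zero at every SNP and f4 is exactly 0.
no_bias = f4([0, 1, 1], [0, 1, 1], [1, 0, None], [0, 0, 1])
```

If two reagents distort allele frequencies in the same direction in both libraries, the two differences are correlated across SNPs and the statistic departs from zero, which is what the Z-scores in the figure test.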

**Figure 4.**
Variation in reference bias across SNPs. (A) All analyses are based on sequences from loci ascertained as highly likely to be heterozygous, corrected for stochastic error in the estimates using the expectation maximization (EM) algorithm described in Supplemental Text S2. (B) Mean and standard deviation of EM-corrected distributions stratified by sequence length (longer sequences align more reliably so have less bias). Results for this figure reflect data before removal of duplicated sequences.
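As a rough illustration of what reference bias means here, the sketch below computes the pooled fraction of sequences carrying the reference allele at heterozygous sites, where an unbiased assay is expected to give about 0.5. It deliberately omits the EM correction described in Supplemental Text S2, and the input format and function name are assumptions made for the example.

```python
def reference_fraction(allele_counts):
    """Pooled fraction of sequences carrying the reference allele.

    allele_counts: list of (ref_count, alt_count) tuples, one per
                   heterozygous SNP (assumed input format).
    A value above 0.5 indicates bias toward the reference allele.
    """
    ref_total = sum(ref for ref, _ in allele_counts)
    total = sum(ref + alt for ref, alt in allele_counts)
    if total == 0:
        raise ValueError("no sequences overlap the supplied sites")
    return ref_total / total

# Toy counts at two heterozygous sites: 11 reference reads out of 20.
frac = reference_fraction([(6, 4), (5, 5)])
```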

Grants and funding

HHMI/Howard Hughes Medical Institute/United States
