Am J Hum Genet. 2013 Nov 7;93(5):852-64.
doi: 10.1016/j.ajhg.2013.10.002. Epub 2013 Oct 25.

Pulling out the 1%: whole-genome capture for the targeted enrichment of ancient DNA sequencing libraries

Meredith L Carpenter et al. Am J Hum Genet.

Abstract

Most ancient specimens contain very low levels of endogenous DNA, precluding the shotgun sequencing of many interesting samples because of cost. Ancient DNA (aDNA) libraries often contain <1% endogenous DNA, with the majority of sequencing capacity taken up by environmental DNA. Here we present a capture-based method for enriching the endogenous component of aDNA sequencing libraries. By using biotinylated RNA baits transcribed from genomic DNA libraries, we are able to capture DNA fragments from across the human genome. We demonstrate this method on libraries created from four Iron Age and Bronze Age human teeth from Bulgaria, as well as bone samples from seven Peruvian mummies and a Bronze Age hair sample from Denmark. Prior to capture, shotgun sequencing of these libraries yielded an average of 1.2% of reads mapping to the human genome (including duplicates). After capture, this fraction increased substantially, with up to 59% of reads mapped to human and enrichment ranging from 6- to 159-fold. Furthermore, we maintained coverage of the majority of regions sequenced in the precapture library. Intersection with the 1000 Genomes Project reference panel yielded an average of 50,723 SNPs (range 3,062-147,243) for the postcapture libraries sequenced with 1 million reads, compared with 13,280 SNPs (range 217-73,266) for the precapture libraries, increasing resolution in population genetic analyses. Our whole-genome capture approach makes it less costly to sequence aDNA from specimens containing very low levels of endogenous DNA, enabling the analysis of larger numbers of samples.
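
The enrichment figures above can be read as a ratio of mapped fractions. The following is a minimal sketch, assuming fold enrichment is defined as the postcapture fraction of reads mapping to human divided by the precapture fraction; the abstract does not spell out the exact formula, and the function names and read counts below are hypothetical illustrations, not values from the study.

```python
def mapped_fraction(mapped_reads: int, total_reads: int) -> float:
    """Fraction of sequenced reads that map to the target genome."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return mapped_reads / total_reads


def fold_enrichment(pre_mapped: int, pre_total: int,
                    post_mapped: int, post_total: int) -> float:
    """Assumed definition: postcapture mapped fraction / precapture mapped fraction."""
    pre = mapped_fraction(pre_mapped, pre_total)
    if pre == 0:
        raise ValueError("precapture fraction is zero; enrichment is undefined")
    return mapped_fraction(post_mapped, post_total) / pre


if __name__ == "__main__":
    # Hypothetical counts: 1.2% of reads mapped before capture, 24% after.
    enrichment = fold_enrichment(12_000, 1_000_000, 240_000, 1_000_000)
    print(f"{enrichment:.1f}-fold enrichment")
```

Under this assumed definition, a library that starts at a lower endogenous fraction can show a larger fold enrichment even when its final mapped fraction is modest, which is why the fold change and the postcapture percentage are reported separately.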

Figures

Figure 1
Schematic of the Whole-Genome In-Solution Capture Process. To generate the RNA “bait” library, a human genomic library is created via adapters containing T7 RNA polymerase promoters (green boxes). This library is subjected to in vitro transcription via T7 RNA polymerase and biotin-16-UTP (stars), creating a biotinylated bait library. Meanwhile, the ancient DNA library (aDNA “pond”) is prepared via standard indexed Illumina adapters (purple boxes). These aDNA libraries often contain <1% endogenous DNA, with the remainder being environmental in origin. During hybridization, the bait and pond are combined in the presence of adapter-blocking RNA oligos (blue zigzags), which are complementary to the indexed Illumina adapters and thus prevent nonspecific hybridization between adapters in the aDNA library. After hybridization, the biotinylated bait and bound aDNA are pulled down with streptavidin-coated magnetic beads, and any unbound DNA is washed away. Finally, the DNA is eluted and amplified for sequencing.
Figure 2
Results of Increased Sequencing of Samples M4 and NA40. (A) Yield of unique fragments for M4 (Bronze Age hair) precapture (blue) and postcapture (red) libraries with increasing amounts of sequencing. The fold enrichment in number of unique reads with increasing amounts of sequencing is plotted in green, with values on the secondary y axis. (B) Yield of unique fragments for NA40 (Peruvian bone) precapture (blue) and postcapture (red) libraries with increasing amounts of sequencing. The fold enrichment in number of unique reads with increasing amounts of sequencing is plotted in green, with values on the secondary y axis. (C) Venn diagram showing the overlap between the NA40 pre- and postcapture libraries based on sequencing of 12.3 million reads. (D) Coverage plot of the M4 and NA40 libraries based on sequencing of 18.6 million and 12.3 million reads, respectively. Shown is a random 10-megabase segment of chromosome 1. Coverage was calculated in 1 kb windows across the region. (E) Insert size distribution for NA40 pre- and postcapture libraries. (F) Percent GC content of reads for NA40 pre- and postcapture libraries.
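
The windowed coverage in panel (D) can be illustrated with a short sketch. It assumes reads are given as half-open `(start, end)` intervals on one chromosome and that coverage per window is the mean per-base depth; the caption does not state which tool or summary statistic the authors used, so `windowed_coverage` and its inputs are hypothetical.

```python
import numpy as np


def windowed_coverage(reads, region_start, region_end, window=1000):
    """Mean per-base depth in consecutive fixed-size windows of a region.

    reads: iterable of half-open (start, end) intervals on one chromosome.
    Any trailing partial window at the end of the region is ignored.
    """
    length = region_end - region_start
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = np.zeros(length + 1, dtype=np.int64)
    for start, end in reads:
        s = max(start, region_start) - region_start
        e = min(end, region_end) - region_start
        if s < e:  # skip reads that fall outside the region
            diff[s] += 1
            diff[e] -= 1
    depth = np.cumsum(diff[:-1])  # per-base depth across the region
    n_windows = length // window
    return depth[: n_windows * window].reshape(n_windows, window).mean(axis=1)


if __name__ == "__main__":
    # Two hypothetical overlapping reads on a 2 kb region.
    print(windowed_coverage([(0, 1000), (500, 1500)], 0, 2000))
```

The difference-array approach keeps the cost linear in the number of reads plus the region length, which is adequate for a 10-megabase segment.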
Figure 3
Principal Component Analysis of Pre- and Postcapture Samples Based on Sequencing One Million Reads Each. Principal component analysis of SNPs overlapping between the 1000 Genomes reference panel and each ancient individual, with Native American individuals also included in (E) and (F). The principal components were calculated with the modern individuals only, and the ancient individual was then projected onto the plot. Shown are (A) V2 (Bulgarian tooth) precapture and (B) postcapture; (C) M4 (Bronze Age hair) precapture and (D) postcapture; and (E) NA40 (Peruvian bone) precapture and (F) postcapture. Population key: ASW, Americans of African ancestry in SW USA; AYM, Aymara from the Peruvian Andes; CEU, Utah residents (CEPH) with Northern and Western European ancestry; CHB, Han Chinese in Beijing, China; CHS, Southern Han Chinese; CLM, Colombians from Medellin, Colombia; FIN, Finnish in Finland; GBR, British in England and Scotland; IBS, Iberian population in Spain; JPT, Japanese in Tokyo, Japan; KAR, Karitiana from the Brazilian Amazon; LWK, Luhya in Webuye, Kenya; MAY, Mayan from Mexico; MXL, Mexican ancestry from Los Angeles, USA; PUR, Puerto Ricans from Puerto Rico; TSI, Toscani in Italy; YRI, Yoruba in Ibadan, Nigeria.
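
The projection step described in this caption (components computed from modern individuals only, ancient individual projected afterward) can be sketched as follows. This is a generic NumPy illustration under stated assumptions, not the authors' pipeline: genotypes are assumed to be coded as 0/1/2 allele counts, the matrix is assumed to be already restricted to SNPs called in the ancient individual, and `fit_pca` and `project` are hypothetical helper names.

```python
import numpy as np


def fit_pca(reference_genotypes, n_components=2):
    """Fit principal axes on the reference (modern) individuals only.

    reference_genotypes: array of shape (n_individuals, n_snps).
    Returns the per-SNP means and the top principal axes.
    """
    means = reference_genotypes.mean(axis=0)
    centered = reference_genotypes - means
    # Rows of vt are orthonormal principal axes, ordered by variance explained.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return means, vt[:n_components]


def project(sample_genotypes, means, axes):
    """Project one or more samples onto axes fitted on the reference panel."""
    return (np.asarray(sample_genotypes, dtype=float) - means) @ axes.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical data: 10 modern individuals and 1 ancient individual, 50 SNPs.
    modern = rng.integers(0, 3, size=(10, 50)).astype(float)
    ancient = rng.integers(0, 3, size=50).astype(float)
    means, axes = fit_pca(modern)
    print(project(ancient, means, axes))  # PC1 and PC2 coordinates
```

Because the ancient sample does not contribute to the fit, its sparse and possibly damaged genotype calls cannot distort the axes; they only determine where the sample lands on them.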

