Review
Annu Rev Genomics Hum Genet. 2017 Aug 31;18:321-356.
doi: 10.1146/annurev-genom-091416-035526. Epub 2017 Apr 26.

A Robust Framework for Microbial Archaeology

Christina Warinner et al. Annu Rev Genomics Hum Genet.

Abstract

Microbial archaeology is flourishing in the era of high-throughput sequencing, revealing the agents behind devastating historical plagues, identifying the cryptic movements of pathogens in prehistory, and reconstructing the ancestral microbiota of humans. Here, we introduce the fundamental concepts and theoretical framework of the discipline, then discuss applied methodologies for pathogen identification and microbiome characterization from archaeological samples. We give special attention to the process of identifying, validating, and authenticating ancient microbes using high-throughput DNA sequencing data. Finally, we outline standards and precautions to guide future research in the field.

Keywords: ancient DNA; bacteria; high-throughput sequencing; metagenomics; microbiology; microbiome; pathogens.


Figures

Figure 1
Evolutionary relationships of taxa within the bacterial families (a) Enterobacteriaceae and (b) Porphyromonadaceae based on full-length 16S rRNA gene sequences. Taxonomy and phylogeny are incongruent for the gut-associated genera Klebsiella, Salmonella, Escherichia, and Shigella, which are not monophyletic but instead show polyphyletic and paraphyletic clade structure. By contrast, taxonomy and phylogeny are congruent for the oral-associated genera Tannerella and Porphyromonas, which form distinct monophyletic clades with high bootstrap support (38). Trees are shown relative to outgroup taxa within the same bacterial family. Note that the two trees are drawn at different scales. Supplemental Appendix 1 provides the specific parameters used in tree construction.
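The monophyly distinction can be made concrete. A genus is monophyletic when some single node in the tree has exactly that genus's members as its descendant leaves. The Python sketch below checks this on a toy rooted tree encoded as nested tuples. The tree, the leaf names, and the `Genus_species` naming convention are hypothetical and are not taken from Figure 1.

```python
from typing import Iterator, Set, Tuple, Union

# A rooted tree: a leaf is a string, an internal node is a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> Set[str]:
    """Return the set of leaf names at or below this node."""
    if isinstance(tree, str):
        return {tree}
    result: Set[str] = set()
    for child in tree:
        result |= leaves(child)
    return result


def clades(tree: Tree) -> Iterator[Set[str]]:
    """Yield the leaf set of every node in the tree, root first."""
    yield leaves(tree)
    if not isinstance(tree, str):
        for child in tree:
            yield from clades(child)


def is_monophyletic(tree: Tree, genus: str) -> bool:
    """True if one node's descendant leaves are exactly the members of this genus."""
    members = {leaf for leaf in leaves(tree) if leaf.split("_")[0] == genus}
    return bool(members) and any(clade == members for clade in clades(tree))


# Hypothetical toy tree (not the Figure 1 phylogeny).
toy_tree: Tree = ((("A_1", "A_2"), "B_1"), (("C_1", "D_1"), ("C_2", "D_2")))
print(is_monophyletic(toy_tree, "A"))  # True
print(is_monophyletic(toy_tree, "C"))  # False
```

Genus `C` fails the test because its two members fall in separate subclades, each shared with a member of genus `D`. This is the kind of interleaving the caption describes for the gut-associated genera.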
Figure 2
Reconstructed taxonomic profile for archaeological dental calculus using the QIIME, metaBIT, MIDAS, and MALT pipelines. Analysis was performed on 1,967,941 shotgun metagenomic DNA sequences obtained from a previously described dental calculus sample from the Spanish Chalcolithic site of Camino del Molino (approximately 2340–2920 BCE) (207). Although all four pipelines generally identify the same phyla, predictable biases are also readily apparent. For example, QIIME assigns the largest proportion of sequences to Firmicutes, a phylum known to have a high rrn operon copy number (190). Euryarchaeota is absent from the MIDAS analysis because the MIDAS database contains no reference genome sequences for this phylum. MIDAS and metaBIT, which rely on genome-scale databases, also fail to detect the largely uncultivated phyla Saccharibacteria (TM7), Chloroflexi, and SR1. Other differences in phylum abundance, such as the high proportion of Actinobacteria estimated by MALT and the absence of Fusobacterium from the MIDAS results, are harder to explain. Supplemental Appendix 1 provides the specific parameters used for each analysis. Abbreviations: MALT, MEGAN Alignment Tool; MIDAS, Metagenomic Intra-Species Diversity Analysis System; QIIME, Quantitative Insights into Microbial Ecology.
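The Firmicutes bias noted above arises because 16S-based profiles count gene copies rather than cells. Taxa with many rrn operons are therefore overrepresented. A common remedy is to divide each taxon's 16S read count by its rrn copy number before computing proportions. The Python sketch below shows that arithmetic. The counts and copy numbers are invented for illustration and are not values from Figure 2 or from any database. This correction is not claimed to be part of the QIIME analysis shown.

```python
from typing import Dict


def relative_abundance(counts: Dict[str, float]) -> Dict[str, float]:
    """Convert raw counts to proportions that sum to 1."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


def copy_number_corrected(
    read_counts: Dict[str, float], rrn_copies: Dict[str, float]
) -> Dict[str, float]:
    """Divide each taxon's 16S read count by its rrn operon copy number, then renormalise."""
    per_cell = {taxon: n / rrn_copies[taxon] for taxon, n in read_counts.items()}
    return relative_abundance(per_cell)


# Hypothetical inputs for illustration only.
reads = {"Firmicutes": 600, "Actinobacteria": 300, "Bacteroidetes": 100}
copies = {"Firmicutes": 6, "Actinobacteria": 3, "Bacteroidetes": 1}

print(relative_abundance(reads))             # Firmicutes 0.6, Actinobacteria 0.3, Bacteroidetes 0.1
print(copy_number_corrected(reads, copies))  # each phylum about 1/3
```

Real copy numbers vary among taxa within a phylum, so a phylum-level correction like this one is only a rough illustration of the effect.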
Figure 3
Pathogens and their close environmental relatives. Many obligate pathogens have 16S rRNA gene sequences closely similar to those of environmental microbes. Bacillus anthracis, Bordetella pertussis, Clostridium botulinum, and Mycobacterium tuberculosis have close relatives in soil, sewage, and extreme environments, whereas Salmonella enterica, Shigella dysenteriae, and Vibrio cholerae have close relatives in vertebrate gut and feces. V. cholerae relatives are also abundant in water sources, and B. anthracis and B. pertussis have close relatives found in association with nematodes and arthropods. By contrast, few environmental relatives outside of the Yersinia pseudotuberculosis complex were observed for Yersinia pestis. However, Y. pestis shows strong similarity to 16S rRNA sequences obtained from a study of global diversity in human saliva, suggesting that human saliva in some parts of the world may harbor a previously undetected Yersinia relative. The total number of RDP database matches for targets other than the respective pathogen is shown in parentheses. Of all the obligate pathogens investigated, B. anthracis and S. dysenteriae had the highest number of hits to environmental sources. A subset of sources is highlighted (along with the number of RDP database matches for these targets, shown in parentheses) to illustrate the diversity of environments in which close matches were observed. Supplemental Appendix 3 provides a detailed list of taxa and sources. Abbreviation: RDP, Ribosomal Database Project.
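Counting close relatives of a pathogen, as in Figure 3, means comparing its 16S sequence against database sequences and tallying matches above a similarity cutoff by source. The Python sketch below makes two assumptions for illustration: the sequences are already aligned to equal length (with `-` for gaps), and the identity threshold is 97%. Neither is the procedure or parameter used for the figure. The sequences and source labels in the example are synthetic.

```python
from collections import Counter
from typing import Iterable, Tuple


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences.

    Columns where both sequences have a gap are ignored; a gap opposite a base
    counts as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def close_relatives_by_source(
    query: str, database: Iterable[Tuple[str, str]], threshold: float = 97.0
) -> Counter:
    """Count database sequences at or above `threshold` percent identity, per source label."""
    hits: Counter = Counter()
    for sequence, source in database:
        if percent_identity(query, sequence) >= threshold:
            hits[source] += 1
    return hits


# Synthetic 100-bp example: one variant differs at 1 site (99%), another at 5 sites (95%).
query = "ACGT" * 25
one_mismatch = "T" + query[1:]
five_mismatches = "CATGC" + query[5:]
database = [(one_mismatch, "soil"), (one_mismatch, "gut"), (five_mismatches, "soil")]
print(close_relatives_by_source(query, database))  # soil: 1, gut: 1
```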
Figure 4
Lack of signal for highly abundant endogenous bacteria resulting from database bias. A medieval dental calculus sample (G12) (196) was screened using MALT (71) and visualized in MEGAN6 (80) against two databases. Database 1 (red) lacks the genome sequence of the oral bacterium Pseudopropionibacterium propionicum, and database 2 (blue) includes it. The tree visualizes the results of both analyses, and node sizes are scaled logarithmically to the summed number of hits. Including P. propionicum shifts hits away from related dietary (Propionibacterium freudenreichii), skin (Propionibacterium acnes), and other species. It also captures previously nonaligned reads, moving both sets of reads to the oral bacterial genome and revealing a highly abundant species that was previously unseen. This species, with more than 1 million assigned reads, would not have been detected by metagenomic screening methods before 2012, when its genome sequence was published. Supplemental Appendix 1 provides additional details. Abbreviations: MALT, MEGAN Alignment Tool; MEGAN, Metagenome Analyzer.
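The effect in Figure 4 follows from how reads are assigned. Each read goes to whichever reference in the database it matches best. A read from an unsequenced species is therefore either pulled toward a related reference or left unassigned. The Python sketch below makes this concrete with a deliberately simplified best-hit classifier based on shared k-mers. It is not MALT, which uses alignment and a lowest-common-ancestor step. The sequences, reference names, k-mer size, and 50% cutoff are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def assign_reads(
    reads: List[str], database: Dict[str, str], k: int = 4, min_shared: float = 0.5
) -> Counter:
    """Assign each read to the reference containing the largest fraction of its k-mers."""
    ref_kmers = {name: kmers(seq, k) for name, seq in database.items()}
    tally: Counter = Counter()
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:  # read shorter than k
            tally["unassigned"] += 1
            continue
        best_name, best_frac = "unassigned", 0.0
        for name, ref in ref_kmers.items():
            frac = len(read_kmers & ref) / len(read_kmers)
            if frac > best_frac:
                best_name, best_frac = name, frac
        tally[best_name if best_frac >= min_shared else "unassigned"] += 1
    return tally


# Hypothetical toy sequences.
oral_genome = "TTTTAACCGGTTACTTTT"
related_genome = "AACCGGTAAAA"  # shares only part of the oral sequence
reads = ["AACCGGTTAC", "AACCGGTTAC", "GGTTACTTTT"]

database_1 = {"related_species": related_genome}
database_2 = {"related_species": related_genome, "oral_species": oral_genome}
print(assign_reads(reads, database_1))  # 2 reads to related_species, 1 unassigned
print(assign_reads(reads, database_2))  # all 3 reads to oral_species
```

Adding the missing genome both pulls reads away from the related reference and rescues reads that previously matched nothing, mirroring the two shifts the caption describes.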
Figure 5
Schematic overview of different measures for the validation of species assignments in metagenomic data analysis. (a, b) Evenness of coverage. Correctly assigned reads are expected to distribute randomly across the reference (panel a); accumulation of reads in regions of high sequence conservation indicates misassigned reads originating from different closely related species (panel b). (c–e) Percent identity distributions. In panel c, most reads show a high similarity to the reference, which indicates a correct assignment. In panel d, most reads are highly dissimilar to the reference, which suggests that they originate from different related species. In some cases, as in panel e, a mixture of correctly assigned and misassigned reads can be observed. (f–i) Haploidy. Because bacteria are haploid organisms, only one allele is expected at each genomic position. Only a small number of multiallelic sites is expected, which can result from a few misassigned or incorrectly aligned reads (panel f). A large number of multiallelic sites indicates that the assigned reads originate from more than one species or strain. This can produce symmetric allele frequency distributions (e.g., if two species or strains are present in equal abundance) (panel g) or asymmetric distributions (e.g., if two species or strains are present in unequal abundance) (panel h). A large number of misassigned reads from closely related species can result in a large number of multiallelic sites with low frequencies of the derived allele (panel i). Supplemental Figure 2 provides examples with empirical data from microbial archaeology studies.
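Each of the three checks in Figure 5 can be reduced to a simple statistic. The Python sketch below assumes reads have already been mapped to a single reference and summarises the checks as follows:

- Evenness: observed breadth of coverage divided by the breadth expected for randomly placed reads, 1 − e^(−depth).
- Percent identity: an edit-distance histogram tested for a monotonic decline.
- Haploidy: the fraction of sufficiently covered sites whose minor allele exceeds a frequency cutoff.

These particular summaries and all thresholds (`min_depth`, `min_freq`, `max_dist`) are illustrative choices, not the procedure used in the review.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def breadth_ratio(read_starts: Sequence[int], read_len: int, genome_len: int) -> float:
    """Observed breadth of coverage divided by 1 - exp(-depth), the breadth expected
    for randomly placed reads. Values well below 1 mean reads pile up in a few regions."""
    if not read_starts:
        return 0.0
    covered = [False] * genome_len
    for start in read_starts:
        for pos in range(start, min(start + read_len, genome_len)):
            covered[pos] = True
    observed = sum(covered) / genome_len
    depth = len(read_starts) * read_len / genome_len
    return observed / (1 - math.exp(-depth))


def edit_distance_histogram(distances: Sequence[int], max_dist: int = 5) -> List[int]:
    """Reads at edit distance 0, 1, ..., max_dist (larger distances binned at max_dist)."""
    counts = Counter(min(d, max_dist) for d in distances)
    return [counts.get(d, 0) for d in range(max_dist + 1)]


def is_declining(histogram: Sequence[int]) -> bool:
    """True if read counts never increase with edit distance (the panel c pattern)."""
    return all(a >= b for a, b in zip(histogram, histogram[1:]))


def multiallelic_sites(
    pileup: Sequence[Dict[str, int]], min_depth: int = 5, min_freq: float = 0.1
) -> Tuple[float, List[float]]:
    """Fraction of covered sites whose second allele reaches min_freq, plus those frequencies.

    pileup holds one {base: read count} mapping per genomic position.
    """
    covered, minor_freqs = 0, []
    for bases in pileup:
        depth = sum(bases.values())
        if depth < min_depth:
            continue
        covered += 1
        counts = sorted(bases.values(), reverse=True)
        minor = counts[1] if len(counts) > 1 else 0
        if minor / depth >= min_freq:
            minor_freqs.append(minor / depth)
    return (len(minor_freqs) / covered if covered else 0.0), minor_freqs


# Toy usage: evenly tiled vs stacked reads on a 100-bp reference.
print(round(breadth_ratio(list(range(0, 100, 10)), 10, 100), 2))  # 1.58
print(round(breadth_ratio([0] * 10, 10, 100), 2))                 # 0.16
print(is_declining(edit_distance_histogram([0, 0, 0, 1, 1, 2])))  # True
print(multiallelic_sites([{"A": 10}, {"A": 5, "G": 5}, {"C": 9, "T": 1}, {"G": 2}]))
```

A plot of the returned minor-allele frequencies distinguishes the symmetric (panel g), asymmetric (panel h), and low-frequency (panel i) patterns.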
Figure 6
DNA fragmentation from pairs of organisms retrieved from the same historic and prehistoric samples. The x and y axes show the fragmentation constant lambda (λ), which describes the fraction of broken bonds in the DNA backbone. The value of this constant can be assessed directly from the length distribution of high-throughput sequencing reads. The dashed red line indicates the linear regression. (a) DNA fragmentation of host and pathogen in DNA fragments retrieved from Solanum tuberosum herbarium samples infected with Phytophthora infestans (205). DNA from both organisms is fragmented in a correlated way and at a similar magnitude. (b) DNA fragmentation of host and pathogen in DNA fragments retrieved from Homo sapiens tooth samples infected with Yersinia pestis (158). DNA from both organisms is fragmented in a correlated way, but the Y. pestis DNA shows a higher magnitude of fragmentation than the H. sapiens DNA. (c) Fragmentation of oral archaeal (Methanobrevibacter sp.) and bacterial (Streptococcus sp.) DNA retrieved from Chalcolithic-era (approximately 2340–2920 BCE) human dental calculus (207). DNA from both organisms fragments in a correlated way but at different magnitudes. The Methanobrevibacter sp. DNA is less fragmented than the Streptococcus sp. DNA, which may be related to the more robust ether linkages in archaeal cell membranes and the protective action of histones in archaeal genomes. (d) DNA fragmentation of Methanobrevibacter sp. and H. sapiens retrieved from dental calculus (the same samples as in panel c). The fragmentation of H. sapiens DNA is not correlated with either that of Methanobrevibacter sp. (data shown here) or that of Streptococcus sp. (data not shown here). Human DNA often exhibits a higher magnitude of fragmentation in dental calculus than microbial DNA, a pattern consistent with an inflammation-driven entry of acellular human DNA into dental plaque biofilms that are rich in extracellular nucleases (139).
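Suppose backbone breaks occur independently with probability λ per bond. The number of fragments of length L then decays as (1 − λ)^L, so ln(count) falls linearly with L, with slope ln(1 − λ). The Python sketch below estimates λ this way, using least-squares regression on a read-length histogram. The lower length cutoff `min_len` is a hypothetical parameter meant to exclude short reads lost during library preparation or filtering. The exact fitting procedure behind Figure 6 is not given here, so treat this as one plausible implementation.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def estimate_lambda(length_counts: Dict[int, float], min_len: int = 35) -> float:
    """Estimate the fragmentation constant from a read-length histogram.

    Fits ln(count) = a + b * length by ordinary least squares over lengths >= min_len
    and returns lambda = 1 - exp(b).
    """
    points = [
        (length, math.log(count))
        for length, count in sorted(length_counts.items())
        if length >= min_len and count > 0
    ]
    if len(points) < 2:
        raise ValueError("need at least two distinct lengths >= min_len")
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx  # estimate of ln(1 - lambda)
    return 1 - math.exp(slope)


def lambda_from_reads(read_lengths: Iterable[int], min_len: int = 35) -> float:
    """Convenience wrapper that takes individual read lengths."""
    return estimate_lambda(Counter(read_lengths), min_len)


# Synthetic histogram generated with lambda = 0.02.
synthetic = {length: 1e5 * 0.98 ** length for length in range(35, 121)}
print(round(estimate_lambda(synthetic), 4))  # 0.02
```

Applying the same estimate separately to the reads assigned to each organism in a sample gives a pair of λ values, one point in the scatterplots of Figure 6.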
Figure 7
Bayesian source estimation of the microbial composition of dental calculus and laboratory samples. SourceTracker (91) analysis with a panel of modern dental plaque, skin, and soil reference sources indicates that the modern dental calculus (GU30C) contains a large proportion of microbial DNA (>60%) originating from human dental plaque. Likewise, >70% of microbial DNA in an ancient calculus sample from a high-altitude tomb in Nepal (37.UM2010.9, dating from approximately 400–650 CE) originates from dental plaque. By contrast, microbial DNA from nineteenth-century dental calculus from the West African island of St. Helena (STH16) derives largely from soil (80%), with a small contribution from human skin (3%). Microbial DNA collected from the surfaces of an osteological laboratory originates from human skin and unknown sources. Consistent with these results, the modern and Nepalese dental calculus samples are dominated by phylotypes belonging to known oral-associated genera (77–80%). By contrast, only 2% of phylotypes in the St. Helena calculus are consistent with oral genera, and 64% are classified as environmental Mycobacterium spp. Data are from Reference 207.
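SourceTracker estimates source proportions with a Bayesian Gibbs sampler. To show the underlying idea in a few lines, the Python sketch below swaps in a simpler maximum-likelihood approach: an expectation-maximisation (EM) loop that treats a sample's taxon counts as a mixture of fixed source profiles. It is not SourceTracker's algorithm. It does not model an unknown source. It only reports the share of reads from taxa absent from every source profile. The source profiles and counts in the example are invented.

```python
from typing import Dict, Tuple


def estimate_source_proportions(
    sink: Dict[str, float],
    sources: Dict[str, Dict[str, float]],
    n_iter: int = 200,
) -> Tuple[Dict[str, float], float]:
    """EM estimate of the mixing proportions of fixed source profiles in a sink sample.

    sink: taxon -> read count in the sample.
    sources: source name -> {taxon: relative abundance in that source}.
    Returns (proportions over known sources, fraction of reads unexplained by any source).
    """
    names = list(sources)
    props = {s: 1.0 / len(names) for s in names}
    total_reads = sum(sink.values())
    unexplained = sum(
        n for taxon, n in sink.items()
        if all(sources[s].get(taxon, 0.0) == 0 for s in names)
    )
    for _ in range(n_iter):
        # E-step: split each taxon's reads among sources in proportion to props * profile.
        expected = {s: 0.0 for s in names}
        for taxon, n in sink.items():
            weights = {s: props[s] * sources[s].get(taxon, 0.0) for s in names}
            z = sum(weights.values())
            if z == 0:
                continue  # taxon absent from all source profiles
            for s in names:
                expected[s] += n * weights[s] / z
        # M-step: new proportions are the normalised expected read totals.
        explained = sum(expected.values())
        if explained == 0:
            break
        props = {s: expected[s] / explained for s in names}
    return props, unexplained / total_reads


# Invented profiles and counts for illustration only.
sources = {
    "plaque": {"Streptococcus": 0.5, "Actinomyces": 0.5},
    "soil": {"Mycobacterium": 0.5, "Streptomyces": 0.5},
}
sink = {"Streptococcus": 30, "Actinomyces": 30, "Mycobacterium": 20,
        "Streptomyces": 20, "Pseudomonas": 10}
print(estimate_source_proportions(sink, sources))  # plaque 0.6, soil 0.4; about 0.09 unexplained
```

With real data the source profiles overlap heavily, and the sampler-based Bayesian approach also provides uncertainty estimates that this point estimate lacks.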

References

    1. Achtman M. Evolution, population structure, and phylogeography of genetically monomorphic bacterial pathogens. Annu. Rev. Microbiol. 2008;62:53–70.
    2. Achtman M, Wagner M. Microbial diversity and the genetic nature of microbial species. Nat. Rev. Microbiol. 2008;6:431–40.
    3. Acinas SG, Marcelino LA, Klepac-Ceraj V, Polz MF. Divergence and redundancy of 16S rRNA sequences in genomes with multiple rrn operons. J. Bacteriol. 2004;186:2629–35.
    4. Allentoft ME, Collins M, Harker D, Haile J, Oskam CL, et al. The half-life of DNA in bone: measuring decay kinetics in 158 dated fossils. Proc. Biol. Sci. 2012;279:4724–33.
    5. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J. Mol. Biol. 1990;215:403–10.
