Nat Commun. 2022 May 5;13(1):2458.
doi: 10.1038/s41467-022-30097-x.

SPIN enables high throughput species identification of archaeological bone by proteomics


Patrick Leopold Rüther et al. Nat Commun.

Abstract

Species determination based on genetic evidence is an indispensable tool in archaeology, forensics, ecology, and food authentication. Most available analytical approaches involve compromises with regard to the number of detectable species, high cost due to low throughput, or a labor-intensive manual process. Here, we introduce "Species by Proteome INvestigation" (SPIN), a shotgun proteomics workflow for analyzing archaeological bone capable of querying over 150 mammalian species by liquid chromatography-tandem mass spectrometry (LC-MS/MS). Rapid peptide chromatography and data-independent acquisition (DIA) with a throughput of 200 samples per day reduce expensive MS time, whereas streamlined sample preparation and automated data interpretation save labor costs. We confirm the successful classification of known reference bones, including domestic species and great apes, beyond the taxonomic resolution of the conventional peptide mass fingerprinting (PMF)-based Zooarchaeology by Mass Spectrometry (ZooMS) method. In a blinded study of degraded Iron-Age material from Scandinavia, SPIN produces reproducible results between replicates, which are consistent with morphological analysis. Finally, we demonstrate the high throughput capabilities of the method in a high-degradation context by analyzing more than two hundred Middle and Upper Palaeolithic bones from Southern European sites with late Neanderthal occupation. While this initial study is focused on modern and archaeological mammalian bone, SPIN will be open and expandable to other biological tissues and taxa.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. High-throughput bone proteome analysis workflow and benchmark.
a Sample preparation and data acquisition. Proteins were obtained from bone chips or powder by simultaneous demineralization and extraction. Cleanup by Protein Aggregation Capture and digestion can be automated with a magnetic bead-handling robot. Peptides were rapidly separated and analyzed by tandem mass spectrometry at a throughput of 100 samples per day (spd) in data-dependent acquisition (DDA) mode or 200 spd in data-independent acquisition (DIA) mode. b Performance of the "Species by Proteome INvestigation" (SPIN) sample preparation protocol executed manually (SPINmanual) or by robot (SPINrobot), compared with other common sample preparation techniques. Methods were compared by analyzing Pleistocene mammoth bone powder from the same batch in n = 3 technical replicate experiments with a 60 spd gradient and data-dependent acquisition. Bars indicate the mean number of precursor identifications for each method, split by enzymatic cleavage specificity: sand for tryptic, dark blue for semi-tryptic with a non-tryptic C-terminus, and light blue for semi-tryptic with a non-tryptic N-terminus. Error bars are centered at the total number of precursor identifications and indicate the standard deviation. c Precursor identifications accumulated over retention time for the fast 100 spd DDA method analyzed by conventional database searching (sand) and the rapid 200 spd DIA method analyzed with a library-based (dark blue) or library-free (light blue) approach. Peptides were generated by SPIN from a bovine bone and analyzed by LC-MS/MS with both acquisition methods. d Gene-wise cumulative absolute amino acid coverage based on the precursors identified in c, shown over the top 20 genes ranked by number of precursors. Color indicates the acquisition and peptide identification method: sand for DDA, dark blue for library-based DIA, and light blue for library-free directDIA.
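Panel d plots cumulative absolute coverage, which counts each covered residue once no matter how many precursors overlap it. The Python sketch below shows one way such a curve could be computed; the `(gene, start, end)` input format and the function name `cumulative_gene_coverage` are hypothetical and not part of the SPIN software.

```python
from collections import defaultdict


def cumulative_gene_coverage(precursors, top_n=20):
    """Cumulative absolute amino acid coverage over the top_n genes.

    precursors: iterable of (gene, start, end) tuples with 0-based,
    end-exclusive residue positions (hypothetical input format).
    Returns a list of (gene, cumulative covered residues).
    """
    covered = defaultdict(set)  # gene -> residue positions seen at least once
    counts = defaultdict(int)   # gene -> number of precursors
    for gene, start, end in precursors:
        covered[gene].update(range(start, end))
        counts[gene] += 1

    # Rank genes by precursor count; ties are broken by name for a stable order.
    ranked = sorted(counts, key=lambda g: (-counts[g], g))[:top_n]

    result, total = [], 0
    for gene in ranked:
        total += len(covered[gene])  # overlapping precursors count once
        result.append((gene, total))
    return result


example = [("COL1A1", 0, 10), ("COL1A1", 5, 15), ("COL1A2", 0, 8)]
print(cumulative_gene_coverage(example))  # [('COL1A1', 15), ('COL1A2', 23)]
```

Using a set per gene is what makes the coverage "absolute": the two overlapping COL1A1 precursors in the toy example cover 15 distinct residues, not 20.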
Fig. 2. Data analysis pipeline for species identification.
a The aligned protein database, species difference matrix, and manually curated species marker peptides (top row) are used at multiple stages of the data processing pipeline (bottom row). Peptide identifications were converted to site level, scored by joining intensity, score, and number of precursors (J-Score), and used to determine the winner of every possible species-to-species comparison. Fine resolution of closely related species can be further improved by using manually selected species marker peptides. b The mammalian species database, comprising 20 genes across 177 species (156 species with >14 genes), was generated by merging UniProt and NCBI entries with manually curated and reannotated protein sequences. The phylogenetic tree was generated from the protein database using FastTree and FigTree. Color indicates the database source: dark blue for UniProt sequences with an annotated gene name, light blue for UniProt sequences with the gene name added from UniRef, sand for sequences from NCBI, pink for sequences available in both databases, and black for manually added sequences. c Example species competition matrix for the reference sample "Ovis_07", showing only the 13 reference species. White numbers indicate the summed joint scores (J-Score) of the species-discriminating sites. Gray cells indicate species pairs for which no species-discriminating sites were identified in the sample. Pink indicates that the left species wins the comparison and blue that the top species wins. The phylogenetic tree is a subset of the tree in panel b. The complete species competition matrices comprise all 156 target and 156 decoy species, i.e., 24,336 comparisons. d Absolute sequence coverage and relative protease intensity (reversed log10 scale) for all samples from the three datasets in this study. The vertical site coverage cutoff is used to control the false-discovery rate at 1%. The horizontal protease intensity cutoff excludes samples with low signal (lower than 75% of the blank runs). Histograms show each parameter analyzed independently. Sample sets are indicated by color: black dots for blank runs, sand for the Portuguese sample set, light blue for the samples from Denmark, and pink for the reference samples.
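The pairwise competition in panels a and c can be illustrated with a simplified sketch. It assumes that per-site J-Scores are already computed (the caption does not give their formula), treats a site as species-discriminating when the two aligned sequences differ at that position, and credits the J-Score to whichever species carries the observed residue. Decoy species, marker peptides, and the 1% FDR cutoff are omitted, and all names here are hypothetical.

```python
from itertools import combinations


def competition_matrix(alignment, observed_sites):
    """Summed J-Scores of species-discriminating sites for every species pair.

    alignment: dict species -> aligned protein sequence (equal lengths)
    observed_sites: dict aligned position -> (observed residue, J-Score);
        J-Scores are taken as precomputed inputs in this sketch.
    Returns dict (a, b) -> (score supporting a, score supporting b).
    """
    if len({len(seq) for seq in alignment.values()}) != 1:
        raise ValueError("aligned sequences must have equal length")
    matrix = {}
    for a, b in combinations(sorted(alignment), 2):
        score_a = score_b = 0.0
        for pos, (residue, j_score) in observed_sites.items():
            res_a, res_b = alignment[a][pos], alignment[b][pos]
            if res_a == res_b:
                continue  # this site cannot separate a from b
            if residue == res_a:
                score_a += j_score
            elif residue == res_b:
                score_b += j_score
        matrix[(a, b)] = (score_a, score_b)
    return matrix


def count_wins(matrix):
    """Pairwise wins per species; ties and pairs without evidence count for neither."""
    wins = {species: 0 for pair in matrix for species in pair}
    for (a, b), (score_a, score_b) in matrix.items():
        if score_a > score_b:
            wins[a] += 1
        elif score_b > score_a:
            wins[b] += 1
    return wins


# Toy alignment and observations (illustrative values only)
alignment = {"Bos": "GPAGK", "Ovis": "GPSGK", "Capra": "GPSGR"}
observed = {2: ("S", 3.0), 4: ("K", 1.5)}
matrix = competition_matrix(alignment, observed)
print(count_wins(matrix))  # {'Bos': 0, 'Capra': 1, 'Ovis': 2}
```

In the toy data, position 2 separates cattle from the caprines and position 4 separates sheep from goat, so sheep wins both of its comparisons. A pair with no differing observed positions would end up at (0.0, 0.0), which corresponds to a gray cell in panel c.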
Fig. 3. Reference species analysis.
a Species identification results based on DDA, directDIA, and library-based DIA analysis. Dark blue boxes indicate correct identification of a single species or of multiple indistinguishable species (marked by an asterisk). Light blue indicates species that could not be separated from their closest relatives. Blanks below the relative protease intensity threshold are shown in pink. The "identified sites" bar chart shows the absolute amino acid coverage, in blue for sites matching the true species and in pink for non-matching sites. The relative protease intensity was calculated by dividing the intensity of protease peptides by the total intensity and is plotted on a log scale. b Bovine species identifications obtained by library-based DIA analysis. Phylogeny is based on the protein database. Correctly identified single or indistinguishable species are highlighted in blue; inconsistent identifications are marked in pink. Best-matching species are shown on the left and the refined "fine-grouping" on the right. c Same display as in b for equine species analyzed by library-based DIA. The additional plots on the right show the log10 intensity of two species-discriminating peptides, with the horse isoform on the x-axis and the donkey isoform on the y-axis. Missing quantifications are shown as zero log10 intensity. Donkey samples are marked in sand, horse in dark blue, Przewalski's horse in light blue, and hybrids in pink. d Species identifications after fine-grouping for great apes, comparing the three peptide identification strategies. Correctly identified genera are highlighted in blue; broader matches within the family are marked in pink. Differences between the three identification strategies were observed only for the genus Pan.
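The relative protease intensity in panel a is a simple ratio. Below is a minimal sketch, assuming per-peptide intensities keyed by sequence and a known set of protease-derived peptides (both hypothetical input formats); the cutoff passed to `passes_signal_filter` is a placeholder, not the study's threshold.

```python
import math


def relative_protease_intensity(intensities, protease_peptides):
    """Share of a sample's total peptide intensity that comes from protease peptides.

    intensities: dict peptide sequence -> summed MS intensity (hypothetical format)
    protease_peptides: set of sequences attributed to the digestion enzyme
    """
    total = sum(intensities.values())
    if total <= 0:
        raise ValueError("sample has no quantified peptide intensity")
    protease = sum(v for pep, v in intensities.items() if pep in protease_peptides)
    return protease / total


def passes_signal_filter(rel_intensity, max_rel_intensity):
    """Keep a sample only if its protease share is below a cutoff (placeholder value)."""
    return rel_intensity < max_rel_intensity


# Illustrative peptide strings, not taken from the study's data
sample = {"PEPTIDEA": 900.0, "PEPTIDEB": 0.0, "PROTEASEPEP": 100.0}
rel = relative_protease_intensity(sample, {"PROTEASEPEP"})
print(f"relative protease intensity: {rel:.3f}, log10: {math.log10(rel):.2f}")
# relative protease intensity: 0.100, log10: -1.00
```

A blank run contains almost nothing but protease peptides, so its ratio approaches 1 (log10 close to 0); samples with abundant endogenous bone peptides sit at more negative log10 values, which is why a high relative protease intensity signals low sample signal.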
Fig. 4. Species identification of bones from the Scandinavian Iron-Age.
a Location of the archaeological site "Salpetermosen Syd 10" on Zealand, Denmark, in Hillerød municipality, 30 km north of Copenhagen. Map drawn in Mapbox Studio using a custom style. b Cross section of an in situ wetland bone deposit. Scale bar is 50 cm. Four bones were radiocarbon dated to between 1720 and 1570 BP. Picture provided by the Museum of North Zealand. c Species identification results by SPIN (5 min method, library-based DIA) and by morphological assessment for 63 samples from the Salpetermosen site, measured in technical duplicates, and 3 blanks. Rows represent individual samples, ordered first by morphological species assignment and then by decreasing mean site coverage. The upper-left and lower-right wedges of each cell represent results from two separate experiments, one with higher (upper left, dark blue) and one with lower (lower right, light blue) MS signal intensity. The first seven columns indicate SPIN species by blue wedges and morphological species possibilities by pink boxes. Bovine species assignments are combined in column two. The eighth and ninth columns are heatmaps of the absolute number of covered amino acids and the relative protease intensity, respectively. d Summary of the SPIN species identifications from panel c for the replicate with high MS intensity. Bovine identifications are separated into cow (Bos) and broader bovine identifications (Bos/Bison). Striped colors indicate samples with insufficient sequence coverage to distinguish closely related taxa. Samples with insufficient sequence coverage for confident species identification are marked as "signal too low", and correctly excluded blanks are marked in black. e Pseudo receiver operating characteristic (ROC) curves comparing the sensitivity and success rate of three data acquisition and analysis strategies. The results of each dataset were sorted by decreasing number of identified sites. The y-axis shows the cumulative number of correct species identifications in agreement with the morphology; the x-axis shows the cumulative number of false or missing identifications below the relative protease intensity threshold. Color indicates the data acquisition and analysis mode: pink for DDA, dark blue for library-based DIA, and sand for library-free directDIA. Experiments with lower MS intensity are shown as dashed lines and those with high intensity as solid lines.
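The pseudo-ROC construction in panel e can be sketched as follows, assuming each sample is reduced to its number of identified sites and a flag for whether its SPIN call agrees with morphology, with false calls and calls missing because of the protease intensity threshold both counted as not correct. The function name and input format are hypothetical.

```python
def pseudo_roc(results):
    """Pseudo-ROC points for one dataset.

    results: iterable of (identified_sites, is_correct) per sample, where
    is_correct is True only if the call agrees with morphology (hypothetical format).
    Returns (x, y) points: x = cumulative false or missing calls,
    y = cumulative correct calls, walking samples by decreasing identified sites.
    """
    ordered = sorted(results, key=lambda r: r[0], reverse=True)
    points = [(0, 0)]
    correct = not_correct = 0
    for _, is_correct in ordered:
        if is_correct:
            correct += 1
        else:
            not_correct += 1
        points.append((not_correct, correct))
    return points


samples = [(120, True), (300, True), (40, False), (80, True)]
print(pseudo_roc(samples))  # [(0, 0), (0, 1), (0, 2), (0, 3), (1, 3)]
```

A curve that climbs steeply before moving right indicates that ranking by identified sites puts reliable calls first, which is the property being compared across DDA, library-based DIA, and directDIA.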
Fig. 5. Large-scale species identification at three sites with early human occupation on the Iberian peninsula.
a Locations of the three sites on a current map of Portugal. Map drawn in Mapbox Studio using a custom style. b Species identified in 84 samples from levels 6–7 (29,000–31,500 BP) of Vale Boi, in 95 samples from layers GG to JJ (38,000–45,000 BP) of Lapa do Picareiro, and in 34 samples from chambers 1 and 2 (estimated 50,000–60,000 BP) of Gruta da Companheira. The pie chart shows the overall species distribution, whereas the bar charts show species ratios for separate compartments of the assemblage. Colors distinguish species, as indicated in the legend. c Average fold-coverage of the 20 genes used by SPIN, comparing the three Portuguese sites with the modern reference and Iron-Age material. Coverage was calculated by summing the number of precursors at each site (amino acid position) in the global aligned database and is indicated by color: white for no coverage, blue for medium coverage, and pink for high coverage. The values represent the average fold-coverage in 10-amino-acid bins for each dataset.
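The binned fold-coverage in panel c could be computed roughly as below. This sketch assumes precursors are given as 0-based, end-exclusive spans on the global alignment, pooled over all samples of one dataset, and that "average" means dividing the summed depth by the number of samples before averaging within each 10-residue bin; the caption does not spell out these details, and the function name is hypothetical.

```python
def binned_fold_coverage(spans, n_positions, n_samples, bin_size=10):
    """Average per-sample fold-coverage in fixed-size bins along the alignment.

    spans: (start, end) aligned positions of every identified precursor,
        0-based and end-exclusive, pooled over all samples of one dataset.
    """
    depth = [0] * n_positions
    for start, end in spans:
        for pos in range(start, end):
            depth[pos] += 1  # one more precursor covers this position
    per_sample = [d / n_samples for d in depth]

    bins = []
    for i in range(0, n_positions, bin_size):
        chunk = per_sample[i:i + bin_size]  # last bin may be shorter
        bins.append(sum(chunk) / len(chunk))
    return bins


print(binned_fold_coverage([(0, 10), (0, 5), (12, 20)], n_positions=20, n_samples=2))
# [0.75, 0.4]
```

Binning smooths position-to-position noise so that coverage patterns across 20 genes can be compared between datasets of very different age and preservation.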
Fig. 6. Comparison of SPIN and PMF.
a Alluvial diagram showing species identifications of 46 reference bone samples and 3 laboratory blanks. Small bars on the x-axis indicate individual samples. Color and position in the middle column represent the true species, whereas the left and right columns report the species identified by SPIN and PMF, respectively. Bars with color gradients indicate changing species assignments. b Alluvial diagram showing species identifications of 20 representative samples from the Danish Salpetermosen site (Fig. 4). The left column indicates the species identified by SPIN and the right column the species identified by PMF. c Alluvial diagram showing species identifications of 21 representative samples from the three Portuguese sites (Fig. 5). The left column indicates the species identified by SPIN and the right column the species identified by PMF.
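An alluvial diagram is drawn from counts of samples flowing from one assignment to another. A minimal sketch of that tally, plus a naive agreement fraction, assuming one call per sample from each method. Exact string matching undercounts cases where PMF reports a broader but compatible group, and the species labels in the example are illustrative only.

```python
from collections import Counter


def alluvial_flows(spin_calls, pmf_calls):
    """Count samples per (SPIN call, PMF call) pair; each count is one ribbon."""
    if len(spin_calls) != len(pmf_calls):
        raise ValueError("expected one call per sample from each method")
    return Counter(zip(spin_calls, pmf_calls))


def concordance(flows):
    """Fraction of samples whose two calls are identical strings."""
    total = sum(flows.values())
    same = sum(n for (spin, pmf), n in flows.items() if spin == pmf)
    return same / total if total else 0.0


spin = ["Bos", "Ovis", "Ovis", "Cervus"]
pmf = ["Bos", "Ovis/Capra", "Ovis/Capra", "Cervus"]
flows = alluvial_flows(spin, pmf)
print(flows[("Ovis", "Ovis/Capra")], concordance(flows))  # 2 0.5
```

Here the two "Ovis" samples form a single ribbon of width 2 flowing into a broader group; a fairer comparison would treat such nested assignments as compatible rather than as disagreements.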
