Nat Commun. 2022 May 5;13(1):2458.

doi: 10.1038/s41467-022-30097-x.

SPIN enables high throughput species identification of archaeological bone by proteomics

Patrick Leopold Rüther¹, Immanuel Mirnes Husic², Pernille Bangsgaard³, Kristian Murphy Gregersen⁴, Pernille Pantmann⁵, Milena Carvalho⁶ ⁷, Ricardo Miguel Godinho⁶, Lukas Friedl⁶ ⁸, João Cascalheira⁶, Alberto John Taurozzi³, Marie Louise Schjellerup Jørkov⁹, Michael M Benedetti⁶ ¹⁰, Jonathan Haws⁶ ¹¹, Nuno Bicho⁶, Frido Welker³, Enrico Cappellini³, Jesper Velgaard Olsen¹²

Affiliations

¹ Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark. patrick.ruether@palaeome.org.
² Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark.
³ Globe Institute, University of Copenhagen, Copenhagen, Denmark.
⁴ Institute of Conservation, Royal Danish Academy, Copenhagen, Denmark.
⁵ Dept. of Archaeology, Museum Nordsjælland, Copenhagen, Denmark.
⁶ Interdisciplinary Center of Archaeology and Evolution of Human Behavior, University of Algarve, Faro, Portugal.
⁷ Department of Anthropology, University of New Mexico, Albuquerque, NM, USA.
⁸ Dept. of Anthropology, University of West Bohemia, Pilsen, Czech Republic.
⁹ The Laboratory of Biological Anthropology, Department of Forensic Medicine, University of Copenhagen, Copenhagen, Denmark.
¹⁰ Department of Earth and Ocean Sciences, University of North Carolina Wilmington, Wilmington, NC, USA.
¹¹ Department of Anthropology, University of Louisville, Louisville, KY, USA.
¹² Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark. jesper.olsen@cpr.ku.dk.

PMID: 35513387
PMCID: PMC9072323
DOI: 10.1038/s41467-022-30097-x

Patrick Leopold Rüther et al. Nat Commun. 2022.

Abstract

Species determination based on genetic evidence is an indispensable tool in archaeology, forensics, ecology, and food authentication. Most available analytical approaches involve compromises with regard to the number of detectable species, high cost due to low throughput, or a labor-intensive manual process. Here, we introduce "Species by Proteome INvestigation" (SPIN), a shotgun proteomics workflow for analyzing archaeological bone capable of querying over 150 mammalian species by liquid chromatography-tandem mass spectrometry (LC-MS/MS). Rapid peptide chromatography and data-independent acquisition (DIA) with throughput of 200 samples per day reduce expensive MS time, whereas streamlined sample preparation and automated data interpretation save labor costs. We confirm the successful classification of known reference bones, including domestic species and great apes, beyond the taxonomic resolution of the conventional peptide mass fingerprinting (PMF)-based Zooarchaeology by Mass Spectrometry (ZooMS) method. In a blinded study of degraded Iron-Age material from Scandinavia, SPIN produces reproducible results between replicates, which are consistent with morphological analysis. Finally, we demonstrate the high throughput capabilities of the method in a high-degradation context by analyzing more than two hundred Middle and Upper Palaeolithic bones from Southern European sites with late Neanderthal occupation. While this initial study is focused on modern and archaeological mammalian bone, SPIN will be open and expandable to other biological tissues and taxa.

Conflict of interest statement

The authors declare no competing interests.

Figures

**Fig. 1. High-throughput bone proteome analysis workflow and benchmark.**
a Sample preparation and data acquisition. Proteins were obtained from bone chips or powder by simultaneous demineralization and extraction. Cleanup by Protein Aggregation Capture and digestion can be automated by a magnetic bead-handling robot. Peptides were rapidly separated and analyzed by tandem mass spectrometry with a throughput of 100 samples per day (spd) in data-dependent or 200 spd in data-independent acquisition mode. b Performance of the “Species by Proteome INvestigation” (SPIN) sample preparation protocol executed manually (SPIN_manual) or by robot (SPIN_robot), compared to other common sample preparation techniques. Methods were compared by analyzing Pleistocene mammoth bone powder from the same batch in n = 3 technical replicate experiments with a 60 spd gradient and data-dependent acquisition. Bars indicate mean precursor identifications obtained with each method separated by enzymatic cleavage specificity shown by color with sand color for tryptic, dark blue for semi-tryptic with non-tryptic C-terminus and light blue for semi-tryptic with non-tryptic N-terminus. Error bars are centered at the total number of precursor identifications and indicate standard deviation. c Comparison of precursor identifications accumulated over retention time between the fast 100 spd DDA method analyzed by conventional database searching (sand color) and the rapid 200 spd DIA method analyzed with a library-based (dark blue) vs. library-free (light blue) approach. Peptides were generated by SPIN using a bovine bone and analyzed by LC-MS/MS with the two acquisition methods. d Gene-wise cumulative absolute amino acid coverage based on the precursors identified in c shown over the top 20 genes ranked by the number of precursors. Color indicates acquisition and peptide identification method with sand color for DDA, dark blue for library-based DIA, and light blue for library-free DirectDIA.

**Fig. 2. Data analysis pipeline for species identification.**
a The aligned protein database, species difference matrix, and manually curated species marker peptides (top row) are used at multiple stages of the data processing pipeline (bottom row): Peptide identifications were converted to site-level, scored by joining intensity, score, and number of precursors (J-Score) and used to identify the winner of all possible species-to-species comparisons. Fine resolution of closely related species can be further improved by using manually selected species marker peptides. b The mammalian species database comprising 20 genes across 177 species (156 species with >14 genes) was generated by merging Uniprot and NCBI with manually curated and reannotated protein sequences. The phylogenetic tree was generated from the protein database using Fast Tree and FigTree. Color indicates database source with dark blue for sequences with annotated gene name from Uniprot, light blue for sequences from Uniprot with gene name added from UniRef, sand color for sequences from NCBI, pink for sequences available in both databases, and black for manually added sequences. c Example species competition matrix for the reference sample “Ovis_07” only showing the 13 reference species. White numbers indicate the summed joint scores (J-Score) of the species-discriminating sites. Gray cells indicate species pairs, where no species-discriminating sites have been identified in the sample. Pink indicates that the left species wins and blue indicates that the top species wins the comparison. The phylogenetic tree is a subset of the tree in panel b. The complete species competition matrices comprise all 156 target and 156 decoy species, i.e., 24,336 comparisons. d Absolute sequence coverage and relative protease intensity in reversed log10-scale for all samples from the three datasets in this study. The vertical site coverage cutoff is used to control the false-discovery rate at 1%. 
The horizontal protease intensity cutoff excludes samples with low signal (lower than 75% of the blank runs). Independent analysis of both parameters is displayed as histograms. Sample sets are indicated by color with black dots for blank runs, sand color for the Portuguese sample set, light blue for the samples from Denmark, and pink for the reference samples.
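
The pairwise species competition described in panels a and c can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the data layout, and the toy residues and J-Scores below are all assumptions made for illustration. The sketch sums the J-Scores of observed sites that discriminate between two species and declares the species with the higher sum the winner, leaving the pair undecided when no discriminating site was observed (the gray cells in panel c). It compares each unordered pair once, whereas the matrices in the figure span 156 × 156 = 24,336 target and decoy comparisons.

```python
from itertools import combinations


def compete(observations, database):
    """Run every pairwise species comparison for one sample.

    observations: list of (site, residue, j_score) tuples (hypothetical layout).
    database: {species: {site: residue}} from an aligned protein database.
    Returns {(a, b): (winner_or_None, score_a, score_b)}.
    """
    results = {}
    for a, b in combinations(sorted(database), 2):
        score_a = score_b = 0.0
        for site, residue, j_score in observations:
            res_a = database[a].get(site)
            res_b = database[b].get(site)
            if res_a == res_b:
                continue  # this site does not discriminate the pair
            if residue == res_a:
                score_a += j_score
            elif residue == res_b:
                score_b += j_score
        if score_a > score_b:
            winner = a
        elif score_b > score_a:
            winner = b
        else:
            winner = None  # no discriminating evidence, or a tie
        results[(a, b)] = (winner, score_a, score_b)
    return results


def count_wins(results):
    """Tally how many pairwise comparisons each species wins."""
    tally = {}
    for (a, b), (winner, _, _) in results.items():
        tally.setdefault(a, 0)
        tally.setdefault(b, 0)
        if winner is not None:
            tally[winner] += 1
    return tally


# Toy data: invented residues and scores, not real sequences.
database = {
    "Ovis": {1: "A", 2: "P", 3: "S"},
    "Capra": {1: "A", 2: "P", 3: "T"},
    "Bos": {1: "G", 2: "P", 3: "T"},
}
observations = [(1, "A", 2.0), (2, "P", 5.0), (3, "S", 1.5)]

results = compete(observations, database)
tally = count_wins(results)
best = max(tally, key=tally.get)
print(best, tally)
```

In this toy case the conserved site 2 contributes to no comparison, and only site 3 separates the two closest species, which mirrors why closely related taxa need sufficient sequence coverage to be resolved.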

**Fig. 3. Reference species analysis.**
a Species identification results based on DDA, directDIA, and library-based DIA analysis. Dark blue boxes indicate correct identification of a single or multiple indistinguishable (marked by asterisk) species. Light blue indicates species that could not be separated from their closest relatives. Blanks that were below the relative protease intensity threshold are shown in pink. The “identified sites” bar chart shows the absolute amino acid coverage in blue for sites matching the true species and pink for non-matching sites. The relative protease intensity was calculated by dividing the intensity of protease peptides by total intensity and plotted in log-scale. b Bovine species identifications obtained by library-based DIA analysis. Phylogeny was based on the protein database. Correctly identified single or indistinguishable species are highlighted in blue. Inconsistent identifications are marked in pink. Best-matching species are on the left and the refined “fine-grouping” on the right side. c Same display as in b for equine species analyzed by library-based DIA. The additional plots on the right side show the log10 intensity of two species-discriminating peptides for the horse isoform on the x-axis and the donkey isoform on the y-axis. Missing quantifications are shown as zero log10 intensity. Donkey samples are marked in sand color, horse in dark blue, Przewalski's horse in light blue, and hybrids in pink. d Species identifications after fine-grouping for great apes comparing the three different peptide identification strategies. Correctly identified *genus* is highlighted in blue. Broader matches within the family are marked in pink. Differences between the three identification strategies were only observed for the genus *Pan*.
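
The relative protease intensity defined in panel a is a simple ratio, sketched below in Python. The function name, the input layout of (intensity, is_protease) pairs, and the toy values are assumptions for illustration; the caption specifies only the ratio itself and the log-scale display.

```python
import math


def relative_protease_intensity(peptides):
    """Fraction of total peptide intensity that comes from protease peptides.

    peptides: iterable of (intensity, is_protease) pairs (hypothetical layout).
    """
    peptides = list(peptides)
    total = sum(intensity for intensity, _ in peptides)
    if total <= 0:
        raise ValueError("total intensity must be positive")
    protease = sum(intensity for intensity, is_protease in peptides if is_protease)
    return protease / total


# Toy sample: 100 units of protease signal out of 1000 units in total.
sample = [(600.0, False), (300.0, False), (100.0, True)]
ratio = relative_protease_intensity(sample)
log_ratio = math.log10(ratio)  # the value that would be plotted on a log scale
print(ratio, log_ratio)
```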

**Fig. 4. Species identification of bones from the Scandinavian Iron-Age.**
a Location of the archaeological site “Salpetermosen Syd 10” on Zealand in Denmark in the Hillerød municipality 30 km north of Copenhagen. Map drawn in Mapbox Studio using a custom style. b Cross section of an in situ wetland bone deposit. Scale bar is 50 cm. Four bones were radiocarbon dated between 1720 and 1570 BP. Picture provided by the Museum of North Zealand. c Species identification results by SPIN (5 min method, library-based DIA) and by morphological assessment for 63 samples from the Salpetermosen site measured in technical duplicates and 3 blanks. Rows represent individual samples and have been ordered first by morphological species assignment and then by decreasing mean site coverage. The upper left and lower right wedge of each cell represent results measured in two separate experiments, one with higher (upper left, dark blue) and the other with lower (lower right, light blue) MS signal intensity. The first seven columns indicate SPIN species by blue wedges and morphological species possibilities by pink boxes. Bovine species assignments are combined in column two. The eighth and ninth columns are heatmaps showing the absolute number of covered amino acids and relative protease intensity, respectively. d Summary of SPIN species identifications from panel c in the replicate with high MS intensity. Bovine identifications are separated into cow (*Bos*) and broader bovine identifications (*Bos/Bison*). Striped colors indicate samples with insufficient sequence coverage to distinguish closely related taxa. Samples with insufficient sequence coverage for confident species identification are marked as “signal too low” and correctly excluded blanks are marked in black. e Pseudo receiver operating characteristic (ROC) curves for comparing the sensitivity and success rate of three different data acquisition and analysis strategies. Results of each dataset were sorted by decreasing number of identified sites. 
The y-axis shows the cumulative number of correct species identifications in agreement with the morphology. The x-axis shows the cumulative number of false or missing identifications below the relative protease intensity threshold. Color indicates data acquisition and analysis mode with pink for DDA, dark blue for library-based DIA, and sand color for library-free DirectDIA. Experiments with lower MS intensity are shown by dashed and high intensity by solid lines.
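
The construction of the pseudo-ROC curves in panel e can be sketched as follows. This is an illustrative Python reading of the caption, not the authors' code: the function name, the (number of sites, correct or not) input layout, and the toy values are assumptions. Samples are sorted by decreasing number of identified sites, and the curve steps up for each identification that agrees with morphology and steps right for each false or missing one.

```python
def pseudo_roc(samples):
    """Return the (x, y) points of a pseudo-ROC curve.

    samples: list of (n_identified_sites, is_correct) pairs (hypothetical layout).
    x counts false or missing identifications, y counts correct ones.
    """
    ordered = sorted(samples, key=lambda s: s[0], reverse=True)
    x = y = 0
    points = [(0, 0)]
    for _, is_correct in ordered:
        if is_correct:
            y += 1
        else:
            x += 1
        points.append((x, y))
    return points


# Toy dataset with invented site counts and outcomes.
toy = [(1200, True), (300, False), (800, True), (150, True), (90, False)]
curve = pseudo_roc(toy)
print(curve)
```

A strategy whose curve rises steeply before moving right yields most of its correct identifications from the best-covered samples, which is how the three acquisition and analysis modes can be compared on one plot.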

**Fig. 5. Large-scale species identification at three sites with early human occupation on the Iberian peninsula.**
a Locations of the three sites on a current map of Portugal. Map drawn in Mapbox Studio using a custom style. b Species identified in 84 samples from levels 6–7 (29,000–31,500 BP) of Vale Boi, in 95 samples from layers GG to JJ (38,000–45,000 BP) of Lapa do Picareiro, and 34 samples from chambers 1 and 2 (estimated 50,000–60,000 BP) of Gruta da Companheira. Overall species distribution is displayed by the pie chart, whereas bar charts show species ratios for separate compartments of the assemblage. Colors are used to distinguish species, as indicated in the legend. c Average fold-coverage of the 20 genes used for SPIN comparing the three Portuguese sites with the modern reference and Iron-Age material. Coverage was calculated by summing the number of precursors at each site in the global aligned database and is indicated by color using white for no coverage, blue for medium coverage, and pink for high coverage. The values represent the average fold-coverage in 10 amino acid bins for each dataset.
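
The binned fold-coverage in panel c can be illustrated with a short Python sketch. It assumes precursors are given as 0-based, half-open (start, end) positions on one aligned sequence and handles a single sample; the function name, the coordinate convention, and the toy values are assumptions, and averaging across the samples of a dataset is left out.

```python
def binned_fold_coverage(precursors, length, bin_size=10):
    """Average per-site precursor count in consecutive bins.

    precursors: list of (start, end) positions, 0-based and half-open
                (hypothetical convention).
    length: number of sites in the aligned sequence.
    """
    depth = [0] * length
    for start, end in precursors:
        for pos in range(max(start, 0), min(end, length)):
            depth[pos] += 1  # one more precursor covers this site
    bins = []
    for i in range(0, length, bin_size):
        window = depth[i:i + bin_size]
        bins.append(sum(window) / len(window))
    return bins


# Toy example: three invented precursors on a 30-site sequence.
coverage = binned_fold_coverage([(0, 10), (5, 15), (5, 25)], length=30)
print(coverage)
```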

**Fig. 6. Comparison of SPIN and PMF.**
a Alluvial diagram showing species identification of 46 reference bone samples and 3 laboratory blanks. Small bars on the x-axis indicate individual samples. Color and position in the middle column represent the true species, whereas the left and right columns report the species identification by SPIN and PMF, respectively. Bars with color gradients indicate changing species assignments. b Alluvial diagram showing species identification of 20 representative samples from the Danish Salpetermosen site (Fig. 4). Left column indicates the species identification by SPIN, whereas the right column indicates the species identified by PMF. c Alluvial diagram showing species identification of 21 representative samples from the three Portuguese sites (Fig. 5). Left column indicates the species identification by SPIN, whereas the right column indicates the species identified by PMF.
