Introducing SPeDE: High-Throughput Dereplication and Accurate Determination of Microbial Diversity from Matrix-Assisted Laser Desorption-Ionization Time of Flight Mass Spectrometry Data

Charles Dumolin^#¹, Maarten Aerts^#¹, Bart Verheyde¹, Simon Schellaert², Tim Vandamme¹, Felix Van der Jeugt², Evelien De Canck¹, Margo Cnockaert¹, Anneleen D Wieme¹,³, Ilse Cleenwerck¹,³, Jindrich Peiren¹,³, Peter Dawyndt², Peter Vandamme¹,³, Aurélien Carlier⁴

Affiliations

¹ Laboratory of Microbiology, Department of Biochemistry and Microbiology, Faculty of Sciences, Ghent University, Ghent, Belgium.
² Computational Biology Laboratory, Department of Applied Mathematics, Computer Science and Statistics, Faculty of Sciences, Ghent University, Ghent, Belgium.
³ BCCM/LMG Bacteria Collection, Department of Biochemistry and Microbiology, Faculty of Sciences, Ghent University, Ghent, Belgium.
⁴ Laboratory of Microbiology, Department of Biochemistry and Microbiology, Faculty of Sciences, Ghent University, Ghent, Belgium. aurelien.carlier@ugent.be

^# Contributed equally.

PMID: 31506264
PMCID: PMC6739102
DOI: 10.1128/mSystems.00437-19

Charles Dumolin et al. mSystems. 2019 Sep 10;4(5):e00437-19. doi: 10.1128/mSystems.00437-19.

Abstract

The isolation of microorganisms from microbial community samples often yields a large number of conspecific isolates. Increasing the diversity covered by an isolate collection entails the implementation of methods and protocols to minimize the number of redundant isolates. Matrix-assisted laser desorption-ionization time-of-flight (MALDI-TOF) mass spectrometry methods are ideally suited to this dereplication problem because of their low cost and high throughput. However, the available software tools are cumbersome and rely either on the prior development of reference databases or on global similarity analyses, which are inconvenient and offer low taxonomic resolution. We introduce SPeDE, a user-friendly spectral data analysis tool for the dereplication of MALDI-TOF mass spectra. Rather than relying on global similarity approaches to classify spectra, SPeDE determines the number of unique spectral features by a mix of global and local peak comparisons. This approach allows the identification of a set of nonredundant spectra linked to operational isolation units. We evaluated SPeDE on a data set of 5,228 spectra representing 167 bacterial strains belonging to 132 genera across six phyla and on a data set of 312 spectra of 78 strains measured before and after lyophilization and subculturing. SPeDE was able to dereplicate with high efficiency by identifying redundant spectra while retrieving reference spectra for all strains in a sample. SPeDE can identify distinguishing features between spectra, and its performance exceeds that of established methods in speed and precision. SPeDE is open source under the MIT license and is available from https://github.com/LM-UGent/SPeDE.

IMPORTANCE Estimation of the operational isolation units present in a MALDI-TOF mass spectral data set involves an essential dereplication step to identify redundant spectra in a rapid manner and without sacrificing biological resolution. We describe SPeDE, a new algorithm which facilitates culture-dependent clinical or environmental studies. SPeDE enables the rapid analysis and dereplication of isolates, a critical feature when long-term storage of cultures is limited or not feasible. We show that SPeDE can efficiently identify sets of similar spectra at the level of the species or strain, exceeding the taxonomic resolution of other methods. The high-throughput capacity, speed, and low cost of MALDI-TOF mass spectrometry and SPeDE dereplication over traditional gene marker-based sequencing approaches should facilitate adoption of the culturomics approach to bacterial isolation campaigns.

Keywords: MALDI-TOF MS; bioinformatics; dereplication; microbial ecology.

Figures

**FIG 1**
Schematic representation of the SPeDE algorithm. See Materials and Methods for a detailed description of the algorithm. Briefly, all possible pairs of peak lists in a data set are compared (step 2.1). Peaks which are not shared by a pair of spectra are validated by estimating the Pearson product moment correlation (PPMC) between raw spectra in a local area surrounding the peak (step 2.2). Peaks with a PPMC below a threshold value are considered discriminating. The number of discriminating peaks or unique spectral features (USFs) between pairs of spectra is computed and tabulated (step 3). Pairs of spectra for which no USFs are found in at least one of the elements are matched and clustered into operational isolation units (OIUs; step 4). All spectra with a quality too poor to be considered for inclusion as a reference spectrum are matched to an OIU to give a reliable abundance estimate for each OIU. The output of SPeDE includes a table of representative spectra for each OIU and a USF distance matrix between all spectra which can be used to generate a dendrogram or a Krona plot.
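The pairwise comparison outlined in the FIG 1 caption can be illustrated with a minimal Python sketch. This is not the SPeDE implementation: the function names, the peak-matching tolerance, the window half-width, the PPMC threshold, the use of a shared m/z grid, and the union-find grouping are all assumptions chosen for illustration, and the synthetic spectra are invented. The sketch only shows the idea of validating unshared peaks with a local Pearson correlation, counting the survivors as USFs, and grouping spectra when at least one member of a pair has no USFs.

```python
import numpy as np

def unmatched_peaks(peaks_a, peaks_b, tol):
    """Peaks of A that have no peak of B within +/- tol (m/z units)."""
    peaks_b = np.asarray(peaks_b, dtype=float)
    return [p for p in peaks_a
            if peaks_b.size == 0 or np.min(np.abs(peaks_b - p)) > tol]

def local_ppmc(mz, raw_a, raw_b, center, half_width):
    """Pearson correlation of two raw spectra in a window around one peak."""
    window = np.abs(mz - center) <= half_width
    a, b = raw_a[window], raw_b[window]
    if a.std() == 0 or b.std() == 0:
        return 0.0  # assumption: a flat window counts as uncorrelated
    return float(np.corrcoef(a, b)[0, 1])

def count_usfs(mz, spec_a, spec_b, tol=5.0, half_width=20.0, threshold=0.9):
    """Count peaks of A that are unshared with B AND fail the local PPMC check.
    tol, half_width and threshold are hypothetical values, not SPeDE defaults."""
    peaks_a, raw_a = spec_a
    peaks_b, raw_b = spec_b
    return sum(1 for p in unmatched_peaks(peaks_a, peaks_b, tol)
               if local_ppmc(mz, raw_a, raw_b, p, half_width) < threshold)

def group_spectra(mz, spectra):
    """Tabulate the (asymmetric) USF counts, then merge pairs in which at
    least one spectrum has no USFs relative to the other (union-find)."""
    n = len(spectra)
    usf = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(n):
            if i != j:
                usf[i, j] = count_usfs(mz, spectra[i], spectra[j])
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if usf[i, j] == 0 or usf[j, i] == 0:
                parent[find(i)] = find(j)
    return usf, [find(i) for i in range(n)]

def synthetic_raw(mz, centers, sigma=3.0):
    """Invented raw signal: a sum of Gaussian peaks on the m/z grid."""
    return sum(np.exp(-((mz - c) ** 2) / (2 * sigma ** 2)) for c in centers)

mz = np.arange(2000.0, 3000.0, 1.0)
# Each spectrum is (picked peak list, raw intensities).
# B's peak picking "missed" the 2500 peak although its raw signal contains it.
spec_a = ([2200.0, 2500.0, 2650.0], synthetic_raw(mz, [2200, 2500, 2650]))
spec_b = ([2200.0, 2650.0], synthetic_raw(mz, [2200, 2500, 2650]))
spec_c = ([2200.0, 2800.0], synthetic_raw(mz, [2200, 2800]))

usf, labels = group_spectra(mz, [spec_a, spec_b, spec_c])
print(usf)
print(labels)
```

In this toy input, the 2500 peak is absent from the peak list of the second spectrum but present in its raw signal, so the local correlation check prevents it from being counted as a USF and the first two spectra fall into the same group, while the third keeps its own. The resulting USF table plays the role of the distance matrix mentioned in the caption.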

**FIG 2**
Approximate maximum likelihood phylogenetic tree of strains included in the benchmark data set based on 40 single-copy, conserved marker protein genes. OTUs were defined as groups of strains with an intragroup pairwise genome-wide ANI of >98%. OTU clusters containing more than one strain are highlighted. The number of references obtained per strain is indicated by green bars, and the number of strains linked to each reference is indicated by blue bars.

**FIG 3**
Accuracy of MALDI-TOF mass spectra matching by SPeDE. (A) Genomic similarity of the 7 strains of the benchmark data set for which no reference spectrum was retained to strains within the same OIU. Genomic similarities expressed by ANI values are shown. (B) Distribution of the lowest ANI values within 210 OIUs. Bins have a width of 0.36%. (C) Genomic similarity within OIUs composed of more than one strain. Each data point corresponds to the ANI value between a pair of strains contained within the OIU.
