Mol Cell Proteomics. 2020 Dec;19(12):2125-2139.
doi: 10.1074/mcp.TIR120.002061. Epub 2020 Sep 30.

Identification of Microorganisms by Liquid Chromatography-Mass Spectrometry (LC-MS1) and in Silico Peptide Mass Libraries

Peter Lasch et al. Mol Cell Proteomics. 2020 Dec.

Abstract

Over the past decade, modern methods of mass spectrometry (MS) have emerged that allow reliable, fast and cost-effective identification of pathogenic microorganisms. Although MALDI-TOF MS has already revolutionized the way microorganisms are identified, recent years have also witnessed substantial progress in the development of liquid chromatography (LC)-MS based proteomics for microbiological applications. For example, LC-tandem MS (LC-MS2) has been proposed for microbial characterization by means of multiple discriminative peptides that enable identification at the species level, or sometimes at the strain level. However, such investigations can be laborious and time-consuming, especially if the experimental LC-MS2 data are tested against sequence databases covering a broad panel of different microbiological taxa. In this proof-of-concept study, we present an alternative bottom-up proteomics method for microbial identification. The proposed approach involves efficient extraction of proteins from cultivated microbial cells, digestion by trypsin and LC-MS measurements. Peptide masses are then extracted from MS1 data and systematically tested against an in silico library of all possible peptide mass data compiled in-house. The library has been computed from the UniProt Knowledgebase covering the Swiss-Prot and TrEMBL databases and comprises more than 12,000 strain-specific in silico profiles, each containing tens of thousands of peptide mass entries. Identification analysis involves computation of score values derived from correlation coefficients between experimental and strain-specific in silico peptide mass profiles and compilation of score ranking lists. The taxonomic positions of the microbial samples are then determined by using the best-matching database entries. The suggested method is computationally efficient (less than 2 min per sample) and has been successfully tested on a set of 39 LC-MS1 peak lists obtained from 19 different microbial pathogens. The proposed method is rapid, simple and automatable, and we foresee wide application potential in microbiology.
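
As a rough illustration of how an in silico peptide mass profile can be derived from protein sequences, the following Python sketch digests sequences with a simple trypsin rule (cleavage after K or R, but not before P) and sums monoisotopic residue masses. The function names, the missed_cleavages parameter and the skipping of peptides with non-standard residues are assumptions made for illustration; the abstract does not describe the library computation at this level of detail.

```python
import re

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one H2O per peptide (free N- and C-terminus)


def tryptic_peptides(sequence, missed_cleavages=0):
    """Split a protein after K/R (not before P); optionally join neighbours."""
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", sequence) if f]
    peptides = []
    for i in range(len(fragments)):
        last = min(i + missed_cleavages, len(fragments) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(fragments[i:j + 1]))
    return peptides


def monoisotopic_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def mass_profile(proteins, missed_cleavages=0):
    """Sorted peptide masses for all proteins of one strain (hypothetical helper)."""
    masses = []
    for sequence in proteins:
        for peptide in tryptic_peptides(sequence, missed_cleavages):
            # Assumption: peptides with non-standard residues (X, B, Z, U, ...) are skipped.
            if all(aa in RESIDUE_MASS for aa in peptide):
                masses.append(monoisotopic_mass(peptide))
    return sorted(masses)
```

Calling mass_profile on the list of all protein sequences annotated for one strain would yield one profile of the kind the abstract describes; fixed or variable modifications are deliberately left out of this sketch.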

Keywords: Bacteria; LC-MS1; bioinformatics software; diagnostic; diagnostics; identification of microorganisms; mass spectrometry; microbiology.

Conflict of interest statement

The authors have declared a conflict of interest. A.S., J.D., and P.L. are the inventors of SPEED and have submitted patent applications related to SPEED.

Figures

Fig. 1.
Overview of the proposed LC-MS1 based microbial identification workflow. Pure microbial cultures are prepared and colony material is processed using established sample preparation protocols for shotgun proteomics. Mass spectrometry data are then obtained using LC–MS. MS1 data are extracted and pre-processed for subsequent comparison against a library of in silico mass profiles obtained from UniProtKB/Swiss-Prot and UniProtKB/TrEMBL protein sequence data. This library is composed of MW patterns, or profiles, each representing a characteristic strain-specific combination of peptide masses, whereby peptides may be specific or nonspecific in an MS2 context. A ranking list of correlation or inter-spectral distance values (i.e., scores) is established, which provides information on the taxonomic identity of the organism studied.
Fig. 2.
Schematic workflow for generating an in silico database from UniProtKB/Swiss-Prot and/or UniProtKB/TrEMBL protein sequence data. The Matlab toolbox parseuniprot represents a proteomic pipeline in which three main internal functions, readdat, resort and modfeat, are executed consecutively. The function readdat converts the content of structured text files available from ftp://ftp.uniprot.org into Matlab structure arrays that contain the complete information required to compile the in silico databases. These arrays are subsequently processed by the functions resort and modfeat; the output of the parseuniprot pipeline is a collection of strain-specific in silico peptide mass profiles suitable for computer-based comparison (pattern matching) with experimental LC-MS1 test spectra.
Fig. 3.
Pre-processing and feature selection of LC-MS1 data. MS1 peak data were acquired from a culture of Enterococcus faecalis DSM 20371; sample preparation was carried out according to the SPEED sample preparation protocol (27). Top row: histograms of log10 scaled MS1 peak intensities (left) and of the molecular weight (MW) distribution (right) of peaks after feature detection by the Minora algorithm (original data, blue bars) and after pre-processing and feature selection by readlcmstxtfile (processed data, red bars). Total number of peaks in original/processed MS1 data: 82843/42559. Number of oxidized/deamidated peptides found and removed: 389/329. Lower row: ratio between the number of peaks present in processed and in original MS1 data as a function of peak intensity (log10 scaled, left) or of the MW (right). Pre-processing was carried out by readlcmstxtfile, a Matlab function developed in-house. This function has been designed to preferentially remove low-intensity signals in the low MW region (< 2000 Da). The blue shaded area between 2000 and 5500 Da indicates the MW range used for correlation analysis by MicrobeMS.
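
The two filtering steps named in this caption (removal of oxidized or deamidated peptides, and restriction to the 2000 to 5500 Da window) can be sketched in Python as follows. This is not the readlcmstxtfile function: the caption does not state how modified peptides are detected, so pairing each peak with an unmodified partner at the expected mass shift, the 5 ppm tolerance, and the function names are all assumptions. Intensity-dependent filtering and retention time are ignored here.

```python
import bisect

OXIDATION = 15.994915    # monoisotopic mass shift of oxidation (Da)
DEAMIDATION = 0.984016   # monoisotopic mass shift of deamidation (Da)


def remove_modified(masses, tol_ppm=5.0):
    """Drop peaks that look like oxidized/deamidated forms of another peak.

    Assumption: a peak counts as modified if a second peak lies one
    modification mass below it, within tol_ppm.
    """
    masses = sorted(masses)

    def has_partner(target):
        tol = target * tol_ppm * 1e-6
        i = bisect.bisect_left(masses, target - tol)
        return i < len(masses) and masses[i] <= target + tol

    kept = []
    for m in masses:
        if has_partner(m - OXIDATION) or has_partner(m - DEAMIDATION):
            continue
        kept.append(m)
    return kept


def select_window(masses, mw_min=2000.0, mw_max=5500.0):
    """Keep only masses inside the MW range used for correlation analysis."""
    return [m for m in masses if mw_min <= m <= mw_max]
```

A typical call would be select_window(remove_modified(peak_masses)), where peak_masses is a hypothetical list of neutral monoisotopic masses exported from the feature detection step.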
Fig. 4.
Data analysis workflow for microbial identification based on experimental LC-MS1 data and in silico databases comprising strain-specific peptide mass profiles derived from microbial genomes.
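
A minimal Python sketch of the comparison step is given below: experimental and in silico masses are binned into binary occupancy vectors, a Pearson correlation is computed for each library entry, and the entries are ranked by score. The abstract states only that scores are derived from correlation coefficients, so the fixed-width binning, the bin width, the binary encoding and the function names are assumptions, and the scoring used by MicrobeMS may differ.

```python
import math


def binned_profile(masses, mw_min=2000.0, mw_max=5500.0, bin_width=0.01):
    """Return the set of occupied bin indices and the total number of bins."""
    n_bins = int(round((mw_max - mw_min) / bin_width))
    occupied = set()
    for m in masses:
        if mw_min <= m < mw_max:
            occupied.add(min(int((m - mw_min) / bin_width), n_bins - 1))
    return occupied, n_bins


def pearson_binary(a, b, n):
    """Pearson correlation of two 0/1 vectors of length n, given as index sets."""
    na, nb, nab = len(a), len(b), len(a & b)
    numerator = n * nab - na * nb
    denominator = math.sqrt(na * (n - na) * nb * (n - nb))
    return numerator / denominator if denominator else 0.0


def rank_library(experimental_masses, library, **binning):
    """Score every library profile against the sample; best match first.

    library maps a strain name to its list of in silico peptide masses.
    """
    sample, n_bins = binned_profile(experimental_masses, **binning)
    scores = []
    for name, masses in library.items():
        profile, _ = binned_profile(masses, **binning)
        scores.append((name, pearson_binary(sample, profile, n_bins)))
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

Hard bin edges are a simplification: two masses that differ by less than the bin width can still fall into neighbouring bins, so a tolerance-based match (for example in ppm) would be more robust in practice. Storing profiles as sets of occupied bins keeps the comparison cheap even when a profile holds tens of thousands of masses.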

References

    1. Seng P., Drancourt M., Gouriet F., La Scola B., Fournier P. E., Rolain J. M., and Raoult D. (2009) Ongoing revolution in bacteriology: routine identification of bacteria by matrix-assisted laser desorption ionization time-of-flight mass spectrometry. Clin. Infect. Dis. 49, 543–551
    2. Nomura F. (2015) Proteome-based bacterial identification using matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI-TOF MS): A revolutionary shift in clinical diagnostic microbiology. Biochim. Biophys. Acta 1854, 528–537
    3. Schubert S., and Kostrzewa M. (2017) MALDI-TOF MS in the Microbiology Laboratory: Current Trends. Curr. Issues Mol. Biol. 23, 17–20
    4. Welker M., Van Belkum A., Girard V., Charrier J. P., and Pincus D. (2019) An update on the routine application of MALDI-TOF MS in clinical microbiology. Expert Rev. Proteomics 16, 695–710
    5. Sandrin T. R., Goldstein J. E., and Schumaker S. (2013) MALDI TOF MS profiling of bacteria at the strain level: a review. Mass Spectrom. Rev. 32, 188–217
