Anal Bioanal Chem. 2024 Jun;416(14):3349-3360.
doi: 10.1007/s00216-024-05286-w. Epub 2024 Apr 12.

Python workflow for the selection and identification of marker peptides: proof-of-principle study with heated milk

Gesine Kuhnen et al. Anal Bioanal Chem. 2024 Jun.

Abstract

The analysis of nearly holistic food profiles has developed considerably in recent years. This has led to larger amounts of data and to the ability to obtain more information about health-beneficial and adverse constituents in food than ever before. In proteomics in particular, evaluation relies on software that does not provide specific approaches for unique monitoring questions. An additional and more comprehensive evaluation is possible with the programming language Python. Its large ecosystem offers broad possibilities for mass spectrometric data analysis, but it needs to be tailored to the specific sets of features and the research questions behind them. It also allows various machine-learning approaches to be applied. The aim of the present study was to develop an algorithm for selecting and identifying potential marker peptides from mass spectrometric data. The workflow is divided into three steps: (I) feature engineering, (II) chemometric data analysis, and (III) feature identification. The first step transforms the mass spectrometric data into a structure that enables the application of existing data analysis packages in Python. The second step is the data analysis for selecting single features. These features are further processed in the third step, the feature identification. The data used as an example in this proof-of-principle approach came from a study on the influence of heat treatment on the milk proteome/peptidome.

Keywords: Chemometrics; Feature identification; Mass spectrometry; Processed milk; Proteomics; Python.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Visualization of the feature engineering process. Shown is the transformation of the mass spectrometric data (mzML files) into two pandas.DataFrames. The data frame on the left (“data_df_sample”) contains the intensities of the features in each sample. The data frame on the right (“ft_metadata”) contains the metadata that defines each feature. The data were transformed into this shape to enable the application of data analysis tools. Feature engineering comprised smoothing of the data, feature finding, feature extraction, and data transformation. NaN means “Not a Number” and is used when no data is available
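The target structure described in this caption can be sketched with pandas alone. This is a minimal illustration, not the authors' code: the long-format `detections` table, its column names, the sample names, and the orientation (samples as rows, features as columns) are assumptions made for the example. Only the names `data_df_sample` and `ft_metadata` come from the caption.

```python
import numpy as np
import pandas as pd

# Hypothetical output of a feature-finding step: one row per detected
# feature per sample (all values are made up for illustration).
detections = pd.DataFrame({
    "sample": ["milk_raw_1", "milk_raw_1", "milk_heated_1", "milk_heated_1"],
    "feature_id": ["FT1", "FT2", "FT1", "FT3"],
    "mz": [500.25, 612.30, 500.25, 745.40],
    "rt_s": [120.0, 250.5, 120.4, 392.3],
    "charge": [2, 2, 2, 3],
    "intensity": [1.0e5, 2.5e4, 3.0e5, 8.0e4],
})

# Intensity matrix: one row per sample, one column per feature.
# Features that were not detected in a sample remain NaN.
data_df_sample = detections.pivot(
    index="sample", columns="feature_id", values="intensity"
)

# Metadata table: one row per feature with the values that define it.
ft_metadata = detections.groupby("feature_id").agg(
    mz=("mz", "mean"),
    rt_s=("rt_s", "mean"),
    charge=("charge", "first"),
)

print(data_df_sample)
print(ft_metadata)
```

Keeping intensities and metadata in two tables that share the feature identifier lets the intensity matrix go directly into numerical tools, while m/z, retention time, and charge stay available for the later identification step.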
Fig. 2
PLS-DA plot of the milk samples. Orange x-shaped markers show the samples that were not heated during sample preparation. Blue spots show the samples that were heated during sample preparation. The plot was generated with matplotlib and seaborn
Fig. 3
Boxplot of the intensities of the five top-scoring features in the samples. For each feature, the intensities in the non-heated and the heated samples are presented as boxplots. The circles represent outliers. The plot was generated with matplotlib
Fig. 4
2D plot of the selected features in the MS spectra. Spectra were filtered by a retention time and m/z window based on the values of the features. a Spectrum of a sample without further heat treatment. b Spectrum of a sample with heat treatment. Both samples are UHT milk with 1.5% fat. Top-scoring features are marked with red circles. The plots were generated with matplotlib
Fig. 5
Schematic workflow of the feature identification. The process is split into two parts. In the first step, potential matches are searched for: the m/z of a feature is compared with the theoretical m/z of peptides resulting from tryptic digestion of milk proteins, as well as of fragments and modifications of these peptides. In the second step, the spectra of the features are searched for isotopes and fragments of the potential matches
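The first part of this workflow, comparing a feature's m/z with theoretical peptide m/z values, can be sketched in plain Python. Everything here is a hedged illustration rather than the published implementation: the function names, the toy protein sequence, the cleavage rule (after K or R, not before P, with one missed cleavage allowed), the 10 ppm tolerance, and the mass shift of +324.10565 Da (C12H20O10) used for lactulosyllysine are all assumptions.

```python
import re

# Monoisotopic residue masses in Da (standard values).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728
# Assumed mass shifts; lactulosyllysine is only tried on peptides with K.
MOD_DELTAS = {"none": 0.0, "lactulosyllysine": 324.10565}


def tryptic_peptides(sequence, missed_cleavages=1):
    """In-silico digest: cleave after K/R unless the next residue is P."""
    pieces = [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]
    peptides = set()
    for start in range(len(pieces)):
        last = min(start + missed_cleavages, len(pieces) - 1)
        for end in range(start, last + 1):
            peptides.add("".join(pieces[start:end + 1]))
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER


def to_mz(mass, charge):
    return (mass + charge * PROTON) / charge


def match_feature(feature_mz, charge, peptides, tol_ppm=10.0):
    """Return (peptide, modification, ppm error) for candidates in tolerance."""
    hits = []
    for pep in sorted(peptides):
        for mod, delta in MOD_DELTAS.items():
            if mod == "lactulosyllysine" and "K" not in pep:
                continue
            theo = to_mz(peptide_mass(pep) + delta, charge)
            ppm = (feature_mz - theo) / theo * 1e6
            if abs(ppm) <= tol_ppm:
                hits.append((pep, mod, round(ppm, 2)))
    return hits


# Toy sequence (assumption) that contains the peptide named in Fig. 6.
candidates = tryptic_peptides("AAKVLPVPQKAVPYPQRDE")
print(match_feature(796.4700, 2, candidates))
print(match_feature(958.5224, 2, candidates))
```

A match at this stage is only a candidate, since several peptides can share an m/z within tolerance. The second part of the workflow narrows the candidates down with isotope and fragment evidence.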
Fig. 6
Section of the low-energy spectrum of a heated milk sample. The isotopes of the feature FT43359 are shown as blue signals. The red signals show the calculated isotopes for the peptide VLPVPQKAVPYPQR modified with lactulosyllysine. The spectrum has a retention time of 392.3320 s. The assigned peaks are listed in Table 4. The plot was generated with matplotlib
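Assigning observed peaks to calculated isotope positions can be sketched as below. This is a simplified illustration under stated assumptions: isotope peaks are placed at multiples of the 13C/12C mass difference (about 1.003355 Da) divided by the charge, relative intensities are ignored, and the peak list, tolerance, and function names are hypothetical rather than taken from the study.

```python
ISOTOPE_SPACING = 1.003355  # approximate 13C - 12C mass difference in Da


def isotope_mzs(mono_mz, charge, n_peaks=4):
    """Expected m/z of the monoisotopic peak and its first isotope peaks."""
    return [mono_mz + k * ISOTOPE_SPACING / charge for k in range(n_peaks)]


def assign_isotopes(observed_mz, mono_mz, charge, tol_da=0.01, n_peaks=4):
    """For each expected isotope, return the closest observed peak or None."""
    assigned = []
    for theo in isotope_mzs(mono_mz, charge, n_peaks):
        in_tol = [m for m in observed_mz if abs(m - theo) <= tol_da]
        closest = min(in_tol, key=lambda m: abs(m - theo)) if in_tol else None
        assigned.append(closest)
    return assigned


# Hypothetical peak list around a doubly charged feature at m/z 500.0.
observed = [499.2, 500.002, 500.503, 501.3]
print(assign_isotopes(observed, mono_mz=500.0, charge=2))
```

A candidate whose expected isotope positions are mostly unassigned is a weaker match than one whose positions are all found, which gives a simple way to rank the potential matches from the first step.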
Fig. 7
High-energy spectrum of a heated milk sample at the retention time of 392.5311 s. At the selected retention time, the feature FT43359 was detected in the low-energy spectrum. The observed signals, that is, all fragments acquired at this retention time, are displayed in blue. The theoretical b- and y-fragments that were associated by the approach are marked in orange (with the modification) and in green (with the modification). The plot was generated with matplotlib
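The theoretical b- and y-fragments referred to here follow from the residue masses of the candidate peptide. The sketch below computes singly charged b- and y-ions with and without a modification on one residue. It is an assumption-laden illustration: the function name, the 0-based `mods` mapping, the restriction to singly charged ions, and the +324.10565 Da shift assumed for lactulosyllysine on the lysine are not taken from the page.

```python
# Monoisotopic residue masses in Da for the residues used below.
MONO = {
    "A": 71.03711, "P": 97.05276, "V": 99.06841, "L": 113.08406,
    "Q": 128.05858, "K": 128.09496, "R": 156.10111, "Y": 163.06333,
}
WATER = 18.01056
PROTON = 1.00728


def fragment_ions(peptide, mods=None):
    """Singly charged b- and y-ion m/z values.

    mods maps a 0-based residue index to a mass shift in Da.
    """
    mods = mods or {}
    masses = [MONO[aa] + mods.get(i, 0.0) for i, aa in enumerate(peptide)]
    n = len(masses)
    ions = {}
    for i in range(1, n):
        ions[f"b{i}"] = sum(masses[:i]) + PROTON           # N-terminal fragment
        ions[f"y{i}"] = sum(masses[-i:]) + WATER + PROTON  # C-terminal fragment
    return ions


peptide = "VLPVPQKAVPYPQR"
lys_index = peptide.index("K")  # 0-based position of the lysine
plain = fragment_ions(peptide)
modified = fragment_ions(peptide, mods={lys_index: 324.10565})

# Fragments that contain the lysine are shifted; the others are unchanged.
for name in ("b6", "b7", "y7", "y8"):
    print(name, round(plain[name], 4), round(modified[name], 4))
```

Comparing both series against the observed high-energy spectrum shows which fragments carry the modification, which helps to localise it on the peptide.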
