Deep vs. Shallow Learning-based Filters of MSMS Spectra in Support of Protein Search Engines

Majdi Maabreh¹, Basheer Qolomany¹, James Springstead², Izzat Alsmadi³, Ajay Gupta¹

Affiliations

¹ Department of Computer Science, Western Michigan University, Kalamazoo, MI, USA.
² Department of Chemical and Paper Engineering, Western Michigan University, Kalamazoo, MI, USA.
³ Department of Computing and Cyber Security, Texas A&M University, San Antonio, TX, USA.

PMID: 34408917
PMCID: PMC8370709
DOI: 10.1109/BIBM.2017.8217824

Majdi Maabreh et al. Proceedings (IEEE Int Conf Bioinformatics Biomed). 2017 Nov;2017:1175-1182. doi: 10.1109/BIBM.2017.8217824. Epub 2017 Dec 18.

Abstract

Although search time grows only linearly with the number of observed spectra, current protein search engines, even their parallel versions, can take several hours to search the large volumes of MSMS spectra that instruments generate in a short time. After this laborious search, some (and at times, most) of the observed spectra are labeled as non-identifiable. We evaluate the role of machine learning in building an efficient MSMS filter that removes non-identifiable spectra. We compare a deep learning algorithm against 9 shallow learning algorithms with different configurations. Using 10 datasets generated by two search engines, from different instruments, of different sizes, and from different species, we show experimentally that deep learning models are powerful MSMS spectra filters. We also show that our simple feature list is informative: several shallow learning algorithms also gave encouraging results when filtering MSMS spectra. Our deep learning model can exclude around 50% of the non-identifiable spectra while losing, on average, only 9% of the identifiable ones. Among the shallow learners, Random Forest, Support Vector Machine, and Neural Networks gave encouraging results, eliminating on average 70% of the non-identifiable spectra while losing around 25% of the identifiable ones. The deep learning algorithm may be especially useful when the protein(s) of interest are present at low cellular or tissue concentration, while the other algorithms may be more useful for concentrated or more highly expressed proteins.
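The abstract scores each filter by two rates: the share of non-identifiable spectra it removes and the share of identifiable spectra it loses. The sketch below is not the authors' code; it is a minimal illustration assuming each spectrum carries a boolean label from the search engine (identified or not) and a boolean keep/discard decision from the filter. The names `FilterOutcome` and `evaluate_filter` are hypothetical, and the workload figure assumes search time is strictly linear in the number of spectra searched.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class FilterOutcome:
    removed_non_identifiable: float  # share of non-identifiable spectra discarded
    lost_identifiable: float         # share of identifiable spectra wrongly discarded
    workload_reduction: float        # share of all spectra never sent to the search engine


def evaluate_filter(is_identifiable: Sequence[bool], keep: Sequence[bool]) -> FilterOutcome:
    """Score a spectrum filter against search-engine labels (hypothetical helper).

    is_identifiable[i] is True if the search engine identified spectrum i;
    keep[i] is True if the filter passes spectrum i on to the search engine.
    """
    if len(is_identifiable) != len(keep):
        raise ValueError("labels and filter decisions must have the same length")
    n_ident = sum(1 for y in is_identifiable if y)
    n_non = len(is_identifiable) - n_ident
    if n_ident == 0 or n_non == 0:
        raise ValueError("need at least one identifiable and one non-identifiable spectrum")

    # Count discarded spectra in each class.
    removed_non = sum(1 for y, k in zip(is_identifiable, keep) if not y and not k)
    lost_ident = sum(1 for y, k in zip(is_identifiable, keep) if y and not k)

    # Assumption: if search time grows linearly with the number of spectra,
    # the fraction of spectra removed approximates the fraction of time saved.
    workload = (removed_non + lost_ident) / len(is_identifiable)
    return FilterOutcome(removed_non / n_non, lost_ident / n_ident, workload)


# Hypothetical toy data: 4 identifiable and 6 non-identifiable spectra.
labels = [True] * 4 + [False] * 6
decisions = [True, True, True, False] + [False, False, False, True, True, True]
print(evaluate_filter(labels, decisions))
# FilterOutcome(removed_non_identifiable=0.5, lost_identifiable=0.25, workload_reduction=0.4)
```

Reporting the two rates separately, rather than a single accuracy, keeps the trade-off explicit: a filter that discards every spectrum removes 100% of the non-identifiable spectra but also loses 100% of the identifiable ones.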

Keywords: Big Data; Deep Learning; MSMS Filters; Machine Learning; Protein Search Engine; Searching Space Optimization; Shallow Learning.

Figures

**Figure 1.**
The performance of machine learning algorithms using Human01-pFind dataset

**Figure 2.**
The performance of machine learning algorithms using Human01-Comet dataset

**Figure 3.**
The performance of machine learning algorithms using Human06-pFind dataset

**Figure 4.**
The performance of machine learning algorithms using Human06-Comet dataset

**Figure 5.**
The performance of machine learning algorithms using Human05-Comet dataset

**Figure 6.**
The performance of machine learning algorithms using Mouse-pFind dataset

**Figure 7.**
The performance of machine learning algorithms using Mouse-Comet dataset

**Figure 8.**
The performance of machine learning algorithms using Soybean-pFind dataset

**Figure 9.**
The performance of machine learning algorithms using Soybean-Comet dataset

**Figure 10.**
The performance of machine learning algorithms using Rat-Comet dataset

**Figure 11.**
Average performance of various machine learning algorithms across all datasets in our experiment

Grants and funding

R15 GM120820/GM/NIGMS NIH HHS/United States