An efficient algorithm for the blocked pattern matching problem
- PMID: 25322837
- DOI: 10.1093/bioinformatics/btu678
An efficient algorithm for the blocked pattern matching problem
Abstract
Motivation: Tandem mass spectrometry (MS) has become the method of choice for protein identification and quantification. In the era of big data biology, tandem mass spectra are often searched against huge protein databases generated from genomes or RNA-Seq data for peptide identification. However, most existing tools for MS-based peptide identification compare a tandem mass spectrum against all peptides in a database whose molecular masses are similar to the precursor mass of the spectrum, making mass spectral data analysis slow for huge databases. Tag-based methods extract peptide sequence tags from a tandem mass spectrum and use them as a filter to reduce the number of candidate peptides, thus speeding up the database search. Recently, gapped tags have been introduced into mass spectral data analysis because they improve the sensitivity of peptide identification compared with sequence tags. However, the blocked pattern matching (BPM) problem, which is an essential step in gapped tag-based peptide identification, has not been fully solved.
Results: In this article, we propose a fast and memory-efficient algorithm for the BPM problem. Experiments on both simulated and real datasets showed that the proposed algorithm achieved high speed and high sensitivity for peptide filtration in peptide identification by database search.
Contact: cswangl@cityu.edu.hk or xwliu@iupui.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
© The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Similar articles
-
PepSOM: an algorithm for peptide identification by tandem mass spectrometry based on SOM.Genome Inform. 2006;17(2):194-205. Genome Inform. 2006. PMID: 17503392
-
Speeding up tandem mass spectrometry database search: metric embeddings and fast near neighbor search.Bioinformatics. 2007 Mar 1;23(5):612-8. doi: 10.1093/bioinformatics/btl645. Epub 2007 Jan 19. Bioinformatics. 2007. PMID: 17237061
-
Gapped spectral dictionaries and their applications for database searches of tandem mass spectra.Mol Cell Proteomics. 2011 Jun;10(6):M110.002220. doi: 10.1074/mcp.M110.002220. Epub 2011 Mar 28. Mol Cell Proteomics. 2011. PMID: 21444829 Free PMC article.
-
Improving protein identification from tandem mass spectrometry data by one-step methods and integrating data from other platforms.Brief Bioinform. 2016 Mar;17(2):262-9. doi: 10.1093/bib/bbv043. Epub 2015 Jul 3. Brief Bioinform. 2016. PMID: 26141827 Free PMC article. Review.
-
Protein identification by tandem mass spectrometry and sequence database searching.Methods Mol Biol. 2007;367:87-119. doi: 10.1385/1-59745-275-0:87. Methods Mol Biol. 2007. PMID: 17185772 Review.
Cited by
-
Systematic Evaluation of Protein Sequence Filtering Algorithms for Proteoform Identification Using Top-Down Mass Spectrometry.Proteomics. 2018 Feb;18(3-4):10.1002/pmic.201700306. doi: 10.1002/pmic.201700306. Epub 2018 Feb 6. Proteomics. 2018. PMID: 29327814 Free PMC article.
-
A Spectrum Graph-Based Protein Sequence Filtering Algorithm for Proteoform Identification by Top-Down Mass Spectrometry.Proceedings (IEEE Int Conf Bioinformatics Biomed). 2017 Nov;2017:222-229. doi: 10.1109/BIBM.2017.8217653. Epub 2017 Dec 18. Proceedings (IEEE Int Conf Bioinformatics Biomed). 2017. PMID: 29503761 Free PMC article.
-
sRNA Profiler: A User-Focused Interface for Small RNA Mapping and Profiling.Cells. 2021 Jul 13;10(7):1771. doi: 10.3390/cells10071771. Cells. 2021. PMID: 34359940 Free PMC article.
-
A graph-based filtering method for top-down mass spectral identification.BMC Genomics. 2018 Sep 24;19(Suppl 7):666. doi: 10.1186/s12864-018-5026-x. BMC Genomics. 2018. PMID: 30255788 Free PMC article.
Publication types
MeSH terms
Substances
LinkOut - more resources
Full Text Sources
Other Literature Sources