Detection and validation of non-synonymous coding SNPs from orthogonal analysis of shotgun proteomics data
- PMID: 17488105
- DOI: 10.1021/pr0700908
Abstract
Orthogonal analysis of amino acid substitutions resulting from SNPs in existing proteomic datasets provides a critical foundation for the emerging field of population-based proteomics. Large-scale proteomics datasets, derived from shotgun tandem MS analysis of complex cellular protein mixtures, contain many unassigned spectra that may correspond to alternate alleles encoded by SNPs. The purpose of this work was to identify tandem MS spectra in LC-MS/MS shotgun proteomics datasets that may represent coding non-synonymous SNPs (nsSNPs). To this end, we generated a tryptic peptide database from allelic information in NCBI's dbSNP. We searched this database with tandem MS spectra of tryptic peptides from DU4475 breast tumor cells that had been fractionated by pI in the first dimension and by reverse-phase LC in the second dimension. In all, we identified 629 nsSNPs, of which 36 corresponded to alternate SNP alleles not found in the reference NCBI or IPI protein databases. Searches for SNP-peptides carry a high risk of false positives, both because modifications cause mass shifts and because the same peptide can be represented multiple times within the genome. In this work, false positives were filtered using a novel peptide pI prediction algorithm and characterized using a decoy database developed by random substitution of similarly sized reference peptides. Secondary validation by sequencing of the corresponding genomic DNA confirmed the presence of the predicted SNP in 8 of 10 SNP-peptides. This work highlights that the usefulness of interpreting unassigned spectra as polymorphisms is highly reliant on the ability to detect and filter false positives.
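The abstract does not give implementation details for the SNP-peptide database. As a rough illustration only, the Python sketch below applies each single amino acid substitution to a reference protein, re-digests the variant in silico with the simple trypsin rule (cleave after K/R unless followed by P, no missed cleavages), and keeps only peptides that span the substituted residue and are absent from the reference digest. That last check is one way to guard against the same peptide being represented more than once in the genome. The record format, the function names `tryptic_digest` and `build_snp_peptides`, and the 6 to 30 residue length window are hypothetical choices, not the authors' pipeline or the dbSNP file format.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical SNP record: (protein_id, 1-based position, reference residue, alternate residue).
# This is NOT the dbSNP format; real records must first be mapped onto protein coordinates.
SnpRecord = Tuple[str, int, str, str]


def tryptic_digest(seq: str, min_len: int = 6, max_len: int = 30) -> List[Tuple[int, str]]:
    """Return fully tryptic peptides (no missed cleavages) as (0-based start, sequence).

    Applies the simple rule "cleave after K or R unless the next residue is P".
    """
    peptides: List[Tuple[int, str]] = []
    start = 0
    for i, aa in enumerate(seq):
        at_end = i == len(seq) - 1
        if at_end or (aa in "KR" and seq[i + 1] != "P"):
            pep = seq[start:i + 1]
            if min_len <= len(pep) <= max_len:
                peptides.append((start, pep))
            start = i + 1
    return peptides


def build_snp_peptides(proteins: Dict[str, str], snps: List[SnpRecord],
                       min_len: int = 6, max_len: int = 30) -> Dict[str, List[str]]:
    """Map each variant-only tryptic peptide to the substitutions that produce it."""
    # Every peptide already in the reference proteome; a "variant" peptide matching
    # one of these cannot be attributed to the SNP.
    reference: Set[str] = {
        pep for seq in proteins.values() for _, pep in tryptic_digest(seq, min_len, max_len)
    }
    variant_db: Dict[str, List[str]] = {}
    for prot_id, pos, ref_aa, alt_aa in snps:
        seq = proteins.get(prot_id)
        idx = pos - 1
        if seq is None or not 0 <= idx < len(seq) or seq[idx] != ref_aa:
            continue  # record does not agree with the reference sequence
        variant = seq[:idx] + alt_aa + seq[idx + 1:]
        # Re-digest the whole variant so gained or lost K/R cleavage sites are handled.
        for start, pep in tryptic_digest(variant, min_len, max_len):
            if start <= idx < start + len(pep) and pep not in reference:
                variant_db.setdefault(pep, []).append(f"{prot_id}:{ref_aa}{pos}{alt_aa}")
    return variant_db
```

For example, substituting V for A at position 10 of the toy sequence `PEPTIDEKSAMPLERWTVAK` yields the variant-only peptide `SVMPLER`; if another protein in the set already digests to `SVMPLER`, the entry is dropped.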
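The pI filter relies on a peptide pI predictor that the abstract describes as novel but does not specify. The sketch below swaps in a generic Henderson-Hasselbalch charge model with an illustrative pKa table (an assumption, not the authors' algorithm). It finds the pH at which net charge crosses zero by bisection and accepts an identification only if the predicted pI falls within a window around the pI of the isoelectric focusing fraction the spectrum came from. The `window` of 0.5 pH units and all function names are hypothetical.

```python
from typing import Dict

# Illustrative pKa values (an assumption for this sketch; the paper's own pI
# predictor is not reproduced here).
PKA_BASIC: Dict[str, float] = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_ACIDIC: Dict[str, float] = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(peptide: str, ph: float) -> float:
    """Henderson-Hasselbalch net charge of an unmodified linear peptide at `ph`."""
    counts = {aa: peptide.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = 1
    counts["Cterm"] = 1
    # Fraction protonated for basic groups, fraction deprotonated for acidic groups.
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_BASIC.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_ACIDIC.items())
    return positive - negative


def isoelectric_point(peptide: str, lo: float = 0.0, hi: float = 14.0,
                      tol: float = 1e-4) -> float:
    """Bisection on pH; net charge decreases monotonically as pH rises."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(peptide, mid) > 0:
            lo = mid  # still positively charged: pI lies above mid
        else:
            hi = mid
    return (lo + hi) / 2


def passes_pi_filter(peptide: str, fraction_pi: float, window: float = 0.5) -> bool:
    """Keep an identification only if its predicted pI is near the IEF fraction's pI."""
    return abs(isoelectric_point(peptide) - fraction_pi) <= window
```

Because net charge falls monotonically with pH, bisection between pH 0 and 14 always converges. In practice the tolerance would presumably be calibrated per fraction from confidently identified reference peptides rather than fixed in advance.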
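The decoy database is described only as built by "random substitution of similarly sized reference peptides". One possible reading, sketched below in Python under that assumption, builds one decoy per SNP-peptide by taking a reference peptide of the same length (a stand-in for "similarly sized") and replacing one randomly chosen residue. The last residue is never mutated and K or R is never introduced, so the decoy keeps a tryptic-looking form. The number of decoy matches relative to SNP-peptide matches then gives a rough false discovery estimate. The function names, the same-length rule, and the retry limit are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_decoys(snp_peptides: List[str], reference_peptides: List[str],
                 seed: int = 0, max_tries: int = 100) -> List[str]:
    """One decoy per SNP-peptide: a same-length reference peptide with one random substitution."""
    rng = random.Random(seed)
    by_len: Dict[int, List[str]] = defaultdict(list)
    for pep in reference_peptides:
        by_len[len(pep)].append(pep)
    # Decoys must not collide with real reference or SNP-peptide sequences.
    forbidden = set(reference_peptides) | set(snp_peptides)
    decoys: List[str] = []
    for target in snp_peptides:
        pool = by_len.get(len(target))
        if not pool or len(target) < 2:
            continue  # no similarly sized reference peptide to mutate
        for _ in range(max_tries):
            base = rng.choice(pool)
            i = rng.randrange(len(base) - 1)  # never mutate the last residue (usually K/R)
            new_aa = rng.choice([aa for aa in AMINO_ACIDS if aa not in ("K", "R", base[i])])
            decoy = base[:i] + new_aa + base[i + 1:]
            if decoy not in forbidden:
                decoys.append(decoy)
                forbidden.add(decoy)
                break
    return decoys


def estimated_fdr(target_hits: int, decoy_hits: int) -> float:
    """Decoy-to-target ratio as a simple false discovery rate estimate."""
    return decoy_hits / target_hits if target_hits else 0.0
```

Searching target and decoy peptides together against the same spectra and then calling `estimated_fdr(target_hits, decoy_hits)` gives a decoy-to-target ratio. This is the simplest form of target-decoy estimation and not necessarily the statistic used in the paper.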
Similar articles
- LS-SNP: large-scale annotation of coding non-synonymous SNPs based on multiple information sources. Bioinformatics. 2005 Jun 15;21(12):2814-20. doi: 10.1093/bioinformatics/bti442. Epub 2005 Apr 12. PMID: 15827081
- Added value for tandem mass spectrometry shotgun proteomics data validation through isoelectric focusing of peptides. J Proteome Res. 2005 Nov-Dec;4(6):2273-82. doi: 10.1021/pr050193v. PMID: 16335976
- Overview of the HUPO Plasma Proteome Project: results from the pilot phase with 35 collaborating laboratories and multiple analytical groups, generating a core dataset of 3020 proteins and a publicly-available database. Proteomics. 2005 Aug;5(13):3226-45. doi: 10.1002/pmic.200500358. PMID: 16104056
- Elective affinities--bioinformatic analysis of proteomic mass spectrometry data. Arch Physiol Biochem. 2009 Dec;115(5):311-9. doi: 10.3109/13813450903390039. PMID: 19911947. Review.
- Protein and peptide identification algorithms using MS for use in high-throughput, automated pipelines. Proteomics. 2005 Nov;5(16):4082-95. doi: 10.1002/pmic.200402091. PMID: 16196103. Review.
Cited by
- "Proteotyping": population proteomics of human leukocytes using top down mass spectrometry. Anal Chem. 2008 Apr 15;80(8):2857-66. doi: 10.1021/ac800141g. Epub 2008 Mar 20. PMID: 18351787. Free PMC article.
- Retention Time and Fragmentation Predictors Increase Confidence in Identification of Common Variant Peptides. J Proteome Res. 2023 Oct 6;22(10):3190-3199. doi: 10.1021/acs.jproteome.3c00243. Epub 2023 Sep 1. PMID: 37656829. Free PMC article.
- Mutant proteins as cancer-specific biomarkers. Proc Natl Acad Sci U S A. 2011 Feb 8;108(6):2444-9. doi: 10.1073/pnas.1019203108. Epub 2011 Jan 19. PMID: 21248225. Free PMC article.
- Demonstration of Protein-Based Human Identification Using the Hair Shaft Proteome. PLoS One. 2016 Sep 7;11(9):e0160653. doi: 10.1371/journal.pone.0160653. eCollection 2016. PMID: 27603779. Free PMC article.
- Identification of a novel proteoform of prostate specific antigen (SNP-L132I) in clinical samples by multiple reaction monitoring. Mol Cell Proteomics. 2013 Oct;12(10):2761-73. doi: 10.1074/mcp.M113.028365. Epub 2013 Jul 10. PMID: 23842001. Free PMC article.