False discovery rate: the Achilles' heel of proteogenomics
- PMID: 35534181
- DOI: 10.1093/bib/bbac163
Abstract
Proteogenomics refers to the integrated analysis of the genome and proteome. It leverages mass spectrometry (MS)-based proteomics data to improve genome annotations, understand gene expression control through proteoforms, and find sequence variants, with the aim of developing novel insights for disease classification and therapeutic strategies. However, proteogenomic studies often suffer from reduced sensitivity and specificity due to inflated database size. To control error rates, proteogenomics depends on the target-decoy search strategy, the de facto method for false discovery rate (FDR) estimation in proteomics. Proteogenomic databases constructed by three- or six-frame translation of nucleotide databases not only increase the search space and compute time but also violate the equivalence of target and decoy databases. These searches result in poorer separation between target and decoy scores, leading to stringent FDR thresholds. Understanding these factors and applying modified strategies, such as two-pass database search or peptide-class-specific FDR, can result in a better interpretation of MS data without introducing additional statistical biases. Based on these considerations, a user can interpret proteogenomics results appropriately and control false positives and false negatives in a more informed manner. In this review, we first briefly discuss proteogenomic workflows and limitations in database construction, followed by various considerations that can influence potential novel discoveries in a proteogenomic study. We conclude with suggestions to counter these challenges for better proteogenomic data interpretation.
Keywords: FDR; NGS; ORFs; RNA-Seq; false discovery rate; gene annotation; mass spectrometry; novel peptides; proteogenomics; shotgun proteomics; variants.
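As an illustration of the target-decoy strategy and the peptide-class-specific FDR mentioned in the abstract, the following Python sketch may help. It is not taken from the review: the `PSM` record, the two function names, the class labels, and the simple estimator FDR ≈ decoys / targets above a score cutoff are all assumptions made for illustration, and real search engines and post-processors use more refined estimators.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    """Hypothetical peptide-spectrum match record (not from the review)."""
    score: float        # assumed: higher score means a better match
    is_decoy: bool      # True if the match came from the decoy database
    peptide_class: str  # assumed labels, e.g. "known" or "novel"


def accepted_targets(psms, fdr_threshold):
    """Return target PSMs above the most permissive score cutoff at which
    the estimated FDR (decoys / targets, an assumed simple estimator)
    is still at or below fdr_threshold."""
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    targets = decoys = 0
    best_cut = 0
    for i, psm in enumerate(ranked, start=1):
        if psm.is_decoy:
            decoys += 1
        else:
            targets += 1
        # Remember the deepest position in the ranking that meets the threshold.
        if targets > 0 and decoys / targets <= fdr_threshold:
            best_cut = i
    return [p for p in ranked[:best_cut] if not p.is_decoy]


def class_specific_targets(psms, fdr_threshold):
    """Estimate FDR separately within each peptide class, so that a large
    class of known peptides cannot mask errors among novel peptides."""
    by_class = {}
    for psm in psms:
        by_class.setdefault(psm.peptide_class, []).append(psm)
    return {
        cls: accepted_targets(group, fdr_threshold)
        for cls, group in by_class.items()
    }
```

Calling `accepted_targets` on the pooled list gives a global estimate, while `class_specific_targets` applies the same cutoff rule within each class. When novel-peptide matches score lower on average than known-peptide matches, the class-specific version will typically accept fewer novel matches than the pooled one at the same nominal threshold.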
© The Author(s) 2022. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.