Bioinformatics. 2019 Aug 15;35(16):2774-2782.

doi: 10.1093/bioinformatics/bty1058.

MoMo: discovery of statistically significant post-translational modification motifs

Alice Cheng¹, Charles E Grant¹, William S Noble¹,², Timothy L Bailey³

Affiliations

¹ Department of Genome Sciences, University of Washington, Seattle, WA, USA.
² Department of Computer Science and Engineering, University of Washington, Seattle, WA, USA.
³ Department of Pharmacology, University of Nevada, Reno, NV, USA.

PMID: 30596994
PMCID: PMC6691336
DOI: 10.1093/bioinformatics/bty1058

Alice Cheng et al. Bioinformatics. 2019.

Abstract

Motivation: Post-translational modifications (PTMs) of proteins are associated with many significant biological functions and can be identified in high throughput using tandem mass spectrometry. Many PTMs are associated with short sequence patterns called 'motifs' that help localize the modifying enzyme. Accordingly, many algorithms have been designed to identify these motifs from mass spectrometry data. Accurate statistical confidence estimates for discovered motifs are critically important for proper interpretation and in the design of downstream experimental validation.

Results: We describe a method for assigning statistical confidence estimates to PTM motifs, and we demonstrate that this method provides accurate P-values on both simulated and real data. Our methods are implemented in MoMo, a software tool for discovering motifs among sets of PTMs that we make available as a web server and as downloadable source code. MoMo re-implements the two most widely used PTM motif discovery algorithms, motif-x and MoDL, while offering many enhancements. Relative to motif-x, MoMo offers improved statistical confidence estimates and more accurate calculation of motif scores. The MoMo web server offers more proteome databases, more input formats, larger inputs and longer running times than the motif-x web server. Finally, our study demonstrates that the confidence estimates produced by motif-x are inaccurate. This inaccuracy stems in part from the common practice of drawing 'background' peptides from an unshuffled proteome database. Our results thus suggest that many of the papers that use motif-x to find motifs may be reporting results that lack statistical support.

Availability and implementation: The MoMo web server and source code are provided at http://meme-suite.org.

Supplementary information: Supplementary data are available at Bioinformatics online.
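
The abstract attributes part of the inaccuracy of motif-x to drawing background peptides from an unshuffled proteome, and the figure captions describe backgrounds built from shuffled versions of the foreground peptides. The following is a minimal sketch of one way such a background could be generated. It assumes that the central (modified) residue is held fixed while the flanking residues of each peptide are permuted; that detail, the function names and the example peptides are illustrative assumptions, not taken from the paper or from the MoMo source code.

```python
import random


def shuffle_flanks(peptide: str, rng: random.Random) -> str:
    """Permute every residue except the central one (assumed scheme)."""
    if len(peptide) % 2 == 0:
        raise ValueError("peptide must have odd length to have a central residue")
    mid = len(peptide) // 2
    # Collect the flanking residues, leaving the central residue out.
    flanks = list(peptide[:mid] + peptide[mid + 1:])
    rng.shuffle(flanks)
    # Reassemble with the central residue back in its original position.
    return "".join(flanks[:mid]) + peptide[mid] + "".join(flanks[mid:])


def shuffled_background(foreground, seed=0):
    """Return one shuffled background peptide per foreground peptide."""
    rng = random.Random(seed)
    return [shuffle_flanks(p, rng) for p in foreground]


# Made-up 9-residue peptides centered on 'S', for illustration only.
foreground = ["AKRLSPEDT", "GGRRSLVVA", "MTEPSPKKQ"]
background = shuffled_background(foreground)
```

Under this scheme each background peptide has exactly the same residue composition as its foreground counterpart, so foreground and background residue frequencies match by construction.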

Figures

**Fig. 1.**
Accuracy of the P-values reported by motif-x* in random data. The three panels show empirical assessments (Q-Q plots) of the statistical accuracy of the P-values reported by motif-x* for the motifs it discovers in 10 000 random datasets containing peptides centered on ‘S’, ‘T’ and ‘Y’ residues, respectively, when the *background peptides are shuffled versions of the foreground peptides*. Each panel shows results for motifs containing a given central residue. The main plot shows results for the *first* motif reported by MoMo, and the inset plot shows results for *all* motifs. Each point represents one motif reported by motif-x*, with $y$ its P-value as reported by motif-x* and $x$ its rank P-value, $x = 1 / (r_{i} + 1)$, where $r_{i}$ is the rank of its P-value among those of the first (main panel) or all (inset panel) reported motifs. The three parallel lines show the curves (from top to bottom) for $y = 2x$, $y = x$ and $y = x / 2$, respectively. There are 3124, 518 and 233 peptides with central ‘S’, ‘T’ and ‘Y’ residues, respectively, in each of the 10 000 datasets.

**Fig. 2.**
Accuracy of the P-values reported by motif-x* in real data. The three panels show empirical assessments of the statistical accuracy of the P-values reported by motif-x* for the motifs it discovers in the Pease *et al.* (2018) Supplementary Data S2 dataset when the background peptides are shuffled versions of the foreground peptides. Each point represents one motif reported by motif-x*, with y giving its P-value as reported by motif-x* and x giving the P-value estimated empirically using 10 000 randomly shuffled versions of the same foreground peptides. There are 3124, 518 and 233 peptides with central ‘S’, ‘T’ and ‘Y’ residues, respectively, in the original input dataset and in each of the 10 000 shuffled versions of it

**Fig. 3.**
Bias of the P-values reported by motif-x* when the foreground and background peptides have different residue frequencies. The left panel shows the empirical assessment (Q-Q plot) of the statistical accuracy of the P-values reported by motif-x* for 10 000 random datasets containing peptides centered on ‘S’ when the background peptides are extracted from a real proteome. Each point represents one motif reported by motif-x*, with $y$ its P-value as reported by motif-x* and $x$ its rank P-value, $x = 1 / (r_{i} + 1)$, where $r_{i}$ is the rank of its P-value among all reported motifs. The right panel shows the residue distributions of the peptides in the foreground and background sets, excluding the central ‘S’ present in each peptide from the calculation. There are 3124 peptides with central ‘S’ in each of the 10 000 datasets.

**Fig. 4.**
High motif-x scores are not indicative of high statistical significance. Panels A and B show the number of *empirically* significant motifs reported by motif-x* and a scatter plot of motif significance versus the reported motif-x score, respectively, when motif-x* uses shuffled foreground peptides as the background peptides. Panels C and D give the same information when motif-x* extracts the background peptides from the *P. falciparum* proteome. In both cases, the foreground peptides (input dataset) are from the Pease *et al.* (2018) Supplementary Data S2 file. Empirical P-values are estimated from 10 000 runs of motif-x* on shuffled versions of the input dataset

**Fig. 5.**
Only one of the three motif-x motifs reported in Pease *et al.* (2018) is statistically significant. The panels show the motif-x score and empirical P-values of the motifs found by motif-x* using the peptides in Supplementary Data S3 and the Ensembl version 38 *P. falciparum* proteome and peptides from the proteome (panel A) or shuffled foreground peptides (panel B) as the background peptides. In both panels, the minimum number of occurrences parameter is 10 and the minimum motif-x score parameter is 0.00001
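
The captions for Figs. 2, 4 and 5 refer to empirical P-values estimated from 10 000 runs on shuffled versions of the input dataset. As a rough sketch of how such an estimate can be computed from a collection of null scores, the snippet below uses the common add-one estimator. The exact estimator used in the paper is not stated here, so the formula, the function name, the convention that larger scores are more extreme, and the example numbers are all assumptions made for illustration.

```python
def empirical_p_value(observed_score, null_scores):
    """Estimate a P-value as the fraction of null scores at least as large
    as the observed score, with an add-one correction (assumed estimator)."""
    exceed = sum(1 for s in null_scores if s >= observed_score)
    return (exceed + 1) / (len(null_scores) + 1)


# Made-up scores standing in for results on shuffled datasets.
null_scores = [1.0, 2.5, 3.0, 4.2, 0.7, 5.1, 2.2, 3.9, 1.8]
p_moderate = empirical_p_value(4.0, null_scores)   # two null scores are >= 4.0
p_extreme = empirical_p_value(10.0, null_scores)   # no null score is >= 10.0
```

With the add-one correction the estimate can never be exactly zero: using N shuffled datasets, the smallest attainable value is 1 / (N + 1), which bounds the resolution of any empirical P-value obtained this way.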

Cited by

HypDB: A functionally annotated web-based database of the proline hydroxylation proteome.
Gong Y, Behera G, Erber L, Luo A, Chen Y. PLoS Biol. 2022 Aug 26;20(8):e3001757. doi: 10.1371/journal.pbio.3001757. eCollection 2022 Aug. PMID: 36026437.
Dihydroartemisinin regulates immune cell heterogeneity by triggering a cascade reaction of CDK and MAPK phosphorylation.
Li Q, Yuan Q, Jiang N, Zhang Y, Su Z, Lv L, Sang X, Chen R, Feng Y, Chen Q. Signal Transduct Target Ther. 2022 Jul 11;7(1):222. doi: 10.1038/s41392-022-01028-5. PMID: 35811310.
Multiple Layers of Phospho-Regulation Coordinate Metabolism and the Cell Cycle in Budding Yeast.
Zhang L, Winkler S, Schlottmann FP, Kohlbacher O, Elias JE, Skotheim JM, Ewald JC. Front Cell Dev Biol. 2019 Dec 17;7:338. doi: 10.3389/fcell.2019.00338. eCollection 2019. PMID: 31921850.
Robust unsupervised deconvolution of linear motifs characterizes 68 protein modifications at proteome scale.
Smith TG, Uzozie AC, Chen S, Lange PF. Sci Rep. 2021 Nov 18;11(1):22490. doi: 10.1038/s41598-021-01971-3. PMID: 34795380.
Phosphorylation of multiple proteins involved in ciliogenesis by Tau Tubulin kinase 2.
Bernatik O, Pejskova P, Vyslouzil D, Hanakova K, Zdrahal Z, Cajanek L. Mol Biol Cell. 2020 May 1;31(10):1032-1046. doi: 10.1091/mbc.E19-06-0334. Epub 2020 Mar 4. PMID: 32129703.

Grants and funding

R01 GM103544/GM/NIGMS NIH HHS/United States
