Bioinformatics. 2019 Aug 15;35(16):2774-2782.
doi: 10.1093/bioinformatics/bty1058.

MoMo: discovery of statistically significant post-translational modification motifs

Alice Cheng et al. Bioinformatics.

Abstract

Motivation: Post-translational modifications (PTMs) of proteins are associated with many significant biological functions and can be identified in high throughput using tandem mass spectrometry. Many PTMs are associated with short sequence patterns called 'motifs' that help localize the modifying enzyme. Accordingly, many algorithms have been designed to identify these motifs from mass spectrometry data. Accurate statistical confidence estimates for discovered motifs are critically important for proper interpretation and in the design of downstream experimental validation.

Results: We describe a method for assigning statistical confidence estimates to PTM motifs, and we demonstrate that this method provides accurate P-values on both simulated and real data. Our methods are implemented in MoMo, a software tool for discovering motifs among sets of PTMs that we make available as a web server and as downloadable source code. MoMo re-implements the two most widely used PTM motif discovery algorithms, motif-x and MoDL, while offering many enhancements. Relative to motif-x, MoMo offers improved statistical confidence estimates and more accurate calculation of motif scores. The MoMo web server offers more proteome databases, more input formats, larger inputs and longer running times than the motif-x web server. Finally, our study demonstrates that the confidence estimates produced by motif-x are inaccurate. This inaccuracy stems in part from the common practice of drawing 'background' peptides from an unshuffled proteome database. Our results thus suggest that many of the papers that use motif-x to find motifs may be reporting results that lack statistical support.
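The contrast between an unshuffled proteome background and a shuffled one can be illustrated with a minimal sketch. It assumes that "shuffled" means permuting the flanking residues of each foreground peptide while keeping the central (modified) residue fixed; the function names shuffle_peptide and shuffled_background are hypothetical and are not taken from the MoMo source code.

```python
import random


def shuffle_peptide(peptide: str, rng: random.Random) -> str:
    """Permute the flanking residues; keep the central residue in place.

    Assumption: peptides have odd length with the modified residue
    at the middle position.
    """
    center = len(peptide) // 2
    flanks = list(peptide[:center] + peptide[center + 1:])
    rng.shuffle(flanks)
    return "".join(flanks[:center]) + peptide[center] + "".join(flanks[center:])


def shuffled_background(foreground: list[str], seed: int = 0) -> list[str]:
    """Build a background set with the same residue composition as the foreground."""
    rng = random.Random(seed)
    return [shuffle_peptide(p, rng) for p in foreground]
```

Because each background peptide is a permutation of a foreground peptide, the two sets have identical residue frequencies by construction, which is the property that a background drawn from an unshuffled proteome does not guarantee.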

Availability and implementation: The MoMo web server and source code are provided at http://meme-suite.org.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1.
Accuracy of the P-values reported by motif-x* in random data. The three panels show empirical assessments (Q-Q plots) of the statistical accuracy of the P-values reported by motif-x* for the motifs it discovers in 10 000 random datasets containing peptides centered on ‘S’, ‘T’ and ‘Y’ residues, respectively, when the background peptides are shuffled versions of the foreground peptides. Each panel shows results for motifs containing a given central residue. The main plot shows results for the first motif reported by MoMo, and the inset plot shows results for all motifs. Each point represents one motif reported by motif-x*, with y its P-value as reported by motif-x* and x its rank P-value, x=1/(ri+1), where ri is the rank of its P-value among those of the first (main panel) or all (inset panel) reported motifs. The three parallel lines show the curves (from top to bottom) for y=2x, y = x and y=x/2, respectively. There are 3124, 518 and 233 peptides with central ‘S’, ‘T’ and ‘Y’ residues, respectively, in each of the 10 000 datasets
Fig. 2.
Accuracy of the P-values reported by motif-x* in real data. The three panels show empirical assessments of the statistical accuracy of the P-values reported by motif-x* for the motifs it discovers in the Pease et al. (2018) Supplementary Data S2 dataset when the background peptides are shuffled versions of the foreground peptides. Each point represents one motif reported by motif-x*, with y giving its P-value as reported by motif-x* and x giving the P-value estimated empirically using 10 000 randomly shuffled versions of the same foreground peptides. There are 3124, 518 and 233 peptides with central ‘S’, ‘T’ and ‘Y’ residues, respectively, in the original input dataset and in each of the 10 000 shuffled versions of it
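An empirical P-value of the kind described in this caption can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: it assumes each shuffled run contributes one null statistic (for example, the P-value reported for the corresponding motif in that run), that smaller values are more extreme, and it applies the common add-one correction so that the estimate is never zero. The function name empirical_p_value is hypothetical.

```python
def empirical_p_value(observed: float, null_values: list[float]) -> float:
    """Estimate a P-value from statistics obtained on shuffled datasets.

    Assumptions (not from the source): smaller values are more extreme,
    and an add-one correction is applied to numerator and denominator.
    """
    n_extreme = sum(1 for v in null_values if v <= observed)
    return (n_extreme + 1) / (len(null_values) + 1)
```

With 10 000 shuffled versions, the smallest value this estimator can return is 1/10 001, so P-values reported below that threshold cannot be confirmed at finer resolution.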
Fig. 3.
Bias of the P-values reported by motif-x* when the foreground and background peptides have different residue frequencies. The left panel shows the empirical assessment (Q-Q plot) of the statistical accuracy of the P-values reported by motif-x* for 10 000 random datasets containing peptides centered on ‘S’ when the background peptides are extracted from a real proteome. Each point represents one motif reported by motif-x*, with y its P-value as reported by motif-x* and x its rank P-value, x=1/(ri+1), where ri is the rank of its P-value among all reported motifs. The right panel shows the residue distributions of the peptides in the foreground and background sets, excluding the central ‘S’ present in each peptide from the calculation. There are 3124 peptides with central ‘S’ in each of the 10 000 datasets
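The residue distributions compared in the right panel can be computed along the lines of the sketch below. It assumes odd-length peptides with the modified residue at the middle position; the function name residue_frequencies is hypothetical.

```python
from collections import Counter


def residue_frequencies(peptides: list[str]) -> dict[str, float]:
    """Relative residue frequencies, excluding each peptide's central residue."""
    counts: Counter[str] = Counter()
    for p in peptides:
        center = len(p) // 2
        counts.update(p[:center] + p[center + 1:])
    total = sum(counts.values())
    return {aa: n / total for aa, n in sorted(counts.items())}
```

Running this on the foreground and background sets separately and comparing the two dictionaries shows whether the sets differ in composition, which is the condition under which the caption reports biased P-values.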
Fig. 4.
High motif-x scores are not indicative of high statistical significance. Panels A and B show the number of empirically significant motifs reported by motif-x* and a scatter plot of motif significance versus the reported motif-x score, respectively, when motif-x* uses shuffled foreground peptides as the background peptides. Panels C and D give the same information when motif-x* extracts the background peptides from the P. falciparum proteome. In both cases, the foreground peptides (input dataset) are from the Pease et al. (2018) Supplementary Data S2 file. Empirical P-values are estimated from 10 000 runs of motif-x* on shuffled versions of the input dataset
Fig. 5.
Only one of the three motif-x motifs reported in Pease et al. (2018) is statistically significant. The panels show the motif-x score and empirical P-values of the motifs found by motif-x* using the peptides in Supplementary Data S3 and the Ensembl version 38 P. falciparum proteome and peptides from the proteome (panel A) or shuffled foreground peptides (panel B) as the background peptides. In both panels, the minimum number of occurrences parameter is 10 and the minimum motif-x score parameter is 0.00001
