A transformer model for de novo sequencing of data-independent acquisition mass spectrometry data
- PMID: 40596427
- DOI: 10.1038/s41592-025-02718-y
Abstract
A core computational challenge in the analysis of mass spectrometry data is the de novo sequencing problem, in which the generating amino acid sequence is inferred directly from an observed fragmentation spectrum without the use of a sequence database. Recently, deep learning models have made substantial advances in de novo sequencing by learning from massive datasets of high-confidence labeled mass spectra. However, these methods are designed primarily for data-dependent acquisition experiments. Over the past decade, the field of mass spectrometry has been moving toward using data-independent acquisition (DIA) protocols for the analysis of complex proteomic samples owing to their superior specificity and reproducibility. Hence, we present a de novo sequencing model called Cascadia, which uses a transformer architecture to handle the more complex data generated by DIA protocols. In comparisons with existing approaches for de novo sequencing of DIA data, Cascadia achieves substantially improved performance across a range of instruments and experimental protocols.
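To make the de novo sequencing problem concrete, the sketch below computes the theoretical singly charged b- and y-ion m/z values for a candidate peptide, which is the forward mapping that a de novo method has to invert when it reads a sequence off a fragmentation spectrum. This is an illustration of the general problem only, not the Cascadia model. The function name `fragment_mz` and the reduced residue table are assumptions made for this example; the masses are standard monoisotopic values.

```python
# Illustrative sketch only: this is NOT Cascadia, just the peptide-to-fragment
# forward mapping that de novo sequencing has to invert.

# Monoisotopic residue masses in daltons for a small subset of amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "K": 128.09496,
    "E": 129.04259, "R": 156.10111,
}
WATER = 18.010565   # mass of H2O
PROTON = 1.007276   # mass of a proton


def fragment_mz(peptide):
    """Return (b_ions, y_ions): singly charged m/z values for each cleavage site.

    `fragment_mz` is a hypothetical helper name used for this example.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(masses)):
        # b ion: N-terminal prefix of length i, plus one proton.
        b_ions.append(sum(masses[:i]) + PROTON)
        # y ion: C-terminal suffix starting at i, plus water and one proton.
        y_ions.append(sum(masses[i:]) + WATER + PROTON)
    return b_ions, y_ions


if __name__ == "__main__":
    b, y = fragment_mz("PEAK")
    print("b ions:", [round(m, 4) for m in b])
    print("y ions:", [round(m, 4) for m in y])
```

In a data-dependent acquisition spectrum the observed peaks mostly come from one precursor, so they can be matched against a ladder like this one. In DIA, fragments from several co-isolated peptides appear together, which is the added complexity the abstract refers to.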
© 2025. The Author(s), under exclusive licence to Springer Nature America, Inc.
Conflict of interest statement
Competing interests: The MacCoss Lab at the University of Washington receives funding from Agilent, Bruker, Sciex, Shimadzu, Thermo Fisher Scientific and Waters to support the development of Skyline, a quantitative analysis software tool. M.J.M. is a paid consultant for Thermo Fisher Scientific. The other authors declare no competing interests.