[Preprint]. 2024 Oct 15:2024.10.12.618041.
doi: 10.1101/2024.10.12.618041.

MSFragger-DDA+ Enhances Peptide Identification Sensitivity with Full Isolation Window Search


Fengchao Yu et al. bioRxiv.


Abstract

Liquid chromatography-mass spectrometry (LC-MS)-based proteomics, particularly the bottom-up approach, relies on digesting proteins into peptides for subsequent separation and analysis. Database search is the most prevalent method for identifying peptides from data-dependent acquisition (DDA) mass spectrometry data. Traditional tools typically identify a single peptide per tandem mass spectrum (MS2), often neglecting the frequent co-fragmentation of peptides that produces chimeric spectra. Here, we introduce MSFragger-DDA+, a novel database search algorithm that enhances peptide identification by detecting co-fragmented peptides with high sensitivity and speed. Using MSFragger's fragment ion indexing algorithm, MSFragger-DDA+ performs a comprehensive search within the full isolation window of each MS2, followed by robust feature detection, filtering, and rescoring procedures to refine the search results. Evaluation against established tools across diverse datasets demonstrated that, integrated within the FragPipe computational platform, MSFragger-DDA+ significantly increases identification sensitivity while maintaining stringent false discovery rate (FDR) control. It is also uniquely suited for wide-window acquisition (WWA) data. MSFragger-DDA+ provides an efficient and accurate solution for peptide identification, enhancing the detection of low-abundance co-fragmented peptides. Coupled with the FragPipe platform, MSFragger-DDA+ enables more comprehensive and accurate analysis of proteomics data.
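The core idea of searching the full isolation window, rather than only the selected precursor mass, can be illustrated with a minimal Python sketch. It only shows which database peptides become precursor candidates for one MS2; it does not reproduce MSFragger's fragment ion index, and the function names `window_mass_range` and `peptides_in_window`, the proton mass constant, and the default charge states (2+ and 3+) are illustrative assumptions.

```python
from bisect import bisect_left, bisect_right

PROTON_MASS = 1.007276  # Da


def window_mass_range(center_mz, width, charge):
    """Neutral precursor-mass range covered by an isolation window at one charge state."""
    low_mz = center_mz - width / 2
    high_mz = center_mz + width / 2
    # m/z = (M + z * proton) / z, so M = z * (m/z - proton)
    return charge * (low_mz - PROTON_MASS), charge * (high_mz - PROTON_MASS)


def peptides_in_window(sorted_masses, center_mz, width, charges=(2, 3)):
    """Indices of database peptides (sorted by neutral mass) that could have been co-isolated.

    Hypothetical helper: charges (2, 3) is an assumed default, not a documented setting.
    """
    hits = set()
    for z in charges:
        lo, hi = window_mass_range(center_mz, width, z)
        # binary search returns the contiguous slice of peptides inside [lo, hi]
        hits.update(range(bisect_left(sorted_masses, lo), bisect_right(sorted_masses, hi)))
    return sorted(hits)


# toy neutral masses (Da), sorted ascending, for illustration only
masses = [900.0, 996.0, 998.0, 1000.5, 1495.0, 1600.0]
print(peptides_in_window(masses, 500.0, 2.0))   # full 2 m/z-wide window
print(peptides_in_window(masses, 500.0, 0.02))  # narrow window around the selected precursor
```

With the 2 m/z-wide window, three toy peptides qualify as co-isolation candidates, whereas the narrow window keeps only the one matching the selected precursor.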


Conflict of interest statement

A.I.N. and F.Y. receive royalties from the University of Michigan for the sale of MSFragger and IonQuant software licenses to commercial entities. All license transactions are managed by the University of Michigan Innovation Partnerships office, and all proceeds are subject to university technology transfer policy. The other authors declare no competing interests.

Figures

Figure 1. Overview of MSFragger-DDA+ and FragPipe.
(a) MSFragger-DDA+ algorithm. Each tandem mass spectrum is searched against all peptides within the isolation window. MSFragger-DDA+ then detects the precursor signals for each matched PSM. After filtering out PSMs with low-quality precursor signals, MSFragger-DDA+ rescores the PSMs using a greedy algorithm. (b) MSFragger-DDA+ workflow in the FragPipe software. The workflow contains MSFragger-DDA+ for database searching, MSBooster for deep-learning-based rescoring, Percolator for PSM ranking, ProteinProphet for protein inference, FDR filtering, IonQuant for label-free quantification (optional), and EasyPQP for spectral library generation (optional).
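The caption states only that PSMs are rescored with a greedy algorithm. As one possible reading, and explicitly an assumption rather than the published procedure, the sketch below accepts the PSM that best explains the spectrum, removes the fragment peaks it claims, and rescores the remaining PSMs on unclaimed peaks only. The function name `greedy_rescore`, the intensity-sum score, and the tie-breaking rule are hypothetical.

```python
def greedy_rescore(psms, intensities):
    """Greedily assign shared fragment peaks to co-fragmented PSMs (hypothetical scheme).

    psms: dict mapping a PSM label to the set of matched peak indices.
    intensities: list of peak intensities indexed by peak.
    Returns (label, rescored value) pairs in the order they were accepted.
    """
    remaining = {label: set(peaks) for label, peaks in psms.items()}
    claimed = set()
    accepted = []

    def unclaimed_score(label):
        # only peaks not explained by an already-accepted PSM count toward the score
        return sum(intensities[i] for i in remaining[label] - claimed)

    while remaining:
        # iterate in sorted order so ties go to the alphabetically first label
        best = max(sorted(remaining), key=unclaimed_score)
        accepted.append((best, unclaimed_score(best)))
        claimed |= remaining.pop(best)
    return accepted


# toy spectrum with four peaks, for illustration only
peaks = [10.0, 20.0, 30.0, 5.0]
print(greedy_rescore({"A": {0, 1, 2}, "B": {2, 3}, "C": {1, 3}}, peaks))
```

In the toy example, B and C lose most of their initial score because they share peaks with A, which shows how a greedy pass can down-weight matches that are largely explained by a stronger co-fragmented PSM.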
Figure 2. Sensitivity, speed, and false discovery proportion assessment using traditional DDA datasets with different NCE and enzymatic digestions.
(a) and (b) Number of peptide sequences identified by MaxQuant, MetaMorpheus, MSFragger, MSFragger-DDA+, and Scribe. (a) Data with different NCE. (b) Data with different enzymatic digestions. (c) Runtime of MaxQuant, MetaMorpheus, MSFragger, and MSFragger-DDA+. (d) Protein-level false discovery proportion (FDP) of MaxQuant, MetaMorpheus, MSFragger, and MSFragger-DDA+, estimated with two calculations: an upper bound and a lower bound.
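Panel (d) reports lower- and upper-bound protein-level FDP, but the caption does not give the formulas. The sketch below assumes a common entrapment formulation, in which discoveries from an appended entrapment database (r times the size of the target database) stand in for false discoveries; the function name `entrapment_fdp` and the parameter `r` are assumptions, not confirmed details of Figure 2d.

```python
def entrapment_fdp(n_target, n_entrapment, r=1.0):
    """Lower- and upper-bound FDP estimates from an entrapment experiment (assumed formulation).

    n_target: discoveries from the original (target) database
    n_entrapment: discoveries from the entrapment database
    r: entrapment-to-target database size ratio
    """
    total = n_target + n_entrapment
    if total == 0:
        return 0.0, 0.0
    # lower bound: only the observed entrapment hits are counted as false
    lower = n_entrapment / total
    # upper bound: also estimates false target hits as n_entrapment / r
    upper = n_entrapment * (1 + 1 / r) / total
    return lower, upper


print(entrapment_fdp(950, 50))  # toy counts, equal-size entrapment database
```

The two estimates bracket the true FDP under this model; a tool with adequate FDR control should keep the lower bound below the nominal FDR threshold.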
Figure 3. Performance benchmarking using timsTOF ddaPASEF data.
(a) Number of quantified proteins from the DDA and DDA+ workflows in the A549 cell line dataset (three biological replicates, each with four technical replicates). “MBR+” and “MBR-” refer to IonQuant runs with and without match-between-runs (MBR), respectively. (b) Numbers and coefficients of variation (CVs) of overlapping and non-overlapping proteins quantified by the DDA and DDA+ workflows in the A549 cell line data. Blue box plots show proteins unique to the DDA mode, green box plots show common proteins in the DDA mode, red box plots show common proteins in the DDA+ mode, and yellow box plots show proteins unique to the DDA+ mode. Common proteins are those quantified in both the DDA and DDA+ modes. The numbers on the right are the numbers of quantified proteins. Each box spans the interquartile range (IQR), with the bottom and top edges at the first (Q1) and third (Q3) quartiles, respectively. The median (Q2) is indicated by a horizontal line within the box. The whiskers extend to the minimum and maximum values within 1.5 times the IQR below Q1 or above Q3. (c) Protein non-missing and missing values from the DDA and DDA+ workflows (A549 cell line data, MBR disabled). The number of columns equals the number of proteins quantified in each setting, listed at the top of each plot. Proteins with non-zero intensities are shown in orange, and proteins with zero intensities in black. (d) Same as (c), but with MBR enabled.
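The box-plot convention described in panel (b), quartiles with whiskers reaching the most extreme values inside 1.5 × IQR of the box, can be reproduced with Python's standard `statistics` module. The `inclusive` quartile method and the function name `box_stats` are assumptions; plotting libraries may compute quartiles slightly differently.

```python
import statistics


def box_stats(values):
    """Quartiles, Tukey-style whisker ends, and outliers for one box plot."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1, "median": q2, "q3": q3,
        # whiskers stop at the most extreme observations inside the fences
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


# toy CV values (%) for illustration only
print(box_stats([1, 2, 3, 4, 5, 6, 7, 8, 100]))
```

Here the IQR is 4, so the fences sit at -3 and 13; the whisker stops at 8 and the value 100 is drawn as an outlier.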
Figure 4. Sensitivity assessment using three WWA datasets.
(a) and (b) Numbers of peptides from the first WWA dataset of Truong et al. There are 14 samples with different isolation windows, maximum injection times, and MS2 resolutions. Each sample contains two technical replicates. MBR is disabled. (c) and (d) Numbers and CVs of quantified proteins from the second dataset of Truong et al., from the K562 (c) and HeLa (d) cell lines. Dark colors indicate proteins with CV < 20%; light colors indicate CV ≥ 20%. There are four samples with different combinations of cell line and gradient length, each with eight technical replicates. MBR is disabled. (e) and (f) Numbers of identified peptides from the WWA dataset published by Matzinger et al. There are 17 samples with different isolation windows and sample amounts, each with three technical replicates.
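The CV split in panels (c) and (d) can be sketched as follows, assuming CV is the sample standard deviation of a protein's intensities across technical replicates divided by their mean, and that zero or absent values are excluded. The helper names `cv_percent` and `split_by_cv` and the missing-value handling are assumptions, not taken from the paper.

```python
import statistics


def cv_percent(replicates):
    """CV (%) across technical replicates; 0 or None is treated as missing."""
    observed = [v for v in replicates if v]
    if len(observed) < 2:
        return None  # a CV needs at least two observed values
    return 100 * statistics.stdev(observed) / statistics.mean(observed)


def split_by_cv(proteins, threshold=20.0):
    """Return (proteins with CV < threshold, proteins with CV >= threshold)."""
    low, high = set(), set()
    for name, replicates in proteins.items():
        cv = cv_percent(replicates)
        if cv is None:
            continue  # not quantified in enough replicates
        (low if cv < threshold else high).add(name)
    return low, high


# toy intensities for illustration only
toy = {"A": [100, 100, 100], "B": [80, 100, 120], "C": [50, 150, 100],
       "D": [10, None, 11], "E": [5, None, None]}
print(split_by_cv(toy))
```

Protein B lands exactly on the 20% boundary and is counted in the CV ≥ 20% group, matching the caption's inequality; protein E is skipped because only one replicate is observed.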
Figure 5. Performance demonstration using the glioma dataset.
(a) Heatmap of the gene intensities from the DDA+ workflow. (b) PCA plot of the quantitative results from the DDA+ workflow. (c) Volcano plot using the gene intensities from the DDA+ workflow. (d) Box plots showing the percentage of protein-level missing values from the DDA and DDA+ workflows. The box in each plot captures the IQR with the bottom and top edges representing the Q1 and Q3, respectively. The median (Q2) is indicated by a horizontal line within the box. The whiskers extend to the minima and maxima within 1.5 times the IQR below Q1 or above Q3. (e) Venn diagrams showing the number of upregulated and downregulated genes in the DDA and DDA+ workflows.
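Panels (d) and (e) summarize missing values and regulated genes. A minimal sketch, assuming zero or absent intensities count as missing and using hypothetical cutoffs (|log2 fold change| ≥ 1, adjusted p < 0.05) that the caption does not state, computes per-sample missing-value percentages and the region sizes behind a two-set Venn diagram. All function names are illustrative.

```python
def percent_missing(samples):
    """Percentage of missing protein values per sample (0 or None counts as missing)."""
    return {name: 100 * sum(1 for v in values if not v) / len(values)
            for name, values in samples.items()}


def regulated(results, fc_cut=1.0, p_cut=0.05):
    """Split genes into up- and downregulated sets from (log2 fold change, adjusted p) pairs.

    fc_cut and p_cut are hypothetical thresholds, not values from the paper.
    """
    up = {g for g, (lfc, p) in results.items() if lfc >= fc_cut and p < p_cut}
    down = {g for g, (lfc, p) in results.items() if lfc <= -fc_cut and p < p_cut}
    return up, down


def venn_counts(first, second):
    """Region sizes of a two-set Venn diagram."""
    return {"only_first": len(first - second),
            "shared": len(first & second),
            "only_second": len(second - first)}


# toy values for illustration only
dda = {"G1": (2.0, 0.01), "G2": (-1.5, 0.001), "G3": (0.5, 0.01), "G4": (1.2, 0.2)}
dda_plus = {"G1": (1.8, 0.02), "G2": (-0.4, 0.3), "G5": (3.0, 0.001), "G6": (-2.0, 0.04)}
up_dda, down_dda = regulated(dda)
up_plus, down_plus = regulated(dda_plus)
print(venn_counts(up_dda, up_plus))
print(percent_missing({"sample1": [1.5, 0, 2.0, None]}))
```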

References

    1. Houel S.; Abernathy R.; Renganathan K.; Meyer-Arendt K.; Ahn N. G.; Old W. M., Quantifying the impact of chimera MS/MS spectra on peptide identification in large-scale proteomics studies. J Proteome Res 2010, 9 (8), 4152–60.
    2. Michalski A.; Cox J.; Mann M., More than 100,000 detectable peptide species elute in single shotgun proteomics runs but the majority is inaccessible to data-dependent LC-MS/MS. J Proteome Res 2011, 10 (4), 1785–93.
    3. Truong T.; Webber K. G. I.; Madisyn Johnston S.; Boekweg H.; Lindgren C. M.; Liang Y.; Nydegger A.; Xie X.; Tsang T. M.; Jayatunge D.; Andersen J. L.; Payne S. H.; Kelly R. T., Data-Dependent Acquisition with Precursor Coisolation Improves Proteome Coverage and Measurement Throughput for Label-Free Single-Cell Proteomics. Angew Chem Int Ed Engl 2023, 62 (34), e202303415.
    4. Matzinger M.; Schmucker A.; Yelagandula R.; Stejskal K.; Krssakova G.; Berger F.; Mechtler K.; Mayer R. L., Micropillar arrays, wide window acquisition and AI-based data analysis improve comprehensiveness in multiple proteomic applications. Nat Commun 2024, 15 (1), 1019.
    5. Kong A. T.; Leprevost F. V.; Avtonomov D. M.; Mellacheruvu D.; Nesvizhskii A. I., MSFragger: ultrafast and comprehensive peptide identification in mass spectrometry-based proteomics. Nat Methods 2017, 14 (5), 513–520.
