[Preprint]. 2024 Oct 16:2024.05.25.595875.
doi: 10.1101/2024.05.25.595875.

diaTracer enables spectrum-centric analysis of diaPASEF proteomics data

Kai Li et al. bioRxiv.

Abstract

Data-independent acquisition (DIA) has become a widely used strategy for peptide and protein quantification in mass spectrometry-based proteomics studies. The integration of ion mobility separation into DIA analysis, such as the diaPASEF technology available on Bruker's timsTOF platform, further improves the quantification accuracy and protein depth achievable using DIA. We introduce diaTracer, a new spectrum-centric computational tool optimized for diaPASEF data. diaTracer performs three-dimensional (m/z, retention time, ion mobility) peak tracing and feature detection to generate precursor-resolved "pseudo-MS/MS" spectra, facilitating direct ("spectral-library free") peptide identification and quantification from diaPASEF data. diaTracer is available as a stand-alone tool and is fully integrated into the widely used FragPipe computational platform. We demonstrate the performance of diaTracer and FragPipe using diaPASEF data from triple-negative breast cancer (TNBC), cerebrospinal fluid (CSF), and plasma samples, data from phosphoproteomics and HLA immunopeptidomics experiments, and low-input data from a spatial proteomics study. We also show that diaTracer enables unrestricted identification of post-translational modifications from diaPASEF data using open/mass-offset searches.

Conflict of interest statement

Competing interests: A.I.N. and F.Y. receive royalties from the University of Michigan for the sale of MSFragger and IonQuant software licenses to commercial entities. All license transactions are managed by the University of Michigan Innovation Partnerships office, and all proceeds are subject to university technology transfer policy. The other authors declare no competing interests.

Figures

Figure 1. Overview of diaTracer and FragPipe computational platform.
diaTracer applies a 3D feature detection algorithm to detect signals from all possible precursors and fragments in MS1 and MS2 diaPASEF data. Pseudo-MS/MS spectra are generated through precursor-fragment clustering and can be processed as DDA data using MSFragger and FragPipe to build a spectral library directly from the data. A hybrid spectral library can also be generated if DDA data are available. This spectral library is then used to extract quantification using DIA-NN.
Figure 2. Deep proteome profiling.
a) Box plot showing the numbers of quantified proteins using different methods in the TNBC dataset. The lower and upper edges of the box represent the first (Q1) and the third quartiles (Q3). The interquartile range (IQR) is the box between Q1 and Q3. Whiskers extend from the box to the smallest and largest data points within 1.5 times the IQR from Q1 and Q3, respectively. Data points outside this range are considered outliers and are shown as individual dots. b) Histogram showing the number of quantified proteins in the TNBC dataset using Spectronaut 18.5 and FragPipe with diaTracer, direct DIA and hybrid DDA/DIA analysis, and using DIA-NN in library-free mode, after application of different non-missing value filters.
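
The box plot convention described in this caption (Q1, Q3, IQR, whiskers limited to 1.5 times the IQR, outliers drawn as dots) can be sketched as follows. This is a minimal illustration, not the authors' plotting code: the function name `box_stats`, the choice of the "inclusive" quantile method, and the example counts are all assumptions made here.

```python
from statistics import quantiles


def box_stats(values):
    """Return quartiles, whisker ends, and outliers for one box (hypothetical helper)."""
    data = sorted(values)
    # Quartile definition is an assumption; plotting libraries differ slightly.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme data points still inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


# Made-up values for illustration only (not data from the study).
example = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
```

In this toy input the value 100 lies above Q3 + 1.5 × IQR, so it is reported as an outlier and the upper whisker ends at 9.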
Figure 3. CSF data.
a) Box plot showing the numbers of quantified proteins using different methods, with colors representing cleavage types (green: trypsin cleavage; orange: allowing semi-tryptic peptides). In the box plot, the central line represents the median. The lower and upper edges of the box represent the first (Q1) and third quartiles (Q3). The interquartile range (IQR) is the box between Q1 and Q3. Whiskers extend from the box to the smallest and largest data points within 1.5 times the IQR from Q1 and Q3, respectively. Data points outside this range are considered outliers and are shown as individual dots. b) Box plot showing the numbers of quantified precursors using different methods. The median, edges, and whiskers of the box plots are the same as in a). c) Distribution of identified peptides for protein HPX, with the red segment indicating the signal peptide region at the N-terminus. Orange segments represent tryptic peptides, while green segments represent semi-tryptic peptides. d) Running time comparison. FragPipe* indicates the FragPipe run time (including DIA-NN for quantification) starting from diaTracer-extracted files. e) Modifications identified using the common mass-offset workflow in FragPipe using pseudo-MS/MS spectra generated by diaTracer.
Figure 4. Plasma data.
a) Histogram showing the number of quantified proteins using diaTracer-based FragPipe workflows (tryptic and semi-tryptic search) and using DIA-NN library-free mode, colored from deep to light blue corresponding to different non-missing value filters. b) Same as a) for quantified precursors. c) Volcano plot comparing protein abundance between stage IV non-small cell lung cancer (NSCLC) and non-cancer control samples, highlighting NSCLC-overexpressed proteins (Log2 fold change >= 0.6; adjusted p-value <= 0.05). d) Box plots of the abundance distributions of AKT2 (top) and H2AX (bottom) proteins in semi-tryptic (left) and tryptic (right) searches between NSCLC (red) and control (blue) samples. In the box plot, the central line represents the median. The lower and upper edges of the box represent the first (Q1) and third quartiles (Q3). The interquartile range (IQR) is the box between Q1 and Q3. Whiskers extend from the box to the smallest and largest data points within 1.5 times the IQR from Q1 and Q3, respectively. Data points outside this range are considered outliers and are shown as individual dots. Tr: tryptic search; Semi: semi-tryptic search.
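
The highlighting rule in panel c) combines two cut-offs: a log2 fold change of at least 0.6 (roughly a 1.5-fold increase) and an adjusted p-value of at most 0.05. A minimal sketch of that rule is given below. The function name `overexpressed`, the tuple layout, and the example rows are hypothetical, and the adjusted p-values are taken as given because the caption does not state the correction method.

```python
def overexpressed(proteins, min_log2fc=0.6, max_adj_p=0.05):
    """Return names of proteins passing both volcano-plot cut-offs (hypothetical helper).

    `proteins` is an iterable of (name, log2 fold change, adjusted p-value).
    """
    return [
        name
        for name, log2fc, adj_p in proteins
        if log2fc >= min_log2fc and adj_p <= max_adj_p
    ]


# Made-up rows for illustration only (not results from the study).
rows = [
    ("P1", 1.2, 0.01),    # passes both cut-offs
    ("P2", 0.6, 0.05),    # sits exactly on both thresholds, still passes
    ("P3", 0.9, 0.20),    # fold change is large enough, p-value is not
    ("P4", -1.5, 0.001),  # significant, but lower in the first group
]
hits = overexpressed(rows)
```

Passing `min_log2fc=1` reproduces the stricter fold-change cut-off used in Figure 7e.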
Figure 5. HLA immunopeptidomics data.
a) Number of quantified immunopeptides obtained using FragPipe with diaTracer and those reported in the original study based on Spectronaut 17. b) Histogram of predicted binders for all HLA alleles of the corresponding sample donor, colored by binder type (light: weak binder; dark: strong binder). c) Length distribution of quantified immunopeptides using FragPipe with diaTracer and that reported in the original study based on Spectronaut 17 directDIA, DDA experimental library, and panlibrary. d) Charge state distribution of quantified immunopeptides using FragPipe with diaTracer and that reported in the original study based on Spectronaut 17 directDIA. e) Pseudo-MS/MS spectrum generated by diaTracer and the predicted spectrum for one of the identified immunopeptides, VYQHLFTRI. The entropy score between the two spectra is 0.9863. Visualization using FragPipe-PDV viewer.
Figure 6. Phosphoproteomics data.
a) Histogram of quantified phosphorylated peptide sequences across different gradients using FragPipe with diaTracer and those reported in the original study based on Spectronaut 16. b) Quantified phosphorylated peptide intensities and correlations in four replicates of the 7 min gradient time experiment. c) and d) PSM and fragment XICs of phosphorylated peptide S(Pho)PSPPDGSPAATPEIR. e) and f) PSM and fragment XICs of phosphorylated peptide SPSPPDGS(Pho)PAATPEIR. The XICs were generated by Skyline.
Figure 7. Low-input, spatial proteomics data.
a) Number of quantified proteins after application of a non-missing value (in at least one group) filter ranging from 0% to 100%, with line colors representing different methods. Red: results from the original study based on the library built using high-input samples; Green: results based on FragPipe with diaTracer, also using high-input data to build the library (“FragPipe high-input Lib”); Blue: results using diaTracer and FragPipe with low-input data only (“FragPipe”). b) Venn diagram of quantified proteins between the three methods, with data filtered to keep proteins with at least 70% non-missing values in at least one group. c) Principal-component analysis (PCA) plot of 148 samples based on 1471 proteins (after missing value filtering and data imputation) quantified using the FragPipe workflow (using low-input data only). d) Log2-transformed protein-level abundance distribution of selected cell-type-specific proteins in different regions. In the box plot, the central line represents the median. The lower and upper edges of the box represent the first (Q1) and third quartiles (Q3). The interquartile range (IQR) is the box between Q1 and Q3. Whiskers extend from the box to the smallest and largest data points within 1.5 times the IQR from Q1 and Q3, respectively. Data points outside this range are considered outliers and are shown as individual dots. e) Volcano plot showing protein abundance differences between the mantle zone and the T cell zone, highlighting tissue-specific proteins (Log2 fold change >= 1; adjusted p-value <= 0.05).
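
The "non-missing value in at least one group" filter used in panels a) and b) can be read as: keep a protein if, in at least one sample group, the fraction of samples with a measured value reaches the threshold (70% in panel b). The sketch below illustrates that reading under stated assumptions: the function name `passes_filter`, the use of `None` for a missing value, and the example intensities are hypothetical and not taken from the study.

```python
def passes_filter(groups, min_fraction=0.7):
    """True if any group has at least `min_fraction` non-missing values.

    `groups` maps a group name to one protein's intensities in that group;
    None marks a missing value (an assumption of this sketch).
    """
    for values in groups.values():
        if not values:
            continue  # an empty group cannot satisfy the filter
        present = sum(v is not None for v in values)
        if present / len(values) >= min_fraction:
            return True
    return False


# Made-up intensities for illustration only.
kept = passes_filter({"zone_A": [1.0, None, 2.0, 3.0], "zone_B": [None, None, None, 1.0]})
dropped = passes_filter({"zone_A": [1.0, None, None], "zone_B": [None, 2.0, None]})
```

The first protein is kept because 3 of 4 values (75%) are present in `zone_A`; the second has only one third of its values present in either group and is dropped. Sweeping `min_fraction` from 0.0 to 1.0 corresponds to the 0% to 100% range on the x-axis of panel a).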
