Nat Methods. 2015 Mar;12(3):258-64, 7 p following 264.

doi: 10.1038/nmeth.3255. Epub 2015 Jan 19.

DIA-Umpire: comprehensive computational framework for data-independent acquisition proteomics

Chih-Chiang Tsou¹, Dmitry Avtonomov², Brett Larsen³, Monika Tucholska³, Hyungwon Choi⁴, Anne-Claude Gingras⁵, Alexey I Nesvizhskii¹

Affiliations

¹ [1] Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA. [2] Department of Pathology, University of Michigan, Ann Arbor, Michigan, USA.
² Department of Pathology, University of Michigan, Ann Arbor, Michigan, USA.
³ Lunenfeld-Tanenbaum Research Institute, Toronto, Ontario, Canada.
⁴ Saw Swee Hock School of Public Health, National University of Singapore and National University Health System, Singapore, Singapore.
⁵ [1] Lunenfeld-Tanenbaum Research Institute, Toronto, Ontario, Canada. [2] Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada.

PMID: 25599550
PMCID: PMC4399776
DOI: 10.1038/nmeth.3255

Chih-Chiang Tsou et al. Nat Methods. 2015 Mar.

Abstract

As a result of recent improvements in mass spectrometry (MS), there is increased interest in data-independent acquisition (DIA) strategies in which all peptides are systematically fragmented using wide mass-isolation windows ('multiplex fragmentation'). DIA-Umpire (http://diaumpire.sourceforge.net/), a comprehensive computational workflow and open-source software for DIA data, detects precursor and fragment chromatographic features and assembles them into pseudo-tandem MS spectra. These spectra can be identified with conventional database-searching and protein-inference tools, allowing sensitive, untargeted analysis of DIA data without the need for a spectral library. Quantification is done with both precursor- and fragment-ion intensities. Furthermore, DIA-Umpire enables targeted extraction of quantitative information based on peptides initially identified in only a subset of the samples, resulting in more consistent quantification across multiple samples. We demonstrated the performance of the method with control samples of varying complexity and publicly available glycoproteomics and affinity purification-MS data.

Figures

**Fig. 1. Untargeted and targeted data analysis strategies and DIA-Umpire hybrid framework**
**(a)** Conventional analysis of DDA data is based on matching MS/MS spectra against a proteome-wide sequence database or a spectral library (spectrum-centric search). Peptides (and then proteins) are quantified using MS1 signal intensity or spectral counts (label-free quantification). **(b)** Current methods for DIA analysis are based on targeted data extraction, in which peptide ions from a spectral library are queried against experimental data (peptide-centric search) to find the best-matching fragment ion signals and their intensities (MS2-based quantification). **(c)** The DIA-Umpire hybrid workflow performs signal extraction from DIA MS1 and MS2 spectra to construct precursor–fragment groups (see Fig. 2 and Online Methods for details). Each precursor–fragment group is then analyzed using spectrum-centric searching to identify the peptides, as in (a). Peptide-centric matching is then performed to query unidentified precursor–fragment groups against a spectral library, as in (b). The spectral library can be built from the initial untargeted (spectrum-centric) results using the same DIA data, or it can be combined with, or replaced by, an external spectral library built using DDA data. Quantification can be done from either MS1 precursor- or MS2 fragment-ion intensities.

**Fig. 2. DIA-Umpire signal processing algorithms**
The feature detection algorithm is applied to DIA MS1 and MS2 spectra to detect all possible MS1 peptide precursor ions and MS2 fragment signals. Each detected precursor feature is grouped with its co-eluting fragment ion features, based on the Pearson correlation of the LC elution peaks and the retention times of the peak apexes, to form precursor–fragment groups. These precursor–fragment groups are used to construct pseudo MS/MS spectra (separated into different quality tiers based on the quality of the detected precursor ion signal) for untargeted spectrum-centric database search and identification. The precursor–fragment groups are stored and are queried again during the second, peptide-centric targeted data extraction stage.
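
The grouping step described in this caption can be illustrated with a minimal sketch. It is not DIA-Umpire's implementation: the function names, the toy elution traces, and the thresholds `min_corr` and `max_apex_delta` are assumptions chosen for illustration only. A fragment is kept when its elution profile correlates with the precursor's profile and its apex falls close to the precursor's apex.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length intensity traces."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a flat trace carries no shape information
    return cov / math.sqrt(vx * vy)

def apex_rt(rts, intensities):
    """Retention time at which the trace reaches its maximum."""
    best = max(range(len(intensities)), key=intensities.__getitem__)
    return rts[best]

def group_fragments(rts, precursor, fragments,
                    min_corr=0.9, max_apex_delta=0.1):
    """Return fragment names that co-elute with the precursor.

    min_corr and max_apex_delta are hypothetical thresholds,
    not values taken from the paper.
    """
    p_apex = apex_rt(rts, precursor)
    grouped = []
    for name, trace in fragments.items():
        corr = pearson(precursor, trace)
        delta = abs(apex_rt(rts, trace) - p_apex)
        if corr >= min_corr and delta <= max_apex_delta:
            grouped.append(name)
    return grouped

# Toy data: five scans across one elution peak (minutes, arbitrary units).
rts = [10.0, 10.1, 10.2, 10.3, 10.4]
precursor = [1, 5, 10, 5, 1]
fragments = {
    "y5": [2, 10, 20, 10, 2],  # same shape as the precursor
    "b3": [10, 8, 5, 2, 1],    # tail of a different, earlier peak
    "y7": [0, 3, 8, 4, 1],     # similar shape, same apex
}
print(group_fragments(rts, precursor, fragments))
```

In this toy case the `b3` trace is rejected on both criteria, while `y5` and `y7` would form the pseudo MS/MS spectrum for the precursor.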

**Fig. 3. Untargeted peptide and protein identification using DDA and DIA data from UPS2, *E. coli*, and human cell lysate samples**
**(a)** The number of peptide ions and proteins identified by the X! Tandem search engine at 1% FDR in DDA and in DIA (SWATH) data from UPS2, *E. coli*, and human cell lysate samples. **(b)** The number of peptide ion and protein identifications (X! Tandem) in each replicate of the UPS2 sample DDA and DIA data, plotted separately for proteins of different abundance (in UPS2 samples, 48 proteins span 5 orders of magnitude of abundance, ranging from 0.5 to 50,000 fmol, with 8 proteins in each abundance range).

**Fig. 4. Comparative analysis of peptide identifications from DDA and DIA data from human cell lysate samples**
**(a)** The numbers of proteins and peptide ions identified at 1% FDR by X! Tandem search engine in DDA and in DIA (SWATH) data. ***Left***: the number of protein identifications. ***Right***: the number of peptide ion identifications (9,272 peptide ions identified from DDA data, 8,757 from DIA, 12,660 in total). Of the peptide ions identified by DIA and not DDA at 1% FDR (3,388), the majority were not identified by DDA because no MS/MS spectrum was acquired (2,326). Of the peptide ions identified from DDA data and not from DIA at 1% FDR (3,903), DIA-Umpire was able to detect precursor features for 3,338 of these peptide ions. **(b)** Fraction of fragment ions matched in pseudo MS/MS spectra extracted from DIA data as a function of MS1 peptide ion intensity in DDA data. Data points (peptide ions) and the summary density plots (“Frequencies”) are colored according to the two categories of peptide ions: those identified from DIA data at 1% FDR (high scoring in DIA, blue), and unidentified in DIA (orange; these ions were found in DIA data as described in Online Methods). **(c)** Comparison between DDA and DIA in terms of fraction of fragments matched among the two categories of peptide ions described in (b), showing that peptide ions identified with confidence from DDA but not from DIA have fewer matched fragments.
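
The counts quoted in panel (a) are mutually consistent, which a short check makes explicit. The variable names are illustrative, and the two fractions are derived here from the caption's numbers rather than quoted from the paper.

```python
# Peptide ion counts at 1% FDR, as given in the Fig. 4a caption.
dda_total = 9272         # identified from DDA data
dia_total = 8757         # identified from DIA data
dia_only = 3388          # identified by DIA and not DDA
dda_only = 3903          # identified by DDA and not DIA
no_msms_in_dda = 2326    # DIA-only ions with no MS/MS spectrum acquired in DDA
dda_only_with_precursor = 3338  # DDA-only ions with a precursor feature in DIA

# The overlap can be derived from either side and must agree.
shared_from_dda = dda_total - dda_only
shared_from_dia = dia_total - dia_only
assert shared_from_dda == shared_from_dia

# Union of the two sets, to compare with the stated total of 12,660.
union = shared_from_dda + dia_only + dda_only

frac_no_msms = no_msms_in_dda / dia_only
frac_precursor_found = dda_only_with_precursor / dda_only

print(shared_from_dda, union)
print(f"{frac_no_msms:.1%} of DIA-only ions had no DDA MS/MS spectrum")
print(f"{frac_precursor_found:.1%} of DDA-only ions had a DIA precursor feature")
```

Both subtractions give the same overlap, and the union matches the total stated in the caption.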

**Fig. 5. Illustration of the entire DIA-Umpire workflow using affinity purification – SWATH interactome dataset**
**(a)** Two bait proteins (EIF4A2 and MEPCE) and the negative control (GFP) samples were analyzed in biological triplicates using AP-SWATH. The complete DIA-Umpire pipeline was applied to quantify proteins in these samples. The quantified proteins were further analyzed using SAINT (intensity model) to compute protein-protein interaction probabilities. **(b)** The distribution of scores (U-score) computed by the targeted re-extraction algorithm of DIA-Umpire. Data shown are from one biological replicate of MEPCE AP-SWATH run. The observed distribution was modeled using the mixture modeling approach (blue curve: false identification model; red curve: correct identifications) to compute the posterior probability for each match. Peptide ions with a computed probability above 0.99 were considered confidently identified and contributed, together with the peptide ions identified at the initial untargeted identification stage, to protein quantification for their corresponding protein. **(c)** The numbers of proteins identified in only one, two, or all three biological replicates for each experiment after the initial untargeted search and after targeted data re-extraction. Comparison between the sets of proteins identified in each experiment (all biological replicates combined) after the untargeted search and after targeted data re-extraction, showing an increase in the number of proteins quantified across all three samples. **(d)** High reproducibility of protein intensities between two MEPCE AP-SWATH biological replicates computed by DIA-Umpire using the “MS2 Top6pep/Top6fra, Freq>0.5” quantification approach.
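
The posterior probability filter in panel (b) can be sketched as a two-component mixture. The caption does not specify the component distributions or how they are fitted, so this sketch assumes Gaussian components with fixed, made-up parameters. The names `posterior_correct` and `confident_scores` are hypothetical, and a real analysis would estimate the parameters from the observed score distribution.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

def posterior_correct(score, pi_correct, correct, incorrect):
    """Posterior probability that a match with this score is correct.

    correct and incorrect are (mean, sd) pairs for the two mixture
    components; pi_correct is the prior fraction of correct matches.
    """
    p_c = pi_correct * normal_pdf(score, *correct)
    p_i = (1.0 - pi_correct) * normal_pdf(score, *incorrect)
    return p_c / (p_c + p_i)

def confident_scores(scores, pi_correct, correct, incorrect, threshold=0.99):
    """Keep scores whose posterior probability exceeds the threshold."""
    return [s for s in scores
            if posterior_correct(s, pi_correct, correct, incorrect) > threshold]

# Assumed (not fitted) parameters: false matches centered at 0,
# correct matches centered at 5, 30% of matches correct.
incorrect = (0.0, 1.0)
correct = (5.0, 1.0)
pi_correct = 0.3

scores = [0.0, 2.5, 3.5, 4.0, 5.0]
kept = confident_scores(scores, pi_correct, correct, incorrect)
print(kept)
```

With these assumed parameters a score midway between the two components has a posterior equal to the prior (0.3), and only the highest scores pass the 0.99 cutoff used in the caption.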
