Nat Commun. 2022 Mar 15;13(1):1347.

doi: 10.1038/s41467-022-29006-z.

DIAMetAlyzer allows automated false-discovery rate-controlled analysis for data-independent acquisition in metabolomics

Oliver Alka¹ ², Premy Shanthamoorthy³ ⁴, Michael Witting⁵ ⁶ ⁷, Karin Kleigrewe⁸, Oliver Kohlbacher⁹ ¹⁰ ¹¹, Hannes L Röst¹² ¹³ ¹⁴

Affiliations

¹ Department of Computer Science, Applied Bioinformatics, University of Tübingen, Tübingen, Germany. oliver.alka@uni-tuebingen.de.
² Institute for Bioinformatics and Medical Informatics, University of Tübingen, Tübingen, Germany. oliver.alka@uni-tuebingen.de.
³ Terrence Donnelly Centre for Cellular & Biomolecular Research, University of Toronto, Toronto, Canada.
⁴ Department of Molecular Genetics, University of Toronto, Toronto, Canada.
⁵ Metabolomics and Proteomics Core, Helmholtz Zentrum München, Neuherberg, Germany.
⁶ Research Unit Analytical BioGeoChemistry, Helmholtz Zentrum München, Neuherberg, Germany.
⁷ Chair of Analytical Food Chemistry, School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany.
⁸ Bavarian Center for Biomolecular Mass Spectrometry, Technical University of Munich, Freising, Germany.
⁹ Department of Computer Science, Applied Bioinformatics, University of Tübingen, Tübingen, Germany.
¹⁰ Institute for Bioinformatics and Medical Informatics, University of Tübingen, Tübingen, Germany.
¹¹ Institute for Translational Bioinformatics, University Hospital Tübingen, Tübingen, Germany.
¹² Terrence Donnelly Centre for Cellular & Biomolecular Research, University of Toronto, Toronto, Canada. hannes.rost@utoronto.ca.
¹³ Department of Molecular Genetics, University of Toronto, Toronto, Canada. hannes.rost@utoronto.ca.
¹⁴ Department of Computer Science, University of Toronto, Toronto, Canada. hannes.rost@utoronto.ca.

PMID: 35292629
PMCID: PMC8924252
DOI: 10.1038/s41467-022-29006-z

Oliver Alka et al. Nat Commun. 2022.

Abstract

The extraction of meaningful biological knowledge from high-throughput mass spectrometry data relies on limiting false discoveries to a manageable amount. For targeted approaches in metabolomics a main challenge is the detection of false positive metabolic features in the low signal-to-noise ranges of data-independent acquisition results and their filtering. Another factor is that the creation of assay libraries for data-independent acquisition analysis and the processing of extracted ion chromatograms have not been automated in metabolomics. Here we present a fully automated open-source workflow for high-throughput metabolomics that combines data-dependent and data-independent acquisition for library generation, analysis, and statistical validation, with rigorous control of the false-discovery rate while matching manual analysis regarding quantification accuracy. Using an experimentally specific data-dependent acquisition library based on reference substances allows for accurate identification of compounds and markers from data-independent acquisition data in low concentrations, facilitating biomarker quantification.

Conflict of interest statement

The authors declare no competing interests.

Figures

**Fig. 1. DIAMetAlyzer - a pipeline for assay library generation and targeted analysis with statistical validation.**
DDA data is used for candidate identification, which comprises feature detection, adduct grouping and accurate mass search. Library construction uses fragment annotation via compositional fragmentation trees (SIRIUS) and decoy generation using a fragmentation tree re-rooting method (Passatutto) to create a target-decoy assay library. This library is used in a second step to analyse metabolomics DIA data by performing targeted extraction (OpenSWATH), scoring and statistical validation (PyProphet).
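The role of the decoys in the statistical validation step can be illustrated with a minimal sketch of a generic target-decoy FDR estimate. This is not PyProphet's actual scoring or error model; the function names, the `(score, is_decoy)` input format and the example scores are assumptions made for illustration only.

```python
from typing import List, Tuple

Row = Tuple[float, bool]  # (score, is_decoy); a hypothetical input format


def target_decoy_qvalues(scored: List[Row]) -> List[Tuple[float, bool, float]]:
    """Return (score, is_decoy, q_value) rows sorted by descending score.

    The FDR at a score cutoff is estimated as
    (#decoys at or above the cutoff) / (#targets at or above the cutoff).
    The q-value of a row is the smallest estimated FDR at which it is accepted.
    """
    ranked = sorted(scored, key=lambda row: row[0], reverse=True)
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))
    # Make the estimates monotone by taking a running minimum from the
    # lowest-scoring row upwards.
    qvalues = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qvalues.append(running_min)
    qvalues.reverse()
    return [(s, d, q) for (s, d), q in zip(ranked, qvalues)]


def accepted_targets(scored: List[Row], fdr_threshold: float) -> List[float]:
    """Scores of target rows whose q-value is at or below the threshold."""
    return [s for s, d, q in target_decoy_qvalues(scored)
            if not d and q <= fdr_threshold]


# Made-up peak-group scores, not data from the paper.
example = [(9.0, False), (8.0, False), (7.0, True),
           (6.0, False), (5.0, False), (4.0, True)]
print(accepted_targets(example, 0.05))
```

With these made-up scores, only the two targets ranked above the first decoy pass a 5% threshold; loosening the threshold admits lower-scoring targets at the cost of a higher estimated error rate.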

**Fig. 2. FDR filtering and library coverage.**
a Peak groups detected and quantified by DIAMetAlyzer in an APM spiked-in human blood plasma dilution series (SWATH - 30 samples) filtered by different FDR thresholds. Without FDR filtering (no FDR), we detected and quantified the highest number of true positive peak groups (TP; n = 3479), but also the highest number of false positive peak groups (FP; n = 1471). At 5% FDR, 3071 true peak groups and 125 false positives were quantified (3.9%). At 1% FDR the true positive peak groups were further reduced (n = 2523), as were the false positives (n = 19; 0.7%). b Individual pesticide mixes in solvent (around 30 pesticides each) were used to construct the target-decoy assay library. Stringent filtering allows high-quality assays to be used in library construction: Around 9% of the pesticides could not be detected in the data. An additional 14% were not identified via MS1 or did not possess a valid MS2 spectrum (4+ peaks, to allow for fragment annotation). 77% of the pesticides were automatically detected, identified, and annotated. In the library construction step, filtering by the number of transitions greatly affects the coverage of metabolites (three transitions: 60% coverage, two transitions: 71% coverage, one transition: 77% coverage).
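The percentages in panel a appear to be the share of false positives among all quantified peak groups, FP / (TP + FP). A short check using only the counts from the caption (the function name is an assumption for illustration):

```python
def false_positive_fraction(true_pos: int, false_pos: int) -> float:
    """Share of false positives among all quantified peak groups."""
    return false_pos / (true_pos + false_pos)


# (TP, FP) counts as stated in the Fig. 2a caption.
counts = {
    "no FDR": (3479, 1471),
    "5% FDR": (3071, 125),
    "1% FDR": (2523, 19),
}

for label, (tp, fp) in counts.items():
    share = 100 * false_positive_fraction(tp, fp)
    print(f"{label}: {share:.1f}% false positives")
```

This reproduces the 3.9% and 0.7% given in the caption; by the same formula, roughly 29.7% of the unfiltered peak groups are false positives, a value derived here rather than stated in the source.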

**Fig. 3. Identification accuracy and quantification of DIAMetAlyzer on the pesticide spike-in dataset.**
a Estimated FDR versus FDR from the ground truth data. b Precision-Recall curve with the area-under-the-curve (AUC = 0.96). c Normalized intensity ratio over the dilution series. The dashed line indicates the expected fourfold difference to the next dilution. The x-axis (top): The number of metabolites found in the specific dilution at a 5% FDR cutoff. More than half of the initial metabolites could be detected at half of our dilution series (1:1,024). d Difference in mean standard deviation regarding the theoretical concentration of the automatic and manual analysis. e Median coefficient of variation (CV) across three technical replicates for the automatic and manual analysis (CV < 20%). For c, d, and e, only metabolites detected in triplicates and below a 5% FDR threshold were analyzed and only true positives were considered in the case of panel e. The box plots in c and e indicate median, 25th and 75th percentiles (middle line, Q1 and Q3 within the box, respectively), including 1.5x interquartile range whiskers and outliers (single points outside this range).
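The two quantities plotted in panels c and e can be sketched as follows. The intensities are hypothetical, and the use of the sample standard deviation is an assumption; the caption does not say which estimator the authors used.

```python
from statistics import mean, stdev
from typing import List


def coefficient_of_variation(intensities: List[float]) -> float:
    """Sample standard deviation divided by the mean, in percent."""
    return 100 * stdev(intensities) / mean(intensities)


def ratio_to_next_dilution(mean_intensities: List[float]) -> List[float]:
    """Ratio of each dilution step's mean intensity to the next, more dilute step."""
    return [a / b for a, b in zip(mean_intensities, mean_intensities[1:])]


# Hypothetical triplicate intensities for one metabolite at two adjacent
# steps of a fourfold dilution series (not data from the paper).
step_a = [4000.0, 4200.0, 3800.0]
step_b = [1000.0, 1050.0, 950.0]

print(coefficient_of_variation(step_a))                      # CV of the triplicate, in %
print(ratio_to_next_dilution([mean(step_a), mean(step_b)]))  # ideal value is 4.0
```

A metabolite that is quantified well should give ratios close to the expected fourfold difference and a CV under the 20% mark referenced in panel e.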

**Fig. 4. Analysis of serum samples of patients with AMD using MetaboDIA and DIAMetAlyzer.**
a Comparison of the library generation of both tools based on features (molecular formula, adduct and retention time). 66% of the features overlap between the tools (DIAMetAlyzer: green, MetaboDIA: gray, Overlap: orange). b Number of quantified features using the various libraries in combination with the targeted extraction of DIAMetAlyzer. MetaboDIA, DIAMetAlyzer, the library of both tools (Combined), DIAMetAlyzer with the functionality to use known unknowns without prior MS1 identification in addition to the ones with identification (DIAMetAlyzer + Unknowns). c Significantly deregulated compound 5,8,11,14-Eicosatetraenoic acid (EPA - C20H32O2 - based on putative identification) (P_CNV = 0.04; P_PCV = 0.01) with an increase in mean intensity of 1.4 and 1.7 times in contrast to the control. d Significantly deregulated compound 4,7,10,13,16,19-Docosahexaenoic acid (DHA - C22H32O2 - based on putative identification) (P_CNV = 0.008; P_PCV = 0.006) with an increase in mean intensity of 1.7 and 2.0 times in contrast to the control. c, d The identification of the compounds is based on MS1 accurate mass search and MS2 fragment annotation. Differential expression was assessed using limma with Benjamini-Hochberg correction. Box plots indicate median, 25th and 75th percentiles (middle line, Q1 and Q3 within the box, respectively), including 1.5x interquartile range whiskers and outliers (single points outside this range). Control: gray, n = 20 biologically independent samples, CNV (choroidal neovascularization): green, n = 20 biologically independent samples, PCV (polypoidal choroidal neovascularization): blue, n = 20 biologically independent samples.
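The Benjamini-Hochberg correction named in the caption can be sketched in a few lines. This covers only the multiple-testing adjustment, not limma's moderated statistics (limma is an R package), and the p-values are hypothetical rather than taken from the study.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that each adjusted
    # value is capped by the adjusted values of all larger p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four features (not values from the paper).
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

Each raw p-value is scaled by the number of tests divided by its rank, so a feature is called significant only if it survives the adjustment across all tested features.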

Grants and funding

419634/CIHR/Canada
