Nat Commun. 2022 Mar 15;13(1):1347.
doi: 10.1038/s41467-022-29006-z.

DIAMetAlyzer allows automated false-discovery rate-controlled analysis for data-independent acquisition in metabolomics

Oliver Alka et al. Nat Commun.

Abstract

The extraction of meaningful biological knowledge from high-throughput mass spectrometry data relies on limiting false discoveries to a manageable amount. For targeted approaches in metabolomics a main challenge is the detection of false positive metabolic features in the low signal-to-noise ranges of data-independent acquisition results and their filtering. Another factor is that the creation of assay libraries for data-independent acquisition analysis and the processing of extracted ion chromatograms have not been automated in metabolomics. Here we present a fully automated open-source workflow for high-throughput metabolomics that combines data-dependent and data-independent acquisition for library generation, analysis, and statistical validation, with rigorous control of the false-discovery rate while matching manual analysis regarding quantification accuracy. Using an experimentally specific data-dependent acquisition library based on reference substances allows for accurate identification of compounds and markers from data-independent acquisition data in low concentrations, facilitating biomarker quantification.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. DIAMetAlyzer - a pipeline for assay library generation and targeted analysis with statistical validation.
DDA data are used for candidate identification, which comprises feature detection, adduct grouping, and accurate mass search. Library construction uses fragment annotation via compositional fragmentation trees (SIRIUS) and decoy generation using a fragmentation tree re-rooting method (Passatutto) to create a target-decoy assay library. In a second step, this library is used to analyse metabolomics DIA data by targeted extraction (OpenSWATH), followed by scoring and statistical validation (PyProphet).
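The target-decoy idea behind the statistical validation step can be illustrated with a minimal sketch. This is not PyProphet's actual procedure, which involves a more elaborate scoring model; it is a plain decoy-counting estimator, and the function name `target_decoy_qvalues` and the example scores are hypothetical.

```python
def target_decoy_qvalues(scored):
    """Estimate q-values for target peak groups by counting decoys.

    scored: list of (score, is_decoy) tuples; a higher score means a better match.
    Returns (score, q_value) pairs for the targets, best score first.
    """
    ranked = sorted(scored, key=lambda item: item[0], reverse=True)
    targets = decoys = 0
    fdrs = []
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
            # Estimated FDR if the threshold were set at this target's score.
            fdrs.append((score, decoys / targets))

    # q-value: the smallest estimated FDR at this or any more permissive threshold.
    qvals = []
    running_min = 1.0
    for score, fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qvals.append((score, running_min))
    qvals.reverse()
    return qvals


# Hypothetical scores, for illustration only.
example = [(9.0, False), (8.0, False), (7.0, True), (6.0, False),
           (5.0, False), (4.0, True), (3.0, False)]
qvals = target_decoy_qvalues(example)
accepted = [score for score, q in qvals if q <= 0.25]
```

With these made-up scores, four of the five targets pass a 25% cutoff; a real analysis would apply a much stricter threshold, such as the 1% and 5% levels reported for Fig. 2.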
Fig. 2. FDR filtering and library coverage.
a Peak groups detected and quantified by DIAMetAlyzer in an APM spiked-in human blood plasma dilution series (SWATH - 30 samples) filtered by different FDR thresholds. Without FDR filtering (no FDR), we detected and quantified the highest number of true positive peak groups (TP; n = 3479), but also the highest number of false positive peak groups (FP; n = 1471). At 5% FDR, 3071 true peak groups and 125 false positives were quantified (3.9%). At 1% FDR, the true positive peak groups were further reduced (n = 2523), as were the false positives (n = 19; 0.7%). b Individual pesticide mixes in solvent (around 30 pesticides each) were used to construct the target-decoy assay library. Stringent filtering allows high-quality assays to be used in library construction: Around 9% of the pesticides could not be detected in the data. An additional 14% were not identified via MS1 or did not possess a valid MS2 spectrum (4+ peaks, to allow for fragment annotation). 77% of the pesticides were automatically detected, identified, and annotated. In the library construction step, filtering by the number of transitions greatly affects the coverage of metabolites (three transitions: 60% coverage, two transitions: 71% coverage, one transition: 77% coverage).
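The percentages quoted for panel a follow from the reported counts if the observed false-positive fraction is taken as FP / (TP + FP). The short check below uses only the numbers from the caption; the helper name `false_positive_fraction` is chosen here for illustration.

```python
# (true positives, false positives) per FDR threshold, as given in the caption.
counts = {
    "no FDR": (3479, 1471),
    "5% FDR": (3071, 125),
    "1% FDR": (2523, 19),
}


def false_positive_fraction(tp, fp):
    """Share of quantified peak groups that are false positives."""
    return fp / (tp + fp)


fractions = {label: false_positive_fraction(tp, fp)
             for label, (tp, fp) in counts.items()}

for label, fraction in fractions.items():
    print(f"{label}: {fraction:.1%}")
# no FDR: 29.7%
# 5% FDR: 3.9%
# 1% FDR: 0.7%
```

The observed fractions at the 5% and 1% thresholds (3.9% and 0.7%) stay below the nominal cutoffs.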
Fig. 3. Identification accuracy and quantification of DIAMetAlyzer on the pesticide spike-in dataset.
a Estimated FDR versus FDR from the ground truth data. b Precision-Recall curve with the area-under-the-curve (AUC = 0.96). c Normalized intensity ratio over the dilution series. The dashed line indicates the expected fourfold difference to the next dilution. The x-axis (top) shows the number of metabolites found in the specific dilution at a 5% FDR cutoff. More than half of the initial metabolites could be detected at half of our dilution series (1:1,024). d Difference in mean standard deviation regarding the theoretical concentration of the automatic and manual analysis. e Median coefficient of variation (CV) across three technical replicates for the automatic and manual analysis (CV < 20%). For c, d, and e, only metabolites detected in triplicates and below a 5% FDR threshold were analyzed, and only true positives were considered in the case of panel e. The box plots in c and e indicate median, 25th and 75th percentiles (middle line, Q1 and Q3 within the box, respectively), including 1.5x interquartile range whiskers and outliers (single points outside this range).
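The coefficient of variation in panel e is the standard deviation divided by the mean of the replicate intensities. The sketch below assumes the sample standard deviation (n - 1 denominator), which the caption does not specify, and uses made-up intensities; the function name is hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(intensities):
    """CV of replicate intensities: sample standard deviation over the mean."""
    return stdev(intensities) / mean(intensities)


# Hypothetical triplicate intensities for two metabolites.
stable = [1000.0, 1100.0, 900.0]
noisy = [1000.0, 1500.0, 500.0]

cv_stable = coefficient_of_variation(stable)  # 0.10, passes CV < 20%
cv_noisy = coefficient_of_variation(noisy)    # 0.50, fails CV < 20%
```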
Fig. 4. Analysis of serum samples of patients with AMD using MetaboDIA and DIAMetAlyzer.
a Comparison of the library generation of both tools based on features (molecular formula, adduct and retention time). 66% of the features overlap between the tools (DIAMetAlyzer: green, MetaboDIA: gray, Overlap: orange). b Number of quantified features using the various libraries in combination with the targeted extraction of DIAMetAlyzer. Libraries shown: MetaboDIA, DIAMetAlyzer, the library of both tools (Combined), and DIAMetAlyzer with the functionality to use known unknowns without prior MS1 identification in addition to the ones with identification (DIAMetAlyzer + Unknowns). c Significantly deregulated compound 5,8,11,14-Eicosatetraenoic acid (EPA - C20H32O2 - based on putative identification) (P_CNV = 0.04; P_PCV = 0.01) with an increase in mean intensity of 1.4 and 1.7 times relative to the control. d Significantly deregulated compound 4,7,10,13,16,19-Docosahexaenoic acid (DHA - C22H32O2 - based on putative identification) (P_CNV = 0.008; P_PCV = 0.006) with an increase in mean intensity of 1.7 and 2.0 times relative to the control. c, d The identification of the compounds is based on MS1 accurate mass search and MS2 fragment annotation. Differential expression was assessed using limma with Benjamini-Hochberg correction. Box plots indicate median, 25th and 75th percentiles (middle line, Q1 and Q3 within the box, respectively), including 1.5x interquartile range whiskers and outliers (single points outside this range). Control: gray, n = 20 biologically independent samples, CNV (choroidal neovascularization): green, n = 20 biologically independent samples, PCV (polypoidal choroidal neovascularization): blue, n = 20 biologically independent samples.
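The Benjamini-Hochberg correction mentioned for panels c and d can be sketched as follows. limma itself is an R/Bioconductor package; this Python snippet shows only the standard adjustment step, not limma's moderated statistics, and the p-values are hypothetical rather than taken from the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four features.
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
significant = [p <= 0.05 for p in adj]
```

In this made-up example only the first feature stays below 0.05 after adjustment, although three raw p-values were below that level.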
