Metabolomics. 2018 Sep 17;14(10):125.
doi: 10.1007/s11306-018-1426-9.

New methods to identify high peak density artifacts in Fourier transform mass spectra and to mitigate their effects on high-throughput metabolomic data analysis

Joshua M Mitchell et al. Metabolomics.

Abstract

Introduction: Direct injection Fourier-transform mass spectrometry (FT-MS) allows for the high-throughput and high-resolution detection of thousands of metabolite-associated isotopologues. However, spectral artifacts can generate large numbers of spectral features (peaks) that do not correspond to known compounds. Misassignment of these artifactual features creates interpretive errors and limits our ability to discern the role of representative features within living systems.

Objectives: Our goal is to develop rigorous methods that identify and handle spectral artifacts within the context of high-throughput FT-MS-based metabolomics studies.

Results: We observed three types of artifacts unique to FT-MS that we named high peak density (HPD) sites: fuzzy sites, ringing and partial ringing. While ringing artifacts are well-known, fuzzy sites and partial ringing have not been previously well-characterized in the literature. We developed new computational methods based on comparisons of peak density within a spectrum to identify regions of spectra with fuzzy sites. We used these methods to identify and eliminate fuzzy site artifacts in an example dataset of paired cancer and non-cancer lung tissue samples and evaluated the impact of these artifacts on classification accuracy and robustness.

Conclusion: Our methods robustly identified consistent fuzzy site artifacts in our FT-MS metabolomics spectral data. Without artifact identification and removal, 91.4% classification accuracy was achieved on an example lung cancer dataset; however, these classifiers rely heavily on artifactual features present in fuzzy sites. Proper removal of fuzzy site artifacts produces a more robust classifier based on non-artifactual features, with slightly improved accuracy of 92.4% in our example analysis.

Keywords: Artifact; Data analysis; Fourier transform; Mass spectrometry; Metabolomics.

Conflict of interest statement

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

The base study providing the de-identified data analyzed was approved by IRB protocol (IRB 14-0288-F6A) at the University of Kentucky and IRB protocol (#523.05) at the University of Louisville.

Informed consent

Written consent was obtained for the collection of human tissue samples under an IRB approved protocol (IRB 14-0288-F6A) at the University of Kentucky and IRB protocol (#523.05) at the University of Louisville.

Figures

Fig. 1
Automated HPD-site detection. The HPD artifact detection algorithm in three steps: first, a peak density metric is calculated for the spectrum using a sliding window method (1 m/z window, 0.1 m/z increment); second, a set of N + 1 windows and the peak density metric are used to calculate a peak density statistic for each portion of the spectrum; third, filtering this statistic reveals the location of the HPD artifacts. The statistic flattens out density differences due to signal-to-noise differences or baseline differences and highlights spectra with HPD artifacts (Fig. 3e–h)
Fig. 2
Three types of HPD artifacts. We observed three subclasses of HPD artifacts. The first is the fuzzy site (a, sample D), which we believe is a novel artifact type. The second is ringing, a well-known FT-MS artifact where a single intense peak has many side peaks (b, sample B). We only observed ringing at the scan level. The third artifact is partial ringing which is a ringing-like artifact at the aggregate level (c, sample A). R is the resolution setting used for data acquisition, µS is the microscan setting, and N is the number of scans aggregated to create the spectrum
Fig. 3
Peak density and peak density statistics. Peak density metric and statistic plots produced by our HPD-detector tool highlight the impact of the instrument on peak density and HPD artifact location. All instruments have higher peak densities at lower m/z, representing trends in signal-to-noise and digitization with respect to m/z in FT-MS. The sharp spikes in peak density correspond to HPD artifacts. The locations of these spikes on Fusion 1 are different before and after the firmware update (a, b), suggesting instrument-level data processing is related to HPD generation. e–h show the effectiveness of our peak density statistic metric for flattening the non-constant baseline observed in plots of the raw peak density. Without this correction, identifying HPD regions reliably is difficult. a–c and e–g were generated from spectra acquired using sample C. d and h were generated from spectra acquired using sample E
Fig. 4
Fuzzy sites at the aggregate and scan level. A typical fuzzy site (a) occupies 0.5–3 m/z at the aggregate level and has a distinct ‘fuzzy’ appearance due to very high peak density (this image is identical to Fig. 2a). At the scan level, only a subdomain of the m/z occupied by the fuzzy site contains peaks; the subdomain with peaks varies from scan to scan (b). As increasingly more scans are aggregated together, the peak distribution converges to the pattern observed at the aggregate level (c). All panels were generated using sample A. R is the resolution setting used for the acquisition, µS is the microscan setting, and N is the number of scans aggregated to create the spectrum
Fig. 5
HPD regions depend on biological unit, sample class, and instrument
Fig. 6
Example fuzzy site locations that vary with sample class. The location of fuzzy sites in spectra from the same biological unit can differ significantly based on sample class (cancer vs. non-cancer). a and b illustrate one fuzzy site whose location varies by roughly 2 m/z between sample classes. c and d show a single fuzzy site whose location varies by over 12 m/z between sample classes
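
The three steps in the Fig. 1 caption can be illustrated with a minimal Python sketch. Only the sliding-window parameters (1 m/z window, 0.1 m/z increment) come from the caption. The caption does not specify how the peak density statistic is computed, so the local-median ratio used here is a stand-in chosen for illustration and should not be read as the authors' statistic. The function names, the `half_width` neighborhood size, and the `threshold` value are likewise hypothetical.

```python
import numpy as np


def peak_density(mz, window=1.0, step=0.1):
    """Step 1: count peaks in each sliding window [start, start + window)."""
    mz = np.sort(np.asarray(mz, dtype=float))
    n_steps = int(np.floor((mz[-1] - mz[0]) / step)) + 1
    starts = mz[0] + step * np.arange(n_steps)
    # On a sorted array, searchsorted returns the index of the first peak
    # at or above each boundary, so the difference is the peak count.
    lo = np.searchsorted(mz, starts, side="left")
    hi = np.searchsorted(mz, starts + window, side="left")
    return starts, hi - lo


def density_statistic(counts, half_width=25):
    """Step 2 (assumed form): each window's count divided by the median
    count of its neighboring windows, which flattens a slowly varying
    baseline. This is NOT the statistic defined in the paper."""
    counts = np.asarray(counts, dtype=float)
    stat = np.empty_like(counts)
    for i in range(len(counts)):
        a = max(0, i - half_width)
        b = min(len(counts), i + half_width + 1)
        baseline = np.median(counts[a:b])
        stat[i] = counts[i] / max(baseline, 1.0)  # guard against empty regions
    return stat


def flag_regions(starts, stat, window=1.0, threshold=3.0):
    """Step 3: keep windows above a (hypothetical) threshold and merge
    overlapping ones into (low m/z, high m/z) intervals."""
    regions = []
    for s, v in zip(starts, stat):
        if v < threshold:
            continue
        if regions and s <= regions[-1][1]:
            regions[-1][1] = s + window
        else:
            regions.append([s, s + window])
    return [(float(lo), float(hi)) for lo, hi in regions]


# Synthetic spectrum: sparse background plus one dense cluster near 500 m/z.
background = np.arange(150.0, 1000.0, 0.5)
cluster = np.linspace(500.0, 501.0, 200)
mz = np.concatenate([background, cluster])

starts, counts = peak_density(mz)
stat = density_statistic(counts)
regions = flag_regions(starts, stat)
print(regions)
```

On this synthetic input the dense cluster is the only stretch of the spectrum where the ratio exceeds the threshold, so a single interval around 500–501 m/z is reported. Real spectra would need the neighborhood size and threshold tuned, and peaks falling inside flagged intervals could then be excluded before downstream classification.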

