J Proteome Res. 2022 Nov 4;21(11):2810-2814.
doi: 10.1021/acs.jproteome.2c00278. Epub 2022 Oct 6.

A Parallelization Strategy for the Time Efficient Analysis of Thousands of LC/MS Runs in High-Performance Computing Environment


Patrick van Zalm et al. J Proteome Res. 2022.

Abstract

Combining robust proteomics instrumentation with high-throughput liquid chromatography (LC) systems (e.g., the timsTOF Pro and the Evosep One, respectively) has made it possible to map the proteomes of thousands of samples. Fragpipe is one of the few computational frameworks for protein identification and quantification that allow time-efficient analysis of such large data sets. However, it requires so much computational power and data storage that even state-of-the-art workstations are underpowered for proteomics data sets comprising thousands of LC mass spectrometry (LC/MS) runs. To address this issue, we developed and optimized a Fragpipe-based analysis strategy for a high-performance computing (HPC) environment and used it to analyze 3348 plasma samples (6.4 TB) collected longitudinally from hospitalized COVID-19 patients under the auspices of the Immunophenotyping Assessment in a COVID-19 Cohort (IMPACC) study. Our parallelization strategy reduced the total runtime by ∼90%, from a theoretical 116 days to just 9 days in the HPC environment. All code is open source and can be deployed in any Simple Linux Utility for Resource Management (SLURM) HPC environment, enabling the analysis of large-scale, high-throughput proteomics studies.
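The abstract does not reproduce the code itself. As a minimal sketch of the general idea only, assuming each task of a SLURM job array handles a contiguous chunk of runs, the Python below splits a run list across array tasks using SLURM's `SLURM_ARRAY_TASK_ID`, `SLURM_ARRAY_TASK_MIN`, and `SLURM_ARRAY_TASK_COUNT` environment variables. The chunking scheme, the run names, and the function names (`chunk_bounds`, `runs_for_this_task`) are hypothetical and are not taken from the authors' repository.

```python
import os


def chunk_bounds(n_items: int, n_tasks: int, task_id: int) -> tuple[int, int]:
    """Return the [start, stop) indices of the chunk assigned to task_id (0-based).

    Items are spread as evenly as possible: the first n_items % n_tasks
    tasks each receive one extra item.
    """
    if not 0 <= task_id < n_tasks:
        raise ValueError("task_id out of range")
    base, extra = divmod(n_items, n_tasks)
    start = task_id * base + min(task_id, extra)
    stop = start + base + (1 if task_id < extra else 0)
    return start, stop


def runs_for_this_task(runs: list[str]) -> list[str]:
    """Select this array task's share of the runs (hypothetical helper).

    Assumes a contiguous array range without a step, e.g. --array=0-19 or
    --array=1-20; subtracting SLURM_ARRAY_TASK_MIN makes the index 0-based.
    """
    first_id = int(os.environ.get("SLURM_ARRAY_TASK_MIN", "0"))
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", str(first_id))) - first_id
    n_tasks = int(os.environ.get("SLURM_ARRAY_TASK_COUNT", "1"))
    start, stop = chunk_bounds(len(runs), n_tasks, task_id)
    return runs[start:stop]


if __name__ == "__main__":
    # Hypothetical file names standing in for the 3348 raw LC/MS runs.
    runs = [f"run_{i:04d}.d" for i in range(3348)]
    mine = runs_for_this_task(runs)
    print(f"this task handles {len(mine)} runs")
```

Submitted with, for example, `sbatch --array=0-19 job.sh`, each of the 20 tasks would receive 167 or 168 of the 3348 runs, and no run is assigned twice. Contiguous chunks keep each task's inputs easy to trace; a real deployment might instead balance chunks by file size.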

Keywords: Fragpipe; HPC; SLURM; parallelization; proteomics; timsTOF.


Conflict of interest statement

N/A

Figures

Figure 1.
Schematic workflow that runs Fragpipe on an HPC with parallelized components to decrease the total runtime. For each computational tool, the CPU and RAM usage (on a node with 180 GB RAM and 96 CPU cores), the increase in disk usage, and the required time are shown.
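The node size in the caption (96 CPU cores, 180 GB RAM) bounds how many copies of a given step can run at once. A minimal sketch of that calculation, assuming each task has fixed per-task core and memory needs and that runs are processed in equal-length waves, is shown below; the function names and any per-task requirements passed to them are hypothetical placeholders, not values measured in the study.

```python
import math


def tasks_per_node(task_cpus: int, task_ram_gb: float,
                   node_cpus: int = 96, node_ram_gb: float = 180.0) -> int:
    """Concurrent tasks on one node, limited by whichever resource runs out first."""
    if task_cpus <= 0 or task_ram_gb <= 0:
        raise ValueError("per-task requirements must be positive")
    by_cpu = node_cpus // task_cpus
    by_ram = int(node_ram_gb // task_ram_gb)
    return min(by_cpu, by_ram)


def wall_time_minutes(n_runs: int, minutes_per_run: float,
                      concurrent_tasks: int) -> float:
    """Wall time if runs are processed in waves of `concurrent_tasks` at a time.

    Assumes every run takes the same time; real runs vary in length.
    """
    if concurrent_tasks < 1:
        raise ValueError("at least one task must fit on the node")
    waves = math.ceil(n_runs / concurrent_tasks)
    return waves * minutes_per_run
```

For example, a hypothetical step needing 4 cores and 8 GB per task is RAM-bound: `tasks_per_node(4, 8.0)` returns 22 rather than the 24 that the core count alone would allow, which is why the caption reports RAM alongside CPU usage for each tool.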
Figure 2.
Time requirements (minutes) to rewrite 3348 BDFs into the MSFragger mzBIN format for four combinations: serial versus parallel (×20) processing, and network storage versus node-local storage.
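Figure 2 compares serial with 20-way parallel conversion and network with node-local storage. A minimal sketch of the parallel, node-local variant, assuming each run is first copied to a scratch directory on the compute node and then converted there, is shown below. `convert_run` is a hypothetical callable standing in for the actual conversion command, which the caption does not specify; `stage_to_local` and `convert_all` are likewise illustrative names.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable


def stage_to_local(src: Path, scratch_dir: Path) -> Path:
    """Copy one run (a single file or a directory) from network storage to scratch."""
    dest = scratch_dir / src.name
    if src.is_dir():
        shutil.copytree(src, dest, dirs_exist_ok=True)
    else:
        shutil.copy2(src, dest)
    return dest


def convert_all(runs: list[Path], scratch_dir: Path,
                convert_run: Callable[[Path], Path],
                workers: int = 20) -> list[Path]:
    """Stage each run locally, then convert it; at most `workers` runs at a time.

    Threads suffice here because the work is assumed to be file I/O plus an
    external conversion process, not Python computation.
    """
    scratch_dir.mkdir(parents=True, exist_ok=True)

    def job(src: Path) -> Path:
        local = stage_to_local(src, scratch_dir)
        return convert_run(local)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as `runs`.
        return list(pool.map(job, runs))
```

Setting `workers=1` reproduces the serial case, and passing network paths directly to `convert_run` (skipping `stage_to_local`) reproduces the network-storage case, so the same structure covers all four combinations in the figure.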


