Mol Cell Proteomics. 2023 Jul;22(7):100581.
doi: 10.1016/j.mcpro.2023.100581. Epub 2023 May 22.

Accurate Label-Free Quantification by directLFQ to Compare Unlimited Numbers of Proteomes


Constantin Ammar et al. Mol Cell Proteomics. 2023 Jul.

Abstract

Recent advances in mass spectrometry-based proteomics enable the acquisition of increasingly large datasets in relatively short times, which exposes bottlenecks in the bioinformatics pipeline. Although peptide identification is already scalable, most label-free quantification (LFQ) algorithms scale quadratically or cubically with the number of samples, which may even preclude the analysis of large-scale data. Here we introduce directLFQ, a ratio-based approach for sample normalization and the calculation of protein intensities. It estimates quantities by aligning samples and ion traces, shifting them on top of each other in logarithmic space. Importantly, directLFQ scales linearly with the number of samples, allowing analyses of large studies to finish in minutes instead of days or months. We quantify 10,000 proteomes in 10 min and 100,000 proteomes in less than 2 h, 1000-fold faster than some implementations of the popular LFQ algorithm MaxLFQ. In-depth characterization of directLFQ reveals excellent normalization properties and benchmark results, comparing favorably to MaxLFQ for both data-dependent acquisition and data-independent acquisition. In addition, directLFQ provides normalized peptide intensity estimates for peptide-level comparisons. It is an important part of an overall quantitative proteomic pipeline that also needs to include highly sensitive statistical analysis leading to proteoform resolution. Available as an open-source Python package and a graphical user interface with a one-click installer, it can be used in the AlphaPept ecosystem as well as downstream of most common computational proteomics pipelines.
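The idea of shifting traces on top of each other in logarithmic space can be illustrated with two intensity traces: a constant multiplicative difference in linear intensity (for example, four times more material loaded) becomes a constant additive offset in log2 space. The minimal sketch below, with a hypothetical function name, estimates that offset as the median of the element-wise differences while ignoring missing values. It illustrates the principle only and is not the directLFQ implementation.

```python
import numpy as np


def shift_to_reference(trace, reference):
    """Shift `trace` (log2 intensities) so its median difference to `reference` is zero.

    Hypothetical helper for illustration. Both arrays are aligned element-wise
    (same ions/peptides); NaN marks a missing value and is ignored.
    """
    diff = reference - trace
    # A multiplicative factor in linear space is an additive offset in log space,
    # so a robust estimate of that offset is the median difference.
    shift = np.nanmedian(diff)
    return trace + shift, shift


# Same ions measured in two samples; the second sample carries 4x more material
# and has one ion that is missing in the reference.
reference = np.log2(np.array([1000.0, 250.0, 4000.0, np.nan, 600.0]))
sample = np.log2(np.array([4000.0, 1000.0, 16000.0, 800.0, 2400.0]))

aligned, shift = shift_to_reference(sample, reference)
print(round(shift, 6))  # -2.0, i.e. log2(1/4)
```

Using the median rather than the mean keeps the estimated offset robust to a few ions whose ratio deviates from the bulk.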

Keywords: algorithms; label-free; protein intensity; proteomics; quantification.


Conflict of interest statement

The authors declare no competing interests.

Figures

Graphical abstract
Fig. 1
The directLFQ approach. Objects to be normalized are intensity traces that can be shifted. A, between-sample normalization, where each trace represents a sample and each element of the trace is a peptide's log2 intensity. Traces are shifted on top of each other (blue) as described below. B, protein intensity estimation, where each trace belongs to a peptide and each element of the trace is a sample’s log2 intensity. C, the shifting process. Traces are compared in a pairwise fashion by subtracting the intensities and extracting the median and variance of the resulting difference distribution (top). The most similar samples are shifted (indicated in blue) and a merged sample is created. The process is repeated on a now smaller similarity matrix until all traces are shifted (bottom). A more realistic example for between-sample normalization is given in (D) and for subsequent protein intensity estimation in (E).
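The shifting process of panel C can be sketched as a greedy, hierarchical merge. This is a minimal sketch under stated assumptions, not the directLFQ code: the function names are hypothetical and are not the API of the directLFQ package; "most similar" is taken to mean the pair with the lowest variance of the difference distribution; the merged trace is assumed to be the element-wise mean of the two aligned traces; and a pair needs at least two shared values to be compared. Because this naive version recomputes every pair in each round, it does not reproduce the linear scaling reported for directLFQ.

```python
import numpy as np


def pairwise_stats(traces):
    """Median and variance of the difference distribution for every pair i < j."""
    n = len(traces)
    medians = np.full((n, n), np.nan)
    variances = np.full((n, n), np.inf)  # inf = not comparable or not in upper triangle
    for i in range(n):
        for j in range(i + 1, n):
            diff = traces[i] - traces[j]
            diff = diff[~np.isnan(diff)]  # keep elements observed in both traces
            if diff.size >= 2:
                medians[i, j] = np.median(diff)
                variances[i, j] = np.var(diff)
    return medians, variances


def merge(a, b):
    """Element-wise mean of two aligned traces, ignoring NaN (assumed merge rule)."""
    stacked = np.vstack([a, b])
    counts = np.sum(~np.isnan(stacked), axis=0)
    sums = np.nansum(stacked, axis=0)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(counts > 0, sums / counts, np.nan)


def shift_traces(traces):
    """Return the additive log2 shift for each input trace after greedy pairwise merging."""
    # Each cluster is (current merged trace, indices of the original traces it contains).
    clusters = [(np.asarray(t, dtype=float), [k]) for k, t in enumerate(traces)]
    shifts = np.zeros(len(clusters))
    while len(clusters) > 1:
        medians, variances = pairwise_stats([trace for trace, _ in clusters])
        i, j = np.unravel_index(np.argmin(variances), variances.shape)
        if not np.isfinite(variances[i, j]):
            raise ValueError("remaining traces share too few values to be aligned")
        offset = medians[i, j]  # adding this to trace j aligns it onto trace i
        trace_i, members_i = clusters[i]
        trace_j, members_j = clusters[j]
        shifts[members_j] += offset  # every original trace inside j moves together
        merged = (merge(trace_i, trace_j + offset), members_i + members_j)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return shifts


# Between-sample normalization (panel A): rows are samples, columns are peptides (log2).
base = np.array([10.0, 12.0, 15.0, 9.0, 11.0])
samples = np.vstack([base, base - 3.0, base + 1.0])
samples[1, 3] = np.nan  # one missing peptide in sample 1
shifts = shift_traces(samples)
normalized = samples + shifts[:, None]
print(shifts)  # [1. 4. 0.]
```

The same routine applies to panel B if the rows are instead the peptide traces of one protein and the columns are samples. How the aligned peptide traces are then summarized into a protein intensity is not specified in the caption; a per-sample median over the aligned traces would be one plausible choice, but that is an assumption of this sketch.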
Fig. 2
Processing times for different methods and sample sizes. directLFQ scales (sub-)linearly for (A) data-dependent acquisition (DDA) data and (B) data-independent acquisition (DIA) data, resulting in more than 1000-fold faster execution times than MaxLFQ. X in the plot marks instances with prohibitive calculation times. ∗Extrapolated times, as the templates for the 100,000-sample set had to be shortened to fit within 128 GB of memory. See the Experimental procedures and Discussion sections for details.
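Why linear scaling matters at these sample sizes can be made concrete by counting comparisons. Any scheme that compares every pair of samples has to evaluate n(n − 1)/2 pairs, so the work per sample grows with n instead of staying constant; for 100,000 samples that is nearly 5 × 10^9 pairs. The short sketch below (the function name is hypothetical) only counts pairs; it does not model the actual run time of MaxLFQ or directLFQ.

```python
def n_pairs(n_samples: int) -> int:
    """Number of distinct sample pairs an all-against-all comparison must evaluate."""
    return n_samples * (n_samples - 1) // 2


for n in (100, 10_000, 100_000):
    # Linear work grows like n; all-pairs work grows like n^2 / 2.
    print(f"{n:>7,} samples: {n_pairs(n):>13,} pairs ({n_pairs(n) / n:,.1f} pairs per sample)")
```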
Fig. 3
Applying directLFQ to different benchmarking datasets. A, mixed-species data-dependent acquisition (DDA) dataset processed with directLFQ and MaxLFQ. E. coli proteins should align along a log2 ratio of −2.59 (blue line), and H. sapiens proteins should align along a log2 ratio of 0. Median values are indicated by white dots in the violin plots; standard deviations are indicated in the respective color. B, distribution of coefficient of variation (CV) values on a DDA dataset with 200 replicate HeLa samples, with very similar results for directLFQ and MaxLFQ. C, mixed-species data-independent acquisition (DIA) dataset processed with directLFQ and two MaxLFQ implementations (iq and Spectronaut). Expected log2 ratios for S. cerevisiae, H. sapiens, and C. elegans proteins are −0.38, 0, and 1, respectively. D, distribution of CV values between technical repeat samples from a ∼900-sample clinical DIA dataset, processed with directLFQ and two MaxLFQ implementations (iq and DIA-NN), with comparable results for all approaches. E, testing directLFQ precursor normalization on a challenging tissue dataset and comparing it against standard median normalization. After normalization, all boxes should be aligned around 0, which is the case for directLFQ but not for the median normalization approach.
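Panels B and D compare distributions of coefficients of variation (CV) across replicate samples. The caption does not state the exact convention, so this minimal sketch assumes the CV is computed per protein on linear-scale intensities as the sample standard deviation (ddof=1) divided by the mean, ignoring missing values; the function name is hypothetical.

```python
import numpy as np


def coefficient_of_variation(intensities):
    """Per-protein CV across replicate samples (hypothetical helper).

    `intensities` is a (proteins x samples) array of linear-scale intensities;
    NaN marks a missing value. Assumed convention: sample std (ddof=1) / mean.
    """
    mean = np.nanmean(intensities, axis=1)
    std = np.nanstd(intensities, axis=1, ddof=1)
    return std / mean


# Two proteins measured in three replicate samples.
data = np.array([
    [100.0, 110.0, 90.0],      # spread of +/-10 around 100
    [1000.0, 1000.0, 1000.0],  # perfectly reproducible
])
cv = coefficient_of_variation(data)
print(cv)  # [0.1 0. ]
```

A method with better normalization should shift the whole CV distribution of replicate samples toward lower values, which is what panels B and D compare.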
Fig. 4
Applying directLFQ to dynamic organellar maps data from ref. (28) and comparing with MaxLFQ. A, principal component analysis maps of the dynamic organellar maps data in which protein clusters are color coded. Several clusters, such as Golgi and Mitochondrion, are separated more clearly with directLFQ. B, quantitative assessment of the similarity of the intensity profiles of protein clusters (lower distance means better consistency). On the left, the distances with error bars are displayed for each tested protein cluster. The arrows indicate the two clusters minichromosome maintenance (MCM) complex and Proteasome, where directLFQ and MaxLFQ perform best, respectively. On the right, the normalized distances are compared with each other as boxplots; directLFQ has a significantly lower distance (p = 0.014, two-sided t test). C, protein intensity profiles of these two clusters. One outlier trace in each cluster is marked by an arrow. D, visualization of the protein profiles over all replicates together with the underlying ion data. The traces show that directLFQ faithfully represents the underlying data.

References

    1. Aebersold R., Mann M. Mass-spectrometric exploration of proteome structure and function. Nature. 2016;537:347–355.
    2. Niu L., Thiele M., Geyer P.E., Rasmussen D.N., Webel H.E., Santos A., et al. Noninvasive proteomic biomarkers for alcohol-related liver disease. Nat. Med. 2022;28:1277–1287.
    3. Bader J.M., Geyer P.E., Müller J.B., Strauss M.T., Koch M., Leypoldt F., et al. Proteome profiling in cerebrospinal fluid reveals novel biomarkers of Alzheimer’s disease. Mol. Syst. Biol. 2020;16.
    4. Brunner A., Thielert M., Vasilopoulou C., Ammar C., Coscia F., Mund A., et al. Ultra-high sensitivity mass spectrometry quantifies single-cell proteome changes upon perturbation. Mol. Syst. Biol. 2022;18.
    5. Specht H., Emmott E., Petelski A.A., Huffman R.G., Perlman D.H., Serra M., et al. Single-cell proteomic and transcriptomic analysis of macrophage heterogeneity using SCoPE2. Genome Biol. 2021;22:50.
