Comparative Study

Brief Bioinform. 2024 Nov 22;26(1):bbae657.

doi: 10.1093/bib/bbae657.

Synthetic plasma pool cohort correction for affinity-based proteomics datasets allows multiple study comparison

Dries Heylen¹,², Murih Pusparum²,³, Jurgis Kuliesius⁴, Jim Wilson⁴,⁵, Young-Chan Park⁶, Jacek Jamiołkowski⁷, Valentino D'Onofrio⁸, Dirk Valkenborg³, Jan Aerts⁹, Gökhan Ertaylan², Jef Hooyberghs¹

Affiliations

¹ Data Science Institute, Theory Lab, Hasselt University, 3590 Diepenbeek, Belgium.
² Flemish Institute for Technological Research (VITO), Mol, Belgium.
³ Hasselt University, Data Science Institute, 3590 Diepenbeek, Belgium.
⁴ Centre for Global Health Research, University of Edinburgh, Edinburgh BioQuarter, Edinburgh EH16 4UX, United Kingdom.
⁵ MRC Human Genetics Unit, University of Edinburgh, Western General Hospital, Edinburgh, EH4 2XU, United Kingdom.
⁶ Institute of Translational Genomics, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany.
⁷ Department of Population Medicine and Lifestyle Diseases Prevention, Medical University of Bialystok, 15-089 Białystok, Poland.
⁸ Center for Vaccinology, Ghent University and Ghent University Hospital, 9000 Ghent, Belgium.
⁹ Augmented Intelligence for Data Analytics (AIDA) Lab Department of Biosystems KU Leuven, Leuven, Belgium.

PMID: 39694815
PMCID: PMC11653412
DOI: 10.1093/bib/bbae657

Dries Heylen et al. Brief Bioinform. 2024.

Erratum in

Correction to: Synthetic plasma pool cohort correction for affinity-based proteomics datasets allows multiple study comparison.
[No authors listed]. Brief Bioinform. 2024 Nov 22;26(1):bbaf112. doi: 10.1093/bib/bbaf112. PMID: 40036723. Free PMC article. No abstract available.

Abstract

Proteomics is a crucial link between genomics and human disease. Quantitative proteomics provides detailed insight into protein levels, enabling differentiation between distinct phenotypes. OLINK, a biotechnology company from Uppsala, Sweden, offers a targeted, affinity-based protein measurement method called Target 96, which has become prominent in the field of proteomics. The SCALLOP consortium, for instance, contains data from over 70,000 individuals across 45 independent cohort studies, all sampled by OLINK. However, when independent cohorts want to collaborate and quantitatively compare their Target 96 protein values, the current advice is to include identical 'biological bridging' samples in each sampling run and to perform a reference sample normalization that corrects for technical variation across measurements. This 'biological bridging sample' approach requires each participating cohort to resend its biological bridging samples to OLINK so that they can all be run together, which is logistically challenging, costly and time-consuming. Alternatives are therefore being sought, and an evaluation of the current state of the art exposes the need for a more robust method that allows all OLINK Target 96 studies to compare proteomics data accurately and cost-efficiently. To meet these goals we developed the Synthetic Plasma Pool Cohort Correction (the 'SPOC correction'), which is based on an OLINK-composed synthetic plasma sample. The method can easily be implemented in a federated data-sharing context, which is illustrated with a sepsis use case.
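The idea of a reference sample normalization can be sketched in a few lines. The abstract does not give the exact formula, so the sketch below assumes one additive correction term per protein, taken as the difference between the per-protein medians of the reference samples in two runs. The function names `correction_terms` and `apply_correction` and the toy values are hypothetical. The reference samples could be biological bridging samples or, in the SPOC setting, synthetic plasma pool controls.

```python
import numpy as np

def correction_terms(ref_run_a, ref_run_b):
    """Per-protein additive correction term (assumed form, not the paper's formula).

    ref_run_a, ref_run_b: arrays of shape (n_reference_samples, n_proteins)
    with NPX values of the reference samples measured in run A and run B.
    """
    ref_run_a = np.asarray(ref_run_a, dtype=float)
    ref_run_b = np.asarray(ref_run_b, dtype=float)
    # Difference of per-protein medians: how far run B sits below/above run A.
    return np.median(ref_run_a, axis=0) - np.median(ref_run_b, axis=0)

def apply_correction(npx_run_b, terms):
    """Shift run-B NPX values onto the scale of run A."""
    return np.asarray(npx_run_b, dtype=float) + terms

# Toy data: 3 reference samples, 2 proteins; run B has a per-protein offset.
ref_a = np.array([[5.0, 2.0], [6.0, 3.0], [7.0, 4.0]])
ref_b = ref_a - np.array([0.5, -0.2])

terms = correction_terms(ref_a, ref_b)
corrected = apply_correction(ref_b, terms)
print(terms)
```

Because NPX values are on a log2 scale, an additive shift per protein is a natural choice here, but that too is an assumption of this sketch.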

Keywords: biomarkers; normalization; protein quantification; proteomics.

Figures

**Figure 1**
Schematic overview of OLINK’s sampling procedure. The well plate configuration is illustrative, and in practice the sample positions can be random.

**Figure 2**
Evaluation of the rank-based, inverse normalized transformation (INT) applied to IAF cohort data. Rank switches between measurement time points are shown for OLINK inflammation panel data for all 30 IAF individuals. Six consecutive protein collection periods are used (5 transitions). Ranks for each protein are assigned by ordering the individuals' NPX values for that protein from high to low. Each individual's rank for each protein is evaluated against its rank at the previous time point. A rank switch for a protein is counted when an individual has moved at least 10% up or down the ranking between two consecutive months. Hierarchical clustering is applied to the rank switch data to determine the order of the columns (proteins).
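The rank switch definition in the Figure 2 caption can be illustrated with a small sketch for a single protein. It assumes that "10% of the ranking" means 10% of the number of individuals, and that ties are broken by sample order. The function name `rank_switches` and the toy data are hypothetical.

```python
import numpy as np

def rank_switches(npx, threshold=0.10):
    """Flag rank switches between consecutive time points for ONE protein.

    npx: array of shape (n_timepoints, n_individuals) with NPX values.
    Returns a boolean array of shape (n_timepoints - 1, n_individuals).
    """
    npx = np.asarray(npx, dtype=float)
    n_individuals = npx.shape[1]
    # Rank 0 = highest NPX value at that time point.
    order = np.argsort(-npx, axis=1, kind="stable")
    ranks = np.argsort(order, axis=1, kind="stable")
    # Absolute change in rank position between consecutive time points.
    shift = np.abs(np.diff(ranks, axis=0))
    return shift / n_individuals >= threshold

# Toy data: 10 individuals, 3 time points; individuals 0 and 1 swap places once.
t0 = np.arange(10, 0, -1.0)
t1 = t0.copy()
t1[[0, 1]] = t1[[1, 0]]
t2 = t1.copy()
npx = np.vstack([t0, t1, t2])

switches = rank_switches(npx)
print(switches.sum(axis=1))  # number of switches per transition
```

With 30 individuals, as in the IAF cohort, the same threshold corresponds to a move of at least 3 rank positions.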

**Figure 3**
Default reference sample normalization *versus* SPOC correction procedure. (A) Default reference sample normalization of OLINK allows quantitative comparison of protein NPX values but requires cohorts to include at least 8 of their samples as biological bridging samples on each plate of the cohort. (B) The SPOC correction allows universal collaboration across OLINK cohorts using external sample controls consisting of a synthetic plasma pool.

**Figure 4**
Comparison of the SPOC correction terms with the reference sample correction terms. Each dot represents a correction value for a specific protein. (A-B) Correction values calculated with the reference sample correction method plotted on the x-axis against the correction values calculated with the SPOC correction method on the y-axis. (C-D) Bland–Altman plots to analyze the agreement between the two correction methods. The horizontal middle line indicates the average difference between the SPOC correction term and the reference sample correction term. Upper and lower 95% confidence intervals are indicated by the dotted lines. Left, for the inflammation protein panel from OLINK in the FAPIC cohort (92 proteins). Right, for all available protein panels from OLINK in the IAF cohort (1068 proteins). The samples that were run for IAF on the plates in batch 2 and batch 4 are used for the IAF plots. *With null hypothesis H₀ = independent variables (i.e. SPOC correction terms) in the regression model explain the variability of the dependent variable (i.e. reference sample correction terms) in a random way. Based on this evaluation we consider the SPOC correction a valid normalization that can be used when identical biological bridging samples are not present across different cohorts.
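The quantities behind a Bland–Altman plot such as panels C-D can be computed as follows. This is a minimal sketch that assumes the dotted lines are the conventional 95% limits, i.e. the mean difference ± 1.96 standard deviations of the differences; the caption does not state how they were derived. The function name `bland_altman` and the toy correction terms are hypothetical.

```python
import numpy as np

def bland_altman(method_a, method_b):
    """Return per-pair means, per-pair differences, bias and 95% limits."""
    a = np.asarray(method_a, dtype=float)
    b = np.asarray(method_b, dtype=float)
    diff = a - b                 # y-axis of the plot
    mean_pair = (a + b) / 2.0    # x-axis of the plot
    bias = diff.mean()           # horizontal middle line
    sd = diff.std(ddof=1)        # sample standard deviation of the differences
    return mean_pair, diff, bias, bias - 1.96 * sd, bias + 1.96 * sd

# Toy per-protein correction terms from the two methods (made-up numbers).
spoc_terms = [0.1, 0.2, 0.3, 0.4]
reference_terms = [0.0, 0.3, 0.3, 0.3]

mean_pair, diff, bias, lower, upper = bland_altman(spoc_terms, reference_terms)
print(round(bias, 3), round(lower, 3), round(upper, 3))
```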

**Figure 5**
Landscape of OLINK proteomic studies sampled with a qPCR Target 96 approach. Technical measurement variation can be bridged with a biological bridging sample correction as long as bridging samples are available across all plates of the different cohorts or across sampling time points. If this is not the case (see cross sign), a synthetic plasma sample correction can be used, provided the plasma samples come from the same synthetic plasma pool. If a different plasma pool is included, a pool effect correction is also needed.

**Figure 6**
Unsupervised hierarchical clustering with a Euclidean distance measure and the McQuitty clustering method. Clustering is performed on protein NPX values for the Target 96 metabolism panel of external sample controls. Two separate horizontal color-coded bars indicate the cohort and plasma pool origin for each external sample control. (A) Clustering is shown for all external sample controls from four different cohorts involved in the SCALLOP consortium. Complete separation of the old *versus* new plasma pool is visible. (B) Reference sample correction applied to the IAF samples by computing the correction values with identical biological bridging samples. (C) SPOC correction applied to the IAF samples by computing the SPOC correction values with external sample controls. Complete clustering plots, including protein expression values, are attached as supplementary figures (Fig. S1–S3).
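The clustering set-up named in the Figure 6 caption can be reproduced in outline with SciPy, whose `weighted` linkage method is the WPGMA (McQuitty) algorithm. The data below are simulated stand-ins for external sample controls from two plasma pools, not values from the study, and the cut into two clusters is an illustrative choice.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Simulated NPX profiles: 6 controls per pool, 5 proteins, with a pool offset.
rng = np.random.default_rng(0)
old_pool = rng.normal(loc=0.0, scale=0.1, size=(6, 5))
new_pool = rng.normal(loc=2.0, scale=0.1, size=(6, 5))
controls = np.vstack([old_pool, new_pool])

# Euclidean distance with McQuitty (WPGMA) linkage.
tree = linkage(controls, method="weighted", metric="euclidean")

# Cut the dendrogram into two clusters and inspect the assignment.
labels = fcluster(tree, t=2, criterion="maxclust")
print(labels)
```

If a pool effect dominates, as in panel A, the two-cluster cut should coincide with pool origin; after a successful correction the controls would no longer separate this cleanly.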

**Figure 7**
Federated transfer of protein ranges to compare protein NPX values between cohorts with distinct phenotypes (IAF-FAPIC). Green error bars show healthy NPX reference intervals based on the IAF data. The red data points show the difference in protein NPX value between the median of the IAF cohort and the median of the FAPIC cohort. The orange error bars represent a technical measurement error margin. To optimize comparability across proteins and focus on the difference between the two cohorts, the green reference intervals are centered on the median NPX values of the IAF cohort. Intervals that do not overlap indicate a significant difference between the two phenotypes for the relevant protein.
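The overlap check described in the Figure 7 caption can be sketched for a single protein as follows. Several details are assumptions of this sketch rather than statements from the source: the reference interval is taken as the 2.5th to 97.5th percentile of the healthy cohort, the difference is computed as the other cohort's median minus the healthy median, and the technical error margin is a symmetric value supplied by the caller. The function name `compare_to_reference` and the toy values are hypothetical.

```python
import numpy as np

def compare_to_reference(healthy, other, error_margin,
                         lower_pct=2.5, upper_pct=97.5):
    """Compare another cohort's median shift with a centered healthy interval."""
    healthy = np.asarray(healthy, dtype=float)
    other = np.asarray(other, dtype=float)
    healthy_median = np.median(healthy)
    # Healthy reference interval, centered on the healthy median (green bar).
    lo, hi = np.percentile(healthy, [lower_pct, upper_pct]) - healthy_median
    # Median shift of the other cohort relative to the healthy cohort (red dot).
    shift = np.median(other) - healthy_median
    # No overlap between shift +/- margin (orange bar) and the green interval.
    outside = (shift - error_margin > hi) or (shift + error_margin < lo)
    return float(lo), float(hi), float(shift), bool(outside)

healthy_npx = np.linspace(4.0, 6.0, 21)   # made-up healthy values
shifted_npx = [7.0, 7.5, 8.0]             # made-up clearly elevated values
similar_npx = [5.1, 5.2, 5.3]             # made-up values close to healthy

lo, hi, shift, outside = compare_to_reference(healthy_npx, shifted_npx, 0.3)
_, _, _, outside_similar = compare_to_reference(healthy_npx, similar_npx, 0.3)
print(outside, outside_similar)
```

Only summary values (the interval bounds and the median) are needed from the healthy cohort, which is what makes this kind of comparison compatible with a federated set-up in which individual-level data stay local.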

Grants and funding

BOF20OWB29/Hasselt University BOF
