[Preprint]. 2024 Jun 1:2024.05.27.596028.
doi: 10.1101/2024.05.27.596028.

A genome-wide association study of mass spectrometry proteomics using the Seer Proteograph platform

Karsten Suhre et al. bioRxiv.

Abstract

Genome-wide association studies (GWAS) with proteomics are essential tools for drug discovery. To date, most studies have used affinity proteomics platforms, which have limited discovery to the protein panels covered by the available affinity binders. Furthermore, it is not clear to what extent protein epitope-changing variants interfere with the detection of protein quantitative trait loci (pQTLs). Mass spectrometry-based (MS) proteomics can overcome some of these limitations. Here we report a GWAS using the MS-based Seer Proteograph platform with blood samples from a discovery cohort of 1,260 American participants and a replication in 325 individuals from Asia, with diverse ethnic backgrounds. We analysed 1,980 proteins quantified in at least 80% of the samples, out of 5,753 proteins quantified across the discovery cohort. We identified 252 pQTLs and replicated 90 of them; 30 of the replicated pQTLs have not been reported before. We further investigated 200 of the strongest associated cis-pQTLs previously identified using the SOMAscan and the Olink platforms and found that up to one third of the affinity proteomics pQTLs may be affected by epitope effects, while another third were confirmed by MS proteomics to be consistent with the hypothesis that genetic variants induce changes in protein expression. The present study demonstrates the complementarity of the different proteomics approaches and reports pQTLs not accessible to affinity proteomics, suggesting that many more pQTLs remain to be discovered using MS-based platforms.
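
The detection filter mentioned in the abstract (proteins quantified in at least 80% of the samples) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name, the use of None to mark a sample without a quantified value, and the toy protein IDs are all assumptions made for illustration.

```python
from typing import Dict, List, Optional


def filter_by_detection_rate(
    intensities: Dict[str, List[Optional[float]]],
    min_fraction: float = 0.8,
) -> List[str]:
    """Return the protein IDs quantified in at least `min_fraction` of samples.

    `intensities` maps a protein ID to one value per sample. None marks a
    sample in which the protein was not quantified (an assumed encoding).
    """
    kept = []
    for protein_id, values in intensities.items():
        if not values:
            continue  # no samples at all: nothing to evaluate
        detected = sum(v is not None for v in values)
        if detected / len(values) >= min_fraction:
            kept.append(protein_id)
    return kept


# Toy data with hypothetical protein IDs, five samples each.
toy = {
    "PROT_A": [1.2, 0.9, 1.1, 1.0, None],   # 4 of 5 quantified (0.8): kept
    "PROT_B": [1.2, None, None, 1.0, 0.7],  # 3 of 5 quantified (0.6): dropped
}
print(filter_by_detection_rate(toy))  # prints ['PROT_A']
```

Applied to the study's numbers, a filter of this kind would be what reduces the 5,753 quantified proteins to the 1,980 analysed ones.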

Conflict of interest statement

H.G. and S.B. are employees and/or stockholders of Seer, Inc. J.L-S. is a scientific advisor to Precion Inc. and TruDiagnostic, and has a sponsored research agreement with TruDiagnostic. J.L-S. previously consulted for Cambrian and Ahara. The other authors declare no competing interests.

Figures

Figure 1: Manhattan plot.
Shown are all protein associations that reached a significance level of p < 5×10⁻⁸ in the discovery study.
Figure 2: Effect size and effect allele frequencies.
(A) Scatterplot of the pQTL effect sizes from the discovery (Tarkin) and the replication (QMDiab) study; replicated loci are shown in red. (B) Scatterplot of the effect allele frequencies (EAF), Tarkin versus QMDiab.
Figure 3: Visualization of the Proteograph MS-proteomics data for a prototypical pQTL.
Violin plots of log-scaled engine-normalized protein and peptide intensities by genotype for the indicated protein and genetic variant (bottom left box; details are in Table S2). Top row (green): Tarkin data, using the PAV-exclusive library. Middle row (grey): QMDiab data, using the PAV-exclusive library. Bottom row (red): Tarkin data, using the PAV-inclusive library, limited to PAV-containing peptides. Plot titles indicate the protein UniProt identifier and the pQTL type (cis/trans) above the Tarkin protein plot, and the HGNC gene name and nanoparticle number above the QMDiab protein plot. Titles of the peptide plots indicate the respective peptide sequences and precursor charges (pc), and additionally the PAV variant rsID and allele type (REF/ALT) for the PAV peptide plots. Summary statistics are reported below the plots, based on linear models with residualized and inverse-normal scaled data for associations with the PAV-exclusive library data (top two rows) and on Fisher’s exact test for the PAV-inclusive library data (bottom row). Associations with peptide data are sorted by increasing p-value from left to right and limited to a maximum of four plots. Whenever peptides were detected at multiple precursor charge values, only the strongest association was plotted. Numbers at the x-axis tick marks indicate the number of detected peptides by genotype, followed by the number of samples with the corresponding genotype (e.g. 662:665). Genotypes are ordered as (1) other allele, (2) heterozygote, (3) effect allele, where the effect allele is indicated following the SNP name (chr:pos:ref:alt_eff, e.g. 3:49721532:G:A_A). Similar plots for all 252 pQTLs are provided as Figure S2.
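
The "inverse-normal scaled" step named in the Figure 3 legend can be sketched as a rank-based transform. The legend does not give the exact formula, so the Blom offset (0.375) and the average-rank handling of ties used here are assumptions. The residualization step is omitted because the covariates are not listed in this legend.

```python
from statistics import NormalDist
from typing import List


def inverse_normal_transform(values: List[float], offset: float = 0.375) -> List[float]:
    """Rank-based inverse-normal transform.

    Each value is replaced by the standard normal quantile of
    (rank - offset) / (n - 2 * offset + 1). With offset = 0.375 this is the
    Blom variant (an assumption; the source does not state which one was used).
    Tied values share their average rank.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    normal = NormalDist()
    return [normal.inv_cdf((r - offset) / (n - 2 * offset + 1)) for r in ranks]


# Toy intensities: the transform keeps the ordering and is symmetric around 0.
z = inverse_normal_transform([3.0, 1.0, 2.0])
print(z)
```

The transformed values depend only on ranks, which is why such scaling makes association statistics robust to skewed intensity distributions.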
Figure 4: Scatterplot of the effect sizes for the protein associations with sex and age.
Summary statistics for the associations with affinity proteomics were from the respective GWAS studies. Associations that reached Bonferroni significance in both respective studies are in red and in one study are in blue (p < 0.05 / number of reported associations). The effect sizes (beta) are reported in units of standard deviations (s.d.). Data points outside the plotting window are indicated by diamonds on the plot frames. Plot data are available in Table ST4. Scatterplots are limited to 507 unique proteins that were reported by all three studies. In cases where data for multiple affinity binders was reported, the most significant association was retained.
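
The colour rule in the Figure 4 legend (Bonferroni significance at p < 0.05 / number of reported associations; red if reached in both studies, blue if reached in one) can be written out as a small sketch. The function names and the example count of 1,000 associations are hypothetical; the legend does not state the per-study counts.

```python
def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance threshold: alpha divided by the number of tests."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def colour_for(p_ms: float, p_affinity: float, n_ms: int, n_affinity: int) -> str:
    """Colour of a data point following the rule described in the legend.

    Each study is tested against its own threshold, based on its own number
    of reported associations (n_ms and n_affinity are assumed inputs).
    """
    sig_ms = p_ms < bonferroni_threshold(n_ms)
    sig_affinity = p_affinity < bonferroni_threshold(n_affinity)
    if sig_ms and sig_affinity:
        return "red"
    if sig_ms or sig_affinity:
        return "blue"
    return "none"


# Hypothetical example: 1,000 reported associations in each study,
# giving a threshold of 0.05 / 1000 in both.
print(colour_for(1e-6, 1e-6, 1000, 1000))  # prints red
print(colour_for(1e-6, 0.01, 1000, 1000))  # prints blue
```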
Figure 5: MS-peptide association score plotted by pQTL rank.
Scatterplot of the MSPA scores against the rank of the affinity proteomics pQTLs of the deCODE SOMAscan (panel A, data in Table S7) and the UKB-PPP Olink (panel B, data in Table S8) studies, ranked starting with the lowest p-value. The first 100 pQTLs (out of 322 pQTLs for SOMAscan and 374 pQTLs for Olink) are coloured to indicate likely protein expression QTLs (MSPA score > 0.8; green) and likely epitope effect-driven pQTLs (MSPA score < 0.2; red). The sigmoid curve indicates the assumed dependence of the power to detect a pQTL on the strength of the association, approximated by the rank of the pQTL in the respective study. MSPA scores limited to the 46 pQTLs that were reported on the same variant in deCODE (panel C) and UKBPPP (panel D). Scatterplot of the effect sizes (beta) of the 46 pQTLs reported by both deCODE and UKBPPP (panel E, data in Table S9).
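
The two MSPA score cut-offs given in the Figure 5 legend can be expressed as a simple classifier. Only the thresholds (> 0.8 and < 0.2) come from the legend; the function name and the label for scores in between are assumptions, and how the MSPA score itself is computed is not described here.

```python
def classify_mspa(score: float) -> str:
    """Label a pQTL by its MSPA score, using the cut-offs from the legend."""
    if score > 0.8:
        return "likely protein expression QTL"  # shown in green
    if score < 0.2:
        return "likely epitope effect"  # shown in red
    return "unclassified"  # assumed label for scores between the cut-offs


# Hypothetical scores for illustration only.
for s in (0.95, 0.5, 0.05):
    print(s, classify_mspa(s))
```

Scores exactly at 0.8 or 0.2 fall into the middle group under this reading, since the legend uses strict inequalities.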
Figure 6:
Scatterplot of the effect sizes (beta) of 46 pQTLs that were reported by deCODE and UKBPPP on the same SNP and for which association data was also available for Tarkin and QMDiab; Tarkin vs. QMDiab (A), Tarkin vs. deCODE (B), Tarkin vs. UKBPPP (C); associations that reach a significance level of p < 0.05 / 46 in Tarkin are in red (Table S9).
Figure 7: Example of a pQTL that is likely affected by an epitope effect.
See the legend of Figure 3 for details. Similar plots for all 374 pQTLs with Olink data and for all 322 pQTLs with SOMAscan data are provided as Figures S3 and S4; data are in Tables S6 and S7.
