Nat Commun. 2024 Feb 2;15(1):989.
doi: 10.1038/s41467-024-45233-y.

Nanoparticle enrichment mass-spectrometry proteomics identifies protein-altering variants for precise pQTL mapping

Karsten Suhre et al. Nat Commun.

Abstract

Proteogenomics studies generate hypotheses on protein function and provide genetic evidence for drug target prioritization. Most previous work has been conducted using affinity-based proteomics approaches. These technologies face challenges, such as uncertainty regarding target identity, non-specific binding, and handling of variants that affect epitope affinity binding. Mass spectrometry-based proteomics can overcome some of these challenges. Here we report a pQTL study using the Proteograph™ Product Suite workflow (Seer, Inc.) where we quantify over 18,000 unique peptides from nearly 3000 proteins in more than 320 blood samples from a multi-ethnic cohort in a bottom-up, peptide-centric, mass spectrometry-based proteomics approach. We identify 184 protein-altering variants in 137 genes that are significantly associated with their corresponding variant peptides, confirming target specificity of co-associated affinity binders, identifying putatively causal cis-encoded proteins and providing experimental evidence for their presence in blood, including proteins that may be inaccessible to affinity-based proteomics.

Conflict of interest statement

G.R.V., H.G., M.D., K.M., A.S., and S.B. are employees and/or stockholders of Seer, Inc. The other authors declare no competing interests.

Figures

Fig. 1. Study design and workflow.
Procedure used to incorporate QMDiab variants into the UniProt.fasta file, create spectral libraries, and identify MS-PAVs and MS-pQTLs, plus an overview of the overall study design. Part of the figure was created with BioRender.com.
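As a rough illustration of what incorporating a protein-altering variant into a sequence library involves, the sketch below applies a single amino-acid substitution to a protein sequence and compares the in-silico tryptic peptides before and after. It is not the authors' pipeline. The toy sequence, the position, and the function names are hypothetical; only the peptide pair TSQC[T/I]LK is taken from this paper, and the cleavage rule (after K or R, not before P) is the common textbook simplification for trypsin.

```python
def apply_substitution(seq, pos, ref, alt):
    """Return seq with residue `ref` at 1-based position `pos` replaced by `alt`."""
    if seq[pos - 1] != ref:
        raise ValueError(f"expected {ref} at position {pos}, found {seq[pos - 1]}")
    return seq[:pos - 1] + alt + seq[pos:]


def tryptic_digest(seq):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        at_end = i == len(seq) - 1
        if at_end or (aa in "KR" and seq[i + 1] != "P"):
            peptides.append(seq[start:i + 1])
            start = i + 1
    return peptides


# Hypothetical toy sequence containing the reference peptide TSQCTLK;
# this is NOT the real APOB sequence.
reference_seq = "AAKTSQCTLKEVYGR"
variant_seq = apply_substitution(reference_seq, pos=8, ref="T", alt="I")

reference_peptides = set(tryptic_digest(reference_seq))
variant_peptides = set(tryptic_digest(variant_seq))

# Peptides that distinguish the two alleles
reference_only = reference_peptides - variant_peptides
variant_only = variant_peptides - reference_peptides
print(sorted(reference_only), sorted(variant_only))
```

Peptides unique to the variant sequence are the ones a variant-aware library would need in order to detect the alternate allele at the peptide level.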
Fig. 2. Proteins and peptides detected in > 20% of the samples by the Proteograph™ workflow.
Data are shown for protein and peptide identification using DIA-NN with the reference library and the match-between-runs (MBR) option (see Supplementary Fig. 1 for the dependence between the number of detections and % missingness).
Fig. 3. Boxplots by genotype rs4524 for selected Factor V (F5) protein and peptide intensities.
This figure shows the effect of using the different libraries, using the Factor V (F5) protein as an example. Similar plots are provided in Supplementary Fig. 3 and as Source Data with this paper for all 184 MS-PAVs. The boxes are color-coded as follows: PAV-exclusive library (green), reference library (blue), and PAV-inclusive library (red). Protein intensities are in dark colors, and peptide intensities are in light colors. The grey horizontal boxplots on top of the plots represent the range of the data shown in that plot compared to the 5–95% range of the entire data for that protein. Units on the y-axis are engine-normalized intensities as provided by DIA-NN. The x-axis labels indicate the number of detected peptides/proteins followed by a colon and the number of samples with the given genotype (order: reference/major allele homozygote, heterozygote, alternate/minor allele homozygote). The first line of the subtitle identifies the protein (UniProt ID and rsID, when applicable) or the peptide sequence, followed by the nanoparticle used in that analysis. The second line shows the number of data points included in generating the plot (N). Significance levels (p-values) are given for the following hypothesis tests: (1) Fisher’s exact test on detected/non-detected versus presence/absence of the major (p-maj) or minor (p-min) allele, where the stronger of the two associations is shown (indicating MS-PAV detection significance), and (2) a linear regression of peptide intensity versus genotype (coded 0-1-2) with missing values set to zero (pX), and, for proteins, a linear model including relevant covariates using inverse-normal scaled protein intensities (excluding missing values) against genotype (pY; indicating pQTL significance). Protein name, chromosome, chromosome position (GRCh37), and major and minor alleles are indicated in boldface on top of the boxplots.
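The detection test described in this caption can be sketched in a few lines. The example below is a minimal standard-library illustration, not the authors' code: it builds the 2x2 table of peptide detected/not detected versus allele present/absent, computes a two-sided Fisher's exact p-value for the minor and the major allele, and reports the smaller of the two. The genotype coding (0-1-2 as the count of minor alleles), the toy data, and all function names are assumptions made for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are no more likely than the observed one
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def detection_p_values(genotypes, detected):
    """Return (p_maj, p_min) for detection versus presence of each allele.

    genotypes: minor-allele counts (0, 1, 2); detected: booleans per sample.
    """
    def table(has_allele):
        a = sum(1 for h, d in zip(has_allele, detected) if h and d)
        b = sum(1 for h, d in zip(has_allele, detected) if h and not d)
        c = sum(1 for h, d in zip(has_allele, detected) if not h and d)
        e = sum(1 for h, d in zip(has_allele, detected) if not h and not d)
        return a, b, c, e

    p_min = fisher_exact_two_sided(*table([g >= 1 for g in genotypes]))
    p_maj = fisher_exact_two_sided(*table([g <= 1 for g in genotypes]))
    return p_maj, p_min


# Toy data (hypothetical): a variant peptide seen only in minor-allele carriers
genotypes = [0, 0, 0, 1, 1, 2]
detected = [False, False, False, True, True, True]
p_maj, p_min = detection_p_values(genotypes, detected)
strongest = min(p_maj, p_min)
print(p_maj, p_min, strongest)
```

With only six toy samples the test cannot reach small p-values; in a cohort of several hundred samples the same table yields much stronger evidence when detection tracks the allele.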
Fig. 4. Scatterplot of the protein-level associations (p-values) for the 184 MS-PAVs using the reference and the PAV-exclusive libraries.
Three regimes are labeled: (1) variants that remain associated with protein levels after removal of the variant peptides from the library (MS-pQTLs), (2) variants where the association signal with the protein levels disappears after removal of the variant peptides (the MS equivalent of an epitope effect), and (3) variants that do not associate with protein levels in either case (MS-PAVs that may become significant in more highly powered studies). Plot data are in Supplementary Data 3. P-values (unadjusted) are from the linear model described in the Methods section.
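The three regimes above amount to a simple decision rule on two p-values per variant. The sketch below expresses that rule under stated assumptions: the significance cutoff `alpha` is a placeholder, since the threshold used in the paper is not given in this text, and the function name and labels are hypothetical.

```python
def classify_ms_pav(p_reference, p_exclusive, alpha=0.05):
    """Assign a variant to one of the three regimes described in the caption.

    p_reference: protein-level p-value using the reference library
    p_exclusive: protein-level p-value using the PAV-exclusive library
    alpha: placeholder significance cutoff (assumption, not from the paper)
    """
    sig_ref = p_reference < alpha
    sig_excl = p_exclusive < alpha
    if sig_ref and sig_excl:
        return "MS-pQTL"              # regime 1: association survives peptide removal
    if sig_ref and not sig_excl:
        return "epitope-like effect"  # regime 2: signal disappears after removal
    if not sig_ref and not sig_excl:
        return "MS-PAV only"          # regime 3: no protein-level association
    return "unclassified"             # significant only without variant peptides


print(classify_ms_pav(1e-8, 1e-6))
print(classify_ms_pav(1e-8, 0.4))
print(classify_ms_pav(0.3, 0.6))
```

The fourth return value covers a combination the caption does not label, so it is kept separate rather than forced into one of the three regimes.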
Fig. 5. Regional association plots for the APOB region.
Association of the detection of the alternate variant peptide TSQCILK of APOB (pc2, nanoparticle 1) with the presence/absence of the matching genetic variants at the APOB locus (top), and GWAS associations of Apolipoprotein B (middle) and LDL-cholesterol (bottom) measured by clinical biochemistry methods in blood samples from 343,621 participants of the UK Biobank study. The highlighted variant rs1367117 (chr2:21263900) is the MS-PAV in TSQC[T/I]LK. Linkage disequilibrium (LD) between variants is indicated by color; gene positions are shown below. P-values (unadjusted) are from linear models generated by the respective studies.
