[Preprint]. 2025 Jan 29:2024.04.01.587474.
doi: 10.1101/2024.04.01.587474.

Multiplex, multimodal mapping of variant effects in secreted proteins

Nicholas A Popp et al. bioRxiv.

Abstract

Despite widespread advances in DNA sequencing, the functional consequences of most genetic variants remain poorly understood. Multiplexed Assays of Variant Effect (MAVEs) can measure the function of variants at scale, and are beginning to address this problem. However, MAVEs cannot readily be applied to the ~10% of human genes encoding secreted proteins. We developed a flexible, scalable human cell surface display method, Multiplexed Surface Tethering of Extracellular Proteins (MultiSTEP), to measure secreted protein variant effects. We used MultiSTEP to study the consequences of missense variation in coagulation factor IX (FIX), a serine protease where genetic variation can cause hemophilia B. We combined MultiSTEP with a panel of antibodies to detect FIX secretion and post-translational modification, measuring a total of 44,816 effects for 436 synonymous variants and 8,528 of the 8,759 possible missense variants. 49.6% of possible F9 missense variants impacted secretion, post-translational modification, or both. We also identified functional constraints on secretion within the signal peptide and for nearly all variants that caused gain or loss of cysteine. Secretion scores correlated strongly with FIX levels in hemophilia B and revealed that loss of secretion variants are particularly likely to cause severe disease. Integration of the secretion and post-translational modification scores enabled reclassification of 63.1% of F9 variants of uncertain significance in the My Life, Our Future hemophilia genotyping project. Lastly, we showed that MultiSTEP can be applied to a wide variety of secreted proteins. Thus, MultiSTEP is a multiplexed, multimodal, and generalizable method for systematically assessing variant effects in secreted proteins at scale.
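The counts reported in the abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers stated above; the variable names are illustrative and do not come from the paper.

```python
# Counts as stated in the abstract.
synonymous_variants = 436
missense_scored = 8_528
missense_possible = 8_759
total_effects = 44_816

# Derived quantities.
variants_scored = synonymous_variants + missense_scored
missense_coverage = missense_scored / missense_possible
effects_per_variant = total_effects / variants_scored

print(f"variants scored: {variants_scored}")                   # 8964
print(f"missense coverage: {missense_coverage:.1%}")            # 97.4%
print(f"mean effects per variant: {effects_per_variant:.2f}")   # 5.00
```

The total of 8,964 scored variants agrees with the count given in the Figure 2 title, and the average works out to about five measured effects per variant.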

Figures

Figure 1: MultiSTEP enables at-scale measurement of variant effects in secreted proteins.
a. Secreted proteins (purple) make up approximately 10% of the human proteome. b. Missense variants in secreted proteins found in ClinVar from 2016 to 2023 colored by clinical significance. c. MultiSTEP retains secreted proteins on the cell surface, establishing a physical link between genotype and phenotype (left panel). Cells expressing a library of variants of the target protein are sorted into bins based upon intensity of fluorescent antibody binding, followed by deep sequencing to derive a functional score for each individual variant (middle panels). The result is a variant effect map (right panel). d. MultiSTEP design. Secreted protein coding sequences (pink) are cloned into an attB-containing landing pad donor plasmid. Secreted proteins are engineered to have C-terminally fused (GGGGS)2 flexible linkers (L1 and L2, teal) attached to a single pass transmembrane domain (TMD, blue). In between the linkers is a strep II epitope tag for surface detection (green). The construct contains an IRES (purple) driving co-transcription of an mCherry fluorophore (red) that serves as a transcriptional control. e-g. Flow cytometry of known well-secreted (p.A37T, p.S220T, WT) and poorly-secreted (p.C28Y) FIX variants displayed using MultiSTEP (n ~30,000 cells per variant). Unrecombined cells do not display FIX and serve as a negative control. Fluorescent signal was generated by staining the library with either a mouse monoclonal anti-FIX heavy chain antibody (e), a mouse monoclonal anti-FIX light chain antibody (f), or a mouse monoclonal anti-strep II tag antibody (g), followed by staining with an Alexa Fluor-647-labeled goat anti-mouse secondary antibody.
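Panel c describes sorting cells into fluorescence bins and sequencing each bin to derive a per-variant score. The caption does not give the scoring formula, so the following is only a minimal sketch of one common sort-seq approach: a frequency-weighted average of bin indices, rescaled so that a chosen low anchor maps to 0 and WT maps to 1. The four-bin layout, the bin weights, the function names, and all read counts are assumptions made for illustration.

```python
from typing import Sequence


def weighted_bin_score(variant_reads: Sequence[int], bin_totals: Sequence[int]) -> float:
    """Frequency-weighted average bin index for one variant (assumed formula).

    variant_reads[i]: reads for this variant in bin i (bin 0 = dimmest).
    bin_totals[i]: total reads in bin i, used to correct for sequencing depth.
    """
    freqs = [r / t for r, t in zip(variant_reads, bin_totals)]
    total = sum(freqs)
    if total == 0:
        raise ValueError("variant has no reads in any bin")
    # Bins are weighted 1..N, so brighter bins pull the score upward.
    return sum((i + 1) * f for i, f in enumerate(freqs)) / total


def rescale(raw: float, low_anchor: float, wt_raw: float) -> float:
    """Map the low anchor to 0 and the WT raw score to 1."""
    return (raw - low_anchor) / (wt_raw - low_anchor)


# Hypothetical read counts across four bins, equal depth per bin.
bin_totals = [1_000_000] * 4
wt_raw = weighted_bin_score([10, 40, 250, 700], bin_totals)
poor_raw = weighted_bin_score([700, 250, 40, 10], bin_totals)

low_anchor = 1.0  # hypothetical: raw score of a variant found only in the dimmest bin
print(f"WT score: {rescale(wt_raw, low_anchor, wt_raw):.2f}")
print(f"poorly displayed variant score: {rescale(poor_raw, low_anchor, wt_raw):.2f}")
```

Dividing by per-bin totals before averaging keeps a deeply sequenced bin from dominating the score; whether the authors normalize this way is not stated in the caption.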
Figure 2: 17,927 MultiSTEP-derived secretion scores for 8,964 factor IX variants.
a. Factor IX domain and chain architecture. Signal: Signal peptide. Pro: Propeptide. Gla: Gla domain. EGF1: Epidermal growth factor-like domain 1. EGF2: Epidermal growth factor-like domain 2. Activation: Activation peptide. Protease: Serine protease domain. b-c. Heatmaps showing FIX heavy chain secretion scores (b) or FIX light chain secretion scores (c) for nearly all missense FIX variants. Heatmap color indicates antibody score from blue (0, lowest 5% of scores) to white (1, WT) to red (increased scores). Black dots indicate the WT amino acid. Missing scores are colored gray. d-e. Density distributions of heavy chain (d) or light chain (e) secretion scores for FIX missense variants (orange) and synonymous variants (blue). Dashed line denotes the 5th percentile of the synonymous variant distribution. f-g. Scatter plots comparing MultiSTEP-derived heavy chain (f) or light chain (g) secretion scores for seven different FIX variants (p.C28Y, p.A37T, p.G58E, p.E67K, p.C134R, p.S220T, and p.H267L), WT, and an unrecombined negative control to the geometric mean of Alexa Fluor-647 fluorescence measured using flow cytometry on cells expressing each variant individually. The p.E67K missense variant is not present in (g). h. Scatter plot of median MultiSTEP-derived heavy chain and light chain secretion scores at each position in FIX. Points are colored by chain architecture, using the same color scheme as (a). Black dashed line indicates the line of perfect correlation between secretion scores. Pearson’s correlation coefficient is shown. Gray background indicates <0.3 point deviation from perfect correlation. Points with median positional scores outside the gray background are labeled with their corresponding FIX position. i. AlphaFold2 model of mature, two-chain FIX (positions 47–191 and 227–461). Positions labeled in (h) are shown as colored surfaces where color corresponds to the FIX heavy chain (purple) or light chain (green) putative epitope positions. j. Magnified view of FIX EGF1 domain in the light chain (orange). Putative epitope positions with discordant light and heavy chain antibody scores (h-i) are shown as a green colored surface with visible amino acids labeled. k. Magnified view of FIX serine protease domain in the heavy chain (yellow). Putative epitope positions (h-i) are shown as a purple colored surface with visible amino acids labeled.
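Panels d-e use the 5th percentile of the synonymous-variant score distribution as a reference line. A minimal sketch of that kind of thresholding follows; the percentile interpolation method, the class labels, and every score in the example are assumptions for illustration and are not data from the paper.

```python
def percentile(values, pct):
    """Linear-interpolation percentile (pct in 0..100) of a non-empty list."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100
    lo = int(rank)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def classify(score, threshold):
    """Label a score relative to the synonymous-derived threshold."""
    return "low" if score < threshold else "WT-like"


# Hypothetical synonymous-variant scores, expected to cluster near WT (1.0).
synonymous_scores = [0.80, 0.90, 0.95, 1.00, 1.00, 1.05, 1.10, 0.98, 1.02, 0.93, 0.97]
threshold = percentile(synonymous_scores, 5)

# Hypothetical missense scores with placeholder variant names.
missense_scores = {"variant_1": 0.10, "variant_2": 0.97, "variant_3": 0.84}
calls = {name: classify(score, threshold) for name, score in missense_scores.items()}

print(f"threshold: {threshold:.2f}")
print(calls)
```

Using synonymous variants to set the cutoff ties the threshold to the assay's own noise around WT-like behavior rather than to an arbitrary fixed value.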
Figure 3: MultiSTEP reveals biochemical constraints on secretion.
a. Predicted signal peptide regions for WT FIX from SignalP 6.0, indicated by color (top). Heatmap showing FIX heavy chain secretion scores for signal peptide variants (bottom). Heatmap color indicates antibody score from blue (0, lowest 5% of scores) to white (1, WT) to red (increased antibody scores). Black dots indicate the WT amino acid. Missing scores are gray. N: N-region; H: H-region; C: C-region. b. Secretion antibody scores for variants predicted by SignalP 6.0 to be either secreted or not secreted, throughout signal peptide and within sub-regions. Violin plot shows distribution of points with an inset box plot and horizontal lines representing the 25th, 50th, and 75th percentiles. Dashed horizontal line is the 5th percentile of the synonymous secretion score distribution. Number of variants in each class is labeled above the violin plot. SP6: SignalP 6.0; N: N-region; H: H-region; C: C-region. c. FIX cysteine variants colored by domain architecture (top). Sig: Signal peptide. Gla: Gla domain. EGF1: Epidermal growth factor-like domain 1. EGF2: Epidermal growth factor-like domain 2. Protease: Serine protease domain. Disulfide bridges in WT FIX are denoted by black connecting lines (p.C18 and p.C28 do not form a disulfide bridge). Heatmap of FIX heavy chain secretion scores for variants at positions with a cysteine in the WT sequence, colored as in (b) (bottom). d. Mean variant effect scores for all loss-of-cysteine substitutions for different proteins. For FIX, the average secretion score determined by MultiSTEP is shown. For all other proteins, the average abundance score measured using VAMP-seq is shown. Points indicate mean abundance or secretion score for all scored variants at all WT cysteine positions. Error bars show standard error of the mean. Asterisks indicate level of statistical significance as determined by Bonferroni-corrected pairwise two-sided t-test to FIX variant scores. **** = p <0.0001. e. Box plots of secretion scores for all missense variants across all positions with the indicated WT amino acid. f. Mean variant effect scores for all gain-of-cysteine substitutions for the same set of proteins as in (d). Points indicate mean abundance or secretion score for all gain-of-cysteine variants. Error bars show standard error of the mean. Asterisks indicate level of statistical significance as determined by Bonferroni-corrected pairwise two-sided t-test to FIX variant scores. * = p <0.05, **** = p <0.0001. g. Box plots of secretion scores for all missense substitutions of the indicated variant amino acid across all positions.
Figure 4: MultiSTEP enables measurement of variant effects on FIX post-translational modification.
a. Factor IX domain and chain architecture. Signal: Signal peptide. Pro: Propeptide. Gla: Gla domain. EGF1: Epidermal growth factor-like domain 1. EGF2: Epidermal growth factor-like domain 2. Activation: Activation peptide. Protease: Serine protease domain. b-c. Heatmaps showing carboxylation-sensitive FIX-specific carboxylation scores (b) or carboxylation-sensitive Gla-motif carboxylation scores (c) for nearly all missense FIX variants. Heatmap color indicates antibody score from blue (0, lowest 5% of scores) to white (1, WT) to red (increased antibody scores). Black dots indicate the WT amino acid. Missing scores are colored gray. Furin cleavage site (F), ω-loop (ω), ExxxExC motif (E), and aromatic stack (AS) are annotated above (b) and (c). For zoomed-in heatmaps on the propeptide and Gla domains of FIX, please refer to Supplementary Fig. 9d–i. d-e. Density distributions of carboxylation-sensitive FIX-specific (d) or carboxylation-sensitive Gla-motif (e) carboxylation scores for FIX missense variants (orange) and synonymous variants (blue). Dashed line denotes the 5th percentile of the synonymous variant distribution. f. Scatter plot of median MultiSTEP-derived carboxylation-sensitive FIX-specific carboxylation scores and light chain secretion scores at each position in FIX. Points are colored by domain architecture, using the same color scheme as (a). Black dashed line indicates >0.2 point deviation threshold from perfect correlation between carboxylation and secretion scores. Points with deviation greater than this threshold are labeled with their corresponding FIX position. Pearson’s correlation coefficient is shown. g. Crystal structure of FIX Gla domain (positions 47–92). Disulfide bridges and γ-carboxylated glutamates are shown as sticks. Calcium ions are shown as teal spheres. Structure is colored according to the ratio of median positional carboxylation-sensitive FIX-specific carboxylation score to median positional FIX light chain secretion score. Missing positions are colored gray. Disulfide bridge side chains are colored yellow.
Figure 5: Secretion and gamma-carboxylation scores reveal clinical features of hemophilia B and enable variant reinterpretation.
a. Scatter plot of light chain secretion scores and FIX plasma antigen from individuals with hemophilia B in the EAHAD database. Horizontal solid lines indicate standard error of the mean for light chain secretion scores. Vertical solid lines indicate standard error of the mean for FIX plasma antigen levels across individuals with hemophilia B harboring the same variant. Dashed horizontal line is 40% FIX plasma antigen. Dashed vertical line is the 5th percentile of the synonymous secretion score distribution. LOESS line of best fit depicted in red with 95% confidence interval shaded in gray. b. Comparison of EAHAD individual hemophilia B severity with light chain secretion scores from MultiSTEP. Violin plot shows distribution of points with an inset box plot representing the 25th, 50th, and 75th percentiles. Dashed horizontal line is the 5th percentile of the synonymous secretion score distribution. To compare median secretion scores across disease severities, a Kruskal–Wallis test adjusted for multiple comparisons by post-hoc Dunn’s test was performed. Asterisks indicate the level of statistical significance. ns = p >0.05. **** = p <0.0001. c. Severe hemophilia B disease-associated variants with WT-like light chain-derived secretion scores or FIX-specific γ-carboxylation antibody scores are shown. Bars are colored according to the number of variants located in the indicated domain. d. Comparison of EAHAD individual hemophilia B severity with EAHAD FIX plasma antigen levels. Dashed horizontal line is 40% FIX plasma antigen. To compare median FIX plasma antigen levels across disease severities, a Kruskal–Wallis test followed by post-hoc Bonferroni-corrected Dunn’s test was performed. Asterisks indicate the level of statistical significance. ns = p >0.05. **** = p <0.0001. e. Histograms of multiplexed functional scores for F9 missense variants of known effect curated from ClinVar, gnomAD, and MLOF. Color indicates clinical variant interpretation. Data from four antibodies are shown. Dashed vertical line indicates the 5th percentile of synonymous variants used as a threshold for abnormal function. f. Receiver-operator curve for random forest classifier for identifying abnormal function FIX alleles from MultiSTEP scores. Dot indicates final classifier performance metrics. g. Histogram depicting F9 missense variants found in the gnomAD 4.1 database according to minor allele frequencies (MAF) in hemizygotes. Color indicates functional classification based on random forest model using MultiSTEP functional scores. Vertical dashed line indicates estimated prevalence for hemophilia B in hemizygous individuals. h. Sankey diagram of F9 variant reinterpretation using moderate and strong levels of evidence for functional data. Labeled nodes represent the number of variants of each class.
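Panel f marks a single operating point on a receiver-operator curve. As a reminder of what such a point represents, the sketch below computes the false positive rate and true positive rate from paired predicted and known labels. It does not reproduce the random forest model; the function name and the example labels are hypothetical.

```python
def roc_point(predicted_abnormal, truly_abnormal):
    """Return (false_positive_rate, true_positive_rate) for paired boolean labels.

    Assumes at least one truly abnormal and one truly normal variant,
    otherwise a rate would be undefined.
    """
    pairs = list(zip(predicted_abnormal, truly_abnormal))
    tp = sum(1 for p, t in pairs if p and t)
    fp = sum(1 for p, t in pairs if p and not t)
    fn = sum(1 for p, t in pairs if not p and t)
    tn = sum(1 for p, t in pairs if not p and not t)
    return fp / (fp + tn), tp / (tp + fn)


# Hypothetical classifier calls and curated labels for six variants.
predicted = [True, True, False, True, False, False]
known = [True, True, True, False, False, False]

fpr, tpr = roc_point(predicted, known)
print(f"false positive rate: {fpr:.2f}, true positive rate: {tpr:.2f}")
```

Sweeping the classifier's decision threshold and recomputing this pair at each setting traces out the full curve; the dot in the panel corresponds to one chosen setting.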
Figure 6: MultiSTEP can be applied to diverse secreted proteins.
a. Flow cytometry of various protein and control constructs in the MultiSTEP backbone (n ~30,000 cells each). Unrecombined cells do not display FIX and serve as a negative control. All other constructs contain the MultiSTEP flexible linker, strep II tag, and transmembrane domain. Δstart is a FIX cDNA that does not contain a start codon. TM only does not contain a secreted protein of interest. FIX Δsignal peptide expresses a FIX molecule without its secretion-targeting signal peptide. Fluorescent signal was generated by staining the library with a rabbit polyclonal anti-strep II tag antibody followed by staining with an Alexa Fluor-488-labeled donkey anti-rabbit secondary antibody. b. Flow cytometry of B-domain deleted coagulation factor VIII (FVIII) in the MultiSTEP backbone or unrecombined negative control cells (NC) (n ~30,000 cells each). Fluorescent signal was generated by staining the library with a mouse monoclonal anti-FVIII A1-A3 antibody, which targets the discontinuous epitope at the interface of the A1 and A3 domains, followed by staining with an Alexa Fluor-647-labeled goat anti-mouse secondary antibody. c. Flow cytometry of B-domain deleted coagulation factor VIII (FVIII) in the MultiSTEP backbone or unrecombined negative control cells (NC) (n ~30,000 cells each). Fluorescent signal was generated by staining the library with a mouse monoclonal anti-FVIII A2 antibody, which targets a discontinuous epitope (positions 497–510 and 584–593) within the A2 domain, followed by staining with an Alexa Fluor-647-labeled goat anti-mouse secondary antibody. d-h. Flow cytometry of B-domain deleted coagulation factor VIII (FVIII) and 5 FVIII variants in the MultiSTEP backbone along with unrecombined negative control cells (NC) (n ~10,000 cells each). Fluorescent signal was generated by staining cells with mouse monoclonal anti-FVIII antibodies specific to the A1 (d), A2 (e), light chain (f), C1 (g), or C2 (h) domains, followed by staining with an Alexa Fluor-647-labeled goat anti-mouse secondary antibody. i-m. Flow cytometry of coagulation factor VII (i), coagulation factor X (j), proinsulin (k), plasma protease C1 inhibitor (l), and alpha-1 antitrypsin (m) constructs in the MultiSTEP backbone along with unrecombined negative control (NC) (n ~10,000 cells each). For each secreted protein, at least one variant with clinical or in vitro evidence of decreased secretion is included. Fluorescent signal was generated by staining the library with a rabbit polyclonal anti-strep II tag antibody followed by staining with an Alexa Fluor-488-labeled donkey anti-rabbit secondary antibody.
