[Preprint]. 2024 Jul 5:2023.06.19.23291621.
doi: 10.1101/2023.06.19.23291621.

GRPa-PRS: A risk stratification method to identify genetically-regulated pathways in polygenic diseases

Xiaoyang Li et al. medRxiv.

Abstract

Background: Polygenic risk scores (PRS) are tools used to evaluate an individual's susceptibility to polygenic diseases based on their genetic profile. A considerable proportion of people carry a high genetic risk but evade the disease. Conversely, some individuals with a low genetic risk eventually develop the disease. We hypothesized that unknown counterfactors might be involved in reversing the PRS prediction, and that identifying them might provide new insights into the pathogenesis, prevention, and early intervention of diseases.

Methods: We built a novel computational framework to identify genetically-regulated pathways (GRPas) using PRS-based stratification for each cohort. We curated two Alzheimer's disease (AD) cohorts with genotyping data; the discovery (disc) and the replication (rep) datasets include 2722 and 2854 individuals, respectively. First, we calculated the optimized PRS model based on three recent AD GWAS summary statistics for each cohort. Then, we stratified the individuals by their PRS and clinical diagnosis into six biologically meaningful PRS strata, such as AD cases with low/high risk and cognitively normal (CN) individuals with low/high risk. Lastly, we imputed individual genetically-regulated expression (GReX) and identified differential GReX and GRPas between risk strata using gene-set enrichment and variational analyses in two models, with and without APOE effects. An orthogonality test was further conducted to verify that those GRPas are independent of PRS risk. To verify the generalizability to other polygenic diseases, we further applied a default model of GRPa-PRS to schizophrenia (SCZ).
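
The stratification step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. The comparison names are taken from the Fig. 2 caption, but their definitions here are assumptions (the actual definitions are in Table 3, which is not reproduced on this page): the sketch assumes that "T" and "B" mean the top and bottom 20% of PRS computed across the whole cohort, and that each comparison contrasts the two groups noted in the comments. The function name `stratify` and the parameter `frac` are hypothetical.

```python
from typing import Dict, List, Tuple

Groups = Tuple[List[int], List[int]]  # (group 1 indices, group 2 indices)


def stratify(prs: List[float], is_case: List[bool], frac: float = 0.2) -> Dict[str, Groups]:
    """Split individuals into six PRS-based comparisons (assumed definitions)."""
    if len(prs) != len(is_case):
        raise ValueError("prs and is_case must have the same length")
    n = len(prs)
    k = max(1, int(n * frac))  # size of each tail (assumption: cohort-wide percentiles)

    order = sorted(range(n), key=lambda i: prs[i])  # indices, lowest PRS first
    bottom = set(order[:k])
    top = set(order[-k:])
    cases = {i for i in range(n) if is_case[i]}
    controls = set(range(n)) - cases

    def pair(a: set, b: set) -> Groups:
        return sorted(a), sorted(b)

    return {
        "case_control": pair(cases, controls),              # all cases vs all controls
        "TB20all": pair(top, bottom),                       # top vs bottom tail, everyone
        "TB20AD": pair(top & cases, bottom & cases),        # high-risk vs low-risk cases
        "TB20Ctr": pair(top & controls, bottom & controls), # high-risk vs low-risk controls
        "T20": pair(top & cases, top & controls),           # within the top tail
        "B20": pair(bottom & cases, bottom & controls),     # within the bottom tail
    }
```

Under these assumptions, the high-risk controls (the second group of `T20`) correspond to the resilience stratum and the low-risk cases (the first group of `B20`) to the extra-burden stratum named in the Fig. 1 caption.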

Results: For each stratum, we conducted the same procedures in both the disc and rep datasets for comparison. In AD, we identified several well-known AD-related pathways, including amyloid-beta clearance, tau protein binding, and astrocyte response to oxidative stress. Additionally, we discovered resilience-related GRPas that are orthogonal to AD PRS, such as the calcium signaling pathway and divalent inorganic cation homeostasis. In SCZ, pathways related to mitochondrial function and muscle development were highlighted. Finally, our GRPa-PRS method identified more consistent differential pathways than another variant-based pathway PRS method.
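
One way to read "orthogonal to AD PRS" is that a pathway's activity score shows little linear correlation with the PRS. The abstract does not specify the test the authors used, so the sketch below is only an illustration under that assumption: it computes a Pearson correlation between PRS and a per-individual pathway activity score. The helper names `pearson_r` and `looks_orthogonal`, and the cutoff `max_abs_r=0.1`, are hypothetical and not taken from the paper.

```python
import math
from typing import List


def pearson_r(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def looks_orthogonal(prs: List[float], activity: List[float], max_abs_r: float = 0.1) -> bool:
    """Flag a pathway as roughly PRS-independent (hypothetical cutoff, not from the paper)."""
    return abs(pearson_r(prs, activity)) < max_abs_r
```

A pathway that differs between strata yet passes a check of this kind would carry signal that the PRS itself does not capture, which is the sense in which the resilience-related pathways are described above.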

Conclusions: We developed a framework, GRPa-PRS, to systematically explore the differential GReX and GRPas among individuals stratified by their estimated PRS. The GReX-level comparison among those strata unveiled new insights into the pathways associated with disease risk and resilience. Our framework is extendable to other polygenic complex diseases.

Keywords: Alzheimer’s disease; Schizophrenia; genetically-regulated expression; genetically-regulated pathway; orthogonal effect; polygenic risk score; resilience.


Figures

Fig. 1. GRPa-PRS workflow and study design.
The blue tabulates in the top row indicate the input data from preprocessed individual genotyping data, AD and SCZ GWAS summary statistics, and curated gene sets. The green tabulates and background include the key steps of our GRPa-PRS framework: 1. PRS strata strategy; 2. GReX imputation; 3.a GRPa-MAGMA and 3.b GRPa-GSVA; 4.a Differential GRPas summary and 4.b Orthogonality test. The dark blue tabulate includes PRSet, the method we used to benchmark the performance of differential GRPa. The dashed line indicates the comparison between the three approaches: PRSet, GRPa-MAGMA, and GRPa-GSVA. The orange tabulates are designed to explore the differential GRPa from the three approaches and their orthogonality. The two strata highlighted in Step 1, the resilience stratum (high-risk controls) and the extra-burden stratum (low-risk cases), are defined in Table 3.
Fig. 2. Illustration of six strata comparisons and genetic factor distribution in GRPas.
(A) Individuals are stratified by PRS to evaluate underlying differential GRPas, as shown in Table 3: (1) case_control, (2) TB20all, (3) TB20AD, (4) TB20Ctr, (5) T20, (6) B20. (B) Illustration of the distribution of genetic factors among individuals across the different risk strata shown in Table 3. B-Ctr indicates bottom percentile controls carrying no effective risk GRPa. B-AD indicates bottom percentile AD cases carrying effective risk GRPa. T-Ctr* indicates extreme percentile controls carrying risk factors that are sporadically distributed and have no effective risk GRPa; T-Ctr** indicates extreme percentile controls carrying both effective risk and resilience-related GRPa. T-AD* indicates extreme percentile cases carrying risk factors gathered in an effective risk GRPa. T-AD** indicates extreme percentile cases carrying risk factors gathered in another effective risk GRPa.
Fig. 3. Enrichment of AD GRPa-MAGMA results on GO curation.
GRPas were identified by GRPa-MAGMA under Model 1, which uses the full genotype to detect all pathways associated with the strata comparisons, and Model 2, which excludes the APOE-region genotypes to detect pathways associated with the six strata comparisons independently of the APOE effect. (A) GO GRPa identified in discovery (disc) dataset Model 1, (B) GO GRPa identified in replication (rep) dataset Model 1, (C) GO GRPa identified in disc dataset Model 2 (no-APOE model), and (D) GO GRPa identified in disc dataset Model 2 (no-APOE model), no significant result (FDR < 0.05) in this condition. * indicates significant GRPas (FDR < 0.05). Heatmap intensity indicates −log10(FDR). The x-axis of each heatmap lists the subgroup comparisons based on different GWAS summary statistics: S represents Schwartzentruber et al.; K represents Kunkle et al.; W represents Wightman et al. (E) and (F) show the semantic similarity for significant terms from GRPa-MAGMA in BP and MF, respectively. (G) shows the UpSet plot of overlapping signals between the strata in the disc and rep cohorts under Model 1.
Fig. 4. AD GRPa-GSVA results enrich GO pathways.
GRPas were identified by GRPa-GSVA under Model 1, which uses the full genotype to detect all pathways associated with the strata comparisons, and Model 2, which excludes the APOE-region genotypes to detect pathways associated with the six strata comparisons independently of the APOE effect. (A) GO GRPa identified in discovery (disc) dataset Model 1, (B) GO GRPa identified in replication (rep) dataset Model 1, (C) GO GRPa identified in disc dataset Model 2 (no-APOE), and (D) GO GRPa identified in disc dataset Model 2 (no-APOE). * indicates significant GRPas (p-value < 0.05 / number of gene sets) identified in this condition. Heatmap intensity indicates −log10(p-value). The x-axis of each heatmap lists the subgroup comparisons based on different GWAS summary statistics: S represents Schwartzentruber et al.; K represents Kunkle et al.; W represents Wightman et al. (E) and (F) show the semantic similarity for significant terms from GRPa-GSVA in BP and MF, respectively. (G) shows the UpSet plot of overlapping signals between the disc and rep cohorts under Model 1.
Fig. 5. Comparison of results from three methods (GRPa-MAGMA, GRPa-GSVA, and PRSet) in GReX with APOE.
(A) UpSet plot of the overlapping signals between GRPa-MAGMA, GRPa-GSVA, and PRSet. (B) R2 improvement ratio using GRPa-GSVA and PRSet from the nested model for formulas 4, 5, and 6 among different gene set curations.
Fig. 6. Orthogonal test for key findings.
We visualize the correlation between PRS and (A) divalent inorganic cation homeostasis activity, (C) calcium signaling pathway activity, (E) amyloid beta clearance activity, and (G) muscle tissue development activity. In (B), (D), (F), and (H), we plot the corresponding difference in GSVA score distribution between group 1 and group 2 for each strata comparison.

