[Preprint]. 2025 May 11:2025.04.15.25325899.
doi: 10.1101/2025.04.15.25325899.

Dissecting the genetic complexity of myalgic encephalomyelitis/chronic fatigue syndrome via deep learning-powered genome analysis

Sai Zhang et al. medRxiv.

Abstract

Myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) is a complex, heterogeneous, and systemic disease defined by a suite of symptoms, including unexplained persistent fatigue, post-exertional malaise (PEM), cognitive impairment, myalgia, orthostatic intolerance, and unrefreshing sleep. The disease mechanism of ME/CFS is unknown, with no effective curative treatments. In this study, we present a multi-site ME/CFS whole-genome analysis, which is powered by a novel deep learning framework, HEAL2. We show that HEAL2 not only has predictive value for ME/CFS based on personal rare variants, but also links genetic risk to various ME/CFS-associated symptoms. Model interpretation of HEAL2 identifies 115 ME/CFS-risk genes that exhibit significant intolerance to loss-of-function (LoF) mutations. Transcriptome and network analyses highlight the functional importance of these genes across a wide range of tissues and cell types, including the central nervous system (CNS) and immune cells. Patient-derived multi-omics data implicate reduced expression of ME/CFS risk genes within ME/CFS patients, including in the plasma proteome, and the transcriptomes of B and T cells, especially cytotoxic CD4 T cells, supporting their disease relevance. Pan-phenotype analysis of ME/CFS genes further reveals the genetic correlation between ME/CFS and other complex diseases and traits, including depression and long COVID-19. Overall, HEAL2 provides a candidate genetic-based diagnostic tool for ME/CFS, and our findings contribute to a comprehensive understanding of the genetic, molecular, and cellular basis of ME/CFS, yielding novel insights into therapeutic targets. Our deep learning model also offers a potent, broadly applicable framework for parallel rare variant analysis and genetic prediction for other complex diseases and traits.

Conflict of interest statement

Competing Interests: M.P.S. is a cofounder and scientific advisor of Crosshair Therapeutics, Exposomics, Filtricine, Fodsel, iollo, InVu Health, January AI, Marble Therapeutics, Mirvie, Next Thought AI, Orange Street Ventures, Personalis, Protos Biologics, Qbio, RTHM, SensOmics. M.P.S. is a scientific advisor of Abbratech, Applied Cognition, Enovone, Jupiter Therapeutics, M3 Helium, Mitrix, Neuvivo, Onza, Sigil Biosciences, TranscribeGlass, WndrHLTH, Yuvan Research. M.P.S. is a cofounder of NiMo Therapeutics. M.P.S. is an investor and scientific advisor of R42 and Swaza. M.P.S. is an investor in Repair Biotechnologies. M.R.H. is a member of the scientific advisory boards of the Open Medicine Foundation, Solve CFS/ME, the WE&ME Foundation, and Simmaron Research.

Figures

Figure 1. HEAL2 study design and prediction performance.
(A) Schematic of our study design and HEAL2 model architecture. +, ME/CFS case; -, negative control; WGS, whole-genome sequencing; QCs, quality controls; EUR, European; GIN, graph isomorphism network; SAE, sparse autoencoder; PPI, protein-protein interaction. (B) Prediction performance of five-fold cross-validation (100 repeats) on the discovery cohort. The curve and shaded area represent the AUROC mean and 95% confidence interval (CI), respectively. AUROC, area under the receiver operating characteristic curve; PC, principal component. (C) Prediction performance of testing (500 repeats) on the independent Cornell cohort. The curve and shaded area represent the AUROC mean and 95% CI, respectively. (D) Correlation between genetic risk scores and patient symptoms. We examined the correlation by computing the accuracy of HEAL or HEAL2 in predicting different patient symptoms. The red dashed lines mark an accuracy of 0.5, the level expected from random guessing. IBS, irritable bowel syndrome.
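To make the "AUROC mean and 95% CI" summary in panels B and C concrete, here is a minimal sketch of how such a summary could be computed over repeated evaluations. It is not the authors' code. The synthetic scores, the function names, and the choice of an empirical percentile interval are all assumptions for illustration; the legend does not state how the interval was derived.

```python
import math
import random
import statistics


def auroc(labels, scores):
    """AUROC as the probability that a random case outscores a random control.

    Ties between a case and a control count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_and_percentile_ci(values, level=0.95):
    """Mean plus an empirical percentile interval across repeats.

    The percentile interval is an assumed definition of the CI.
    """
    ordered = sorted(values)
    tail = (1.0 - level) / 2.0
    lo = ordered[math.floor(tail * (len(ordered) - 1))]
    hi = ordered[math.ceil((1.0 - tail) * (len(ordered) - 1))]
    return statistics.mean(values), lo, hi


# Hypothetical data: 100 repeats, each with 50 cases and 50 controls whose
# risk scores are drawn from two shifted Gaussians (purely synthetic).
rng = random.Random(0)
repeat_aurocs = []
for _ in range(100):
    labels = [1] * 50 + [0] * 50
    scores = [rng.gauss(0.5, 1.0) for _ in range(50)] + [rng.gauss(0.0, 1.0) for _ in range(50)]
    repeat_aurocs.append(auroc(labels, scores))

mean_auc, ci_low, ci_high = mean_and_percentile_ci(repeat_aurocs)
print(f"AUROC mean={mean_auc:.3f}, 95% interval=({ci_low:.3f}, {ci_high:.3f})")
```

An AUROC of 0.5 corresponds to random ranking of cases and controls, which is the same baseline the red dashed lines mark for accuracy in panel D.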
Figure 2. HEAL2 model interpretation and intolerance to LoF variants for ME/CFS genes.
(A) HEAL2 input feature importance. LoF, loss-of-function. (B) Gene prioritization by HEAL2 based on attention scores. Ctrl, control; NS, not significant. q-value by the Storey-Tibshirani procedure. The top 10 significant genes are highlighted in red. (C-F) Intolerance to LoF variants for ME/CFS genes based on LoFtool (C), RVIS (D), pLI (E), and LOEUF (F). The box plot center line, limits, and whiskers represent the median, quartiles, and 1.5x interquartile range (IQR), respectively. pLI, probability of loss-of-function intolerance; LOEUF, loss-of-function observed/expected upper bound fraction. P-value by the two-sided Wilcoxon rank-sum test.
Figure 3. Expression patterns of ME/CFS genes across different human tissues and cell types.
(A-B) Gene expression comparison between ME/CFS genes and the transcriptome within 50 human tissues (A) and 81 cell types (B). NS, not significant. P-value by two-sided t-test. Multiple testing correction was performed using the Bonferroni procedure. (C) Protein expression comparison between ME/CFS genes and all protein-coding genes within 32 human tissues. P-value by two-sided t-test.
Figure 4. Network dissection and multi-omic analysis of ME/CFS genes.
(A-B) Gene modules, including M9 (A) and M20 (B), enriched with ME/CFS genes. FDR, false discovery rate. P-value by Fisher’s exact test. (C-D) Gene ontology (GO) analysis (biological process) for M9 (C) and M20 (D) genes. Redundant GO terms were removed using the “simplify” function provided by “clusterProfiler”. GO terms with adjusted P < 0.05 were visualized. P-value by two-sided Fisher’s exact test. (E) GSEA running enrichment plot of plasma proteomics against M9 genes. Negative enrichment score indicates downregulation in ME/CFS and vice versa. GSEA, gene set enrichment analysis; NES, normalized enrichment score. (F) Gene expression comparison for M9 ME/CFS genes between patients and healthy individuals across multiple blood cell types. Dot and error bar represent mean and standard error, respectively. P-value by two-sided Wilcoxon rank-sum test followed by Bonferroni correction. FC, fold change; NK, natural killer; *, adjusted P < 0.05. (G) Uniform Manifold Approximation and Projection (UMAP) plot of scRNA-seq data, with the cytotoxic CD4 T cell cluster highlighted in red. (H) GSEA running enrichment plot of ME/CFS genes in scRNA-seq pseudobulk data from the cytotoxic CD4 T cell cluster. Negative enrichment score indicates downregulation in ME/CFS cases compared to healthy controls, and vice versa. CTL, cytotoxic T lymphocyte. (I) Relative expression of general, CD4-specific, and cytotoxic T cell marker genes (y-axis) per T cell cluster. Dots represent average expression (color) and percentage of expressing cells (size). Treg, regulatory T cell; MAIT, mucosal-associated invariant T cell; Tgd, gamma delta T cell.
Figure 5. Genetic correlation between ME/CFS and other diseases and traits.
(A-B) Rare-variant-based genetic correlation based on SKAT-O (A) and burden (B) tests. P-value by one-sided Wilcoxon rank-sum test. pLoF, predicted loss-of-function; NS, not significant. (C-D) Common-variant-based genetic correlation based on GWAS for complex diseases and traits (C) and COVID-19 phenotypes (D). The Bonferroni procedure was used to adjust P-values. COVID19 A2, B2, and C2 indicate severe COVID-19 vs. population, hospitalized COVID-19 vs. population, and COVID-19 vs. population, respectively; long COVID19_1, long COVID19_2, long COVID19_3, and long COVID19_4 indicate strict case vs. broad control, broad case vs. broad control, strict case vs. strict control, and broad case vs. strict control, respectively. (E) Genetic correlation between ME/CFS and Mendelian disorders. P-value by Fisher’s exact test followed by Bonferroni correction. NS, not significant; CI, confidence interval; MODY, maturity-onset diabetes of the young. The dot and error bar indicate the odds ratio and 95% CI, respectively.
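The statistics named for panel E (Fisher's exact test, Bonferroni correction, odds ratio with 95% CI) can be sketched as follows for a single 2x2 gene-overlap table. This is an illustrative sketch, not the authors' pipeline. The function names and example counts are hypothetical, and the Woolf (log-scale normal) interval with a 0.5 continuity correction for empty cells is an assumed choice, since the legend does not say how the interval was computed.

```python
import math


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(n, col1)

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= observed * (1 + 1e-9))
    return min(1.0, total)


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Sample odds ratio with a Woolf-type 95% interval (assumed method)."""
    if 0 in (a, b, c, d):
        # Haldane-Anscombe correction so the ratio and its variance stay finite.
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return ratio, math.exp(math.log(ratio) - z * se), math.exp(math.log(ratio) + z * se)


def bonferroni(p_values):
    """Multiply each P-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical counts: rows are ME/CFS genes vs. other genes, columns are
# genes in vs. outside one Mendelian disorder gene set.
table = (8, 107, 300, 18000)
p_value = fisher_exact_two_sided(*table)
ratio, ci_low, ci_high = odds_ratio_ci(*table)
adjusted = bonferroni([p_value, 0.2, 0.04])  # the other two P-values are placeholders
print(f"OR={ratio:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f}), P={p_value:.3g}, adjusted P={adjusted[0]:.3g}")
```

An interval that excludes 1 after correction would suggest an overlap beyond chance under these assumptions. Statistical packages often report a conditional maximum-likelihood odds ratio for Fisher's test, which can differ slightly from the sample ratio used here.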
