Cell Rep Methods. 2025 Apr 21;5(4):101022.
doi: 10.1016/j.crmeth.2025.101022. Epub 2025 Apr 10.

Efficient and scalable construction of clinical variable networks for complex diseases with RAMEN

Yiwei Xiong et al. Cell Rep Methods.

Abstract

Understanding the interplay among clinical variables, such as demographics, symptoms, and laboratory results, and their relationships with disease outcomes is critical for advancing diagnostics and understanding mechanisms in complex diseases. Existing methods fail to capture indirect or directional relationships, while existing Bayesian network learning methods are computationally expensive and only infer general associations without focusing on disease outcomes. Here we introduce random walk- and genetic algorithm-based network inference (RAMEN), a method for Bayesian network inference that uses absorbing random walks to prioritize outcome-relevant variables and a genetic algorithm for efficient network refinement. Applied to COVID-19 (Biobanque québécoise de la COVID-19), intensive care unit (ICU) septicemia (MIMIC-III), and COPD (CanCOLD) datasets, RAMEN reconstructs networks linking clinical markers to disease outcomes, such as elevated lactate levels in ICU patients. RAMEN demonstrates advantages in computational efficiency and scalability compared to existing methods. By modeling outcome-specific relationships, RAMEN provides a robust tool for uncovering critical disease mechanisms, advancing diagnostics, and enabling personalized treatment strategies.

Keywords: Bayesian network inference; COVID-19; CP: Systems biology; absorbing random walk; chronic obstructive pulmonary disease; clinical variable networks; complex diseases; genetic algorithm; multi-omics; personalized medicine; septicemia.

Conflict of interest statement

Declaration of interests: The authors declare no competing interests.

Figures

Graphical abstract
Figure 1
Overview of the RAMEN methodology. The RAMEN approach constructs Bayesian networks from clinical data through a sequential two-phase process. (Phase 1) Establishing the initial network via an absorbing random walk-based permutation test. Beginning with preprocessed clinical data, this stage implements a permutation test via a random-walk strategy across a comprehensive network of all included variables, where nodes symbolize variables and edge weights indicate the mutual information between the variables in each pair. The process identifies stronger variable connections by tracking the frequency of edge traversal in successful random walks (those ending at the target node). Edges with significantly higher traversal frequencies, as established through permutation testing, lay the groundwork for the network, preparing it for further enhancement. (Phase 2) Enhancing the network with a genetic algorithm. This stage further refines the Bayesian network structure. Starting with a set of initial network configurations derived from the early framework, the genetic algorithm applies crossover (merging two configurations) and mutation (applying random changes) to evolve these structures. Each cycle assesses the network structures against a specific scoring function, prioritizing those with superior scores for subsequent iterations. This cycle of refinement, through modification, assessment, and selection, persists until a stable score is achieved, culminating in an optimized network structure.
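Phase 1 of the caption above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the uniform choice of start node, the `max_steps` cutoff, and the use of independently shuffled columns as the null distribution are all assumptions made for illustration. Variables are assumed to be discrete, and the network is assumed to be fully connected with mutual information as edge weights.

```python
import math
import random
from collections import Counter
from itertools import combinations

def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    total = sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )
    return max(0.0, total)  # clamp tiny negative rounding errors

def edge_weights(data):
    """data maps variable name -> list of values; returns MI per variable pair."""
    return {
        frozenset((a, b)): mutual_information(data[a], data[b])
        for a, b in combinations(data, 2)
    }

def traversal_counts(weights, nodes, target, n_walks, max_steps, rng):
    """Count edge traversals over walks that end at the absorbing target."""
    counts = Counter()
    starts = [v for v in nodes if v != target]
    for _ in range(n_walks):
        current, path = rng.choice(starts), []
        for _ in range(max_steps):
            nbrs = [v for v in nodes if v != current]
            w = [weights[frozenset((current, v))] for v in nbrs]
            if sum(w) <= 0:
                break  # no informative edge to follow
            nxt = rng.choices(nbrs, weights=w)[0]
            path.append(frozenset((current, nxt)))
            current = nxt
            if current == target:  # absorbed: the walk counts as successful
                counts.update(path)
                break
    return counts

def permutation_pvalues(data, target, n_perm=20, n_walks=500,
                        max_steps=10, seed=0):
    """Empirical p-value per observed edge against shuffled-data walks."""
    rng = random.Random(seed)
    nodes = list(data)
    observed = traversal_counts(edge_weights(data), nodes, target,
                                n_walks, max_steps, rng)
    exceed = Counter()
    for _ in range(n_perm):
        # Assumed null model: shuffle every column independently.
        shuffled = {k: rng.sample(v, len(v)) for k, v in data.items()}
        null = traversal_counts(edge_weights(shuffled), nodes, target,
                                n_walks, max_steps, rng)
        for edge in observed:
            exceed[edge] += null[edge] >= observed[edge]
    return {e: (exceed[e] + 1) / (n_perm + 1) for e in observed}
```

Edges whose p-value falls below a chosen threshold would form the starting network that Phase 2 refines; the genetic algorithm itself is not sketched here.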
Figure 2
RAMEN unveils indicators of COVID-19 severity in BQC19 hospitalized patient data. (A) A streamlined network showcasing 231 of the most significant connections identified by RAMEN, indicative of COVID-19 severity. The full names of the variables are provided in Data S1. The color and thickness of edges signify the connection strength (blue for weaker, red for stronger) based on mutual information metrics. Nodes are colored according to categories of clinical variables, with their size reflecting the strength of their correlation with COVID-19 severity. The diamond-shaped node represents the outcome variable, which is COVID-19 severity. (B) Comparison of AUROC for predicting COVID-19 severity using indicator variables, contrasting RAMEN-identified indicators against those identified through mutual information and Pearson correlation methods, with predictions made by support vector machines (SVMs). A higher AUROC suggests a greater relevance of the identified variables for severity prediction. Indicator variable selection by RAMEN is detailed in STAR Methods and, to ensure a fair comparison, all compared methods use the same number (161) of top indicators. (C) Analysis of Shapley additive explanations (SHAP) values, providing one possible explanation of the significance of clinical variables identified by RAMEN in SVM-based predictions. These values illustrate the potential impact of variables on the model’s prediction, indicating whether they contribute toward a positive or negative outcome. The consistent color scheme across the x axis highlights variables identified as dependable predictors by SHAP. For clarity, the importance ranking assigned by RAMEN is shown in parentheses after each variable name. (D) Heatmaps illustrating the conditional distribution of COVID-19 severity levels (SEV) across the values of direct indicator variables, where the heatmap colors represent the proportion of patients within each severity category for given indicator values. This visual representation aids in understanding the correlation between specific clinical indicators and severity outcomes.
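To make the AUROC comparison in panel (B) concrete, the following sketch computes AUROC for a binary outcome as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `auroc` and the binary 0/1 labeling are assumptions for illustration; the caption does not state how severity levels were encoded for the SVM evaluation.

```python
def auroc(scores, labels):
    """AUROC via pairwise comparison of positive and negative scores.

    scores: classifier outputs (higher means more likely positive).
    labels: 1 for positive cases, 0 for negative cases (assumed encoding).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    # Each positive/negative pair contributes 1 if ranked correctly,
    # 0.5 if tied, and 0 otherwise.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A value of 0.5 corresponds to scores that carry no information about the outcome, which is why a higher AUROC for one indicator set suggests that its variables are more relevant for prediction.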
Figure 3
Support for the COVID severity network edges from the RNA-seq data. (A) Analysis of gene expression across three groups of differentially expressed (DE) genes linked to example nodes “ARDS,” “Albumin,” and “BMI” that directly connect to COVID severity. For example, with “Albumin,” we first pinpoint DE genes associated with albumin variability (i.e., genes with expression changes in patients with varying albumin levels, denoted as G1). Next, we identify DE genes linked to COVID severity (G2). The “Common” group represents DE genes common to both sets (G1 ∩ G2); the “Albumin” group illustrates DE genes exclusive to the albumin variable (G1 ∩ ¬G2); and the “Severity” group shows DE genes unique to COVID-19 severity (G2 ∩ ¬G1). (B) Identification of the top enriched pathways for each variable based on their common DE genes with the severity variable (the “Common” group). The x axis shows the negative log10 of FDR-corrected p values. From top to bottom are the enrichment analyses we carried out for DE genes identified from the edge between variable “Acute Respiratory Distress Syndrome (ARDS)?” and variable “Severity,” the edge between “BMI” and “Severity,” and the edge between “Albumin” and “Severity.” (C) Validation of COVID-19 severity indicators using RNA-seq highlights RAMEN’s ability to uncover additional insights beyond those revealed by conventional statistical methods such as Pearson correlation and mutual information. Each method on the x axis (MI, mutual information; RAM, RAMEN; COR, Pearson correlation) classifies variables into indicators or non-indicators, with RNA-seq data providing the basis for ground truth. A variable is considered an indicator if its DE genes significantly overlap with those associated with COVID-19 severity, assessed via a hypergeometric test. The performance of each method is quantified using the F1 score from verifying the variables found by each method against the ground truth. RAMEN achieves a higher F1 score compared to statistics-based methods, indicating its ability to uncover relationships that extend beyond these methods.
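The validation in panel (C) combines two standard calculations, sketched below under stated assumptions: a one-sided hypergeometric test for the overlap between a variable's DE genes (G1) and the severity DE genes (G2), and an F1 score comparing the variables a method selects with the RNA-seq-derived ground truth. The function names and the choice of gene universe are hypothetical; the caption does not specify them.

```python
from math import comb

def hypergeom_overlap_pvalue(universe_size, g1_size, g2_size, overlap):
    """P(X >= overlap) for the overlap of two gene sets.

    Assumes g2_size genes are drawn without replacement from a universe
    of universe_size genes, of which g1_size belong to G1.
    """
    total = comb(universe_size, g2_size)
    upper = min(g1_size, g2_size)
    tail = sum(
        comb(g1_size, k) * comb(universe_size - g1_size, g2_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

def f1_score(predicted, truth):
    """F1 score between a predicted set and a ground-truth set."""
    tp = len(predicted & truth)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(truth)
    return 2 * precision * recall / (precision + recall)
```

Under this reading, a variable whose overlap p-value falls below a chosen significance threshold is labeled a ground-truth indicator, and each method's selected variables are then scored against that label set with `f1_score`.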
Figure 4
RAMEN identifies effective indicator variables that cannot be found using mutual information or Pearson correlation. (A) The long COVID network, where purple edges represent connections significant only to RAMEN, and green edges are also identified by mutual information or correlation. (B) Similar network for COVID-19 severity, with purple indicating edges found exclusively by RAMEN and green representing those also found by mutual information or correlation. The full names of the variables are provided in Data S1. (C) Heatmaps visualizing DE genes associated with “Platelets” and “COVID-19 severity.” The three groups of DE genes correspond to the unique DE genes of the two variables and common DE genes. (D) Pathway enrichment based on the common DE genes in (C). (E) A barplot demonstrating RAMEN’s ability to detect disease-relevant edges missed by Pearson correlation and mutual information. Using RNA-seq data as ground truth (see STAR Methods for details), among all the edges that cannot be found using Pearson correlation, the column “Not Corr, RAMEN” shows the percentage of disease-outcome-relevant edges found by RAMEN. “Not Corr, Not RAMEN” shows those that also cannot be found using RAMEN. Likewise, “Not MI, RAMEN” corresponds to the percentage of true edges missed by mutual information but found by RAMEN, and “Not MI, Not RAMEN” are the ones found by neither method. “Random” is the performance of randomly selecting edges. The p values of the binomial tests (see details in the STAR Methods section quantification and statistical analysis) indicate that RAMEN is accurate in finding edges missed by other methods. This suggests that RAMEN has additional power in detecting disease-relevant edges compared to Pearson correlation and mutual information.
Figure 5
RAMEN identifies indicator variables and constructs disease-relevant networks across diseases using MIMIC-III and CanCOLD data. (A) RAMEN-derived networks for septicemia (136 outcome-relevant variables, left) and COPD (22 outcome-relevant variables, right). Node colors represent variable types, and edge colors indicate connection intensity, as shown in the legend. Node sizes reflect RAMEN’s importance scores (for details, see STAR Methods), indicating the relevance of each variable to the disease outcome. Diamond-shaped nodes represent outcome variables, specifically septicemia death and COPD exacerbation. These results demonstrate RAMEN’s applicability to multiple diseases. (B) SHAP values quantify the importance of indicator variables based on their impact on disease-outcome prediction. Values in parentheses indicate RAMEN’s feature importance rankings. The alignment between SHAP rankings and RAMEN’s selections underscores the method’s robustness in identifying key variables. (C) Heatmaps illustrating the distribution of informative indicator variables for septicemia and COPD, further emphasizing RAMEN’s ability to uncover disease-relevant insights across a range of diseases. The plots reveal significant shifts in patient distributions across different disease outcomes based on the values of key indicator variables.
Figure 6
RAMEN outperforms other methods in systematic benchmarking. (A and B) Comparison of edge connection prediction performance using the COVID-19 dataset (A) and the simulation dataset with a known ground-truth network (B). The y axis represents the F1 score for edge connection prediction across the methods listed on the x axis. For the COVID-19 dataset, RNA-seq data are used to validate the predicted edges (as detailed in STAR Methods), while the simulation dataset provides a known ground truth for edge connections. RAMEN achieves superior performance compared to all methods, particularly excelling over other Bayesian network learning approaches. (C) Edge direction prediction performance using the simulation dataset with a known ground truth for edge directions. A true positive requires the correct prediction of both the edge connection and its direction. The comparison is restricted to Bayesian network methods capable of predicting edge direction. RAMEN demonstrates a significant performance advantage over these methods. (D–F) Evaluation of indicator variables identified by different methods based on the classification performance of SVM models trained with these variables. RAMEN achieves results that are comparable to or better than those of all other methods, further emphasizing its superior ability to identify informative variables across different disease studies. p values (∗p<0.05, ∗∗p<0.01, ∗∗∗p<0.001, ∗∗∗∗p<0.0001) were generated by Student’s t tests with n = 5 technical replicates. Boxplots show the interquartile range (IQR), with the median represented by a solid line. Whiskers extend to the most extreme data points within 1.5 times the IQR from the first and third quartiles. (D) COVID-19, (E) septicemia, and (F) COPD.
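The boxplot convention described in the caption (box spanning the IQR, whiskers reaching the most extreme points within 1.5 times the IQR of the quartiles) can be sketched as follows. The function name and the "inclusive" quartile interpolation are assumptions; plotting libraries differ in how they compute quartiles, and the caption does not say which rule was used.

```python
import statistics

def boxplot_stats(values):
    """Quartiles and whisker ends under the 1.5 * IQR convention."""
    # Quartile method is an assumption; other interpolation rules exist.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Whiskers stop at the most extreme observations inside the fences.
    low = min(v for v in values if v >= q1 - 1.5 * iqr)
    high = max(v for v in values if v <= q3 + 1.5 * iqr)
    return {"q1": q1, "median": median, "q3": q3,
            "whisker_low": low, "whisker_high": high}
```

Points outside the whisker range would be drawn individually as outliers under this convention.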
