Review

R Soc Open Sci. 2022 Aug 3;9(8):220638. doi: 10.1098/rsos.220638. eCollection 2022 Aug.

Causal machine learning for healthcare and precision medicine

Pedro Sanchez et al.

Abstract

Causal machine learning (CML) has experienced increasing popularity in healthcare. Beyond the inherent capability of adding domain knowledge into learning systems, CML provides a complete toolset for investigating how a system would react to an intervention (e.g. the outcome given a treatment). Quantifying the effects of interventions allows actionable decisions to be made while maintaining robustness in the presence of confounders. Here, we explore how causal inference can be incorporated into different aspects of clinical decision support systems by using recent advances in machine learning. Throughout this paper, we use Alzheimer's disease to create examples illustrating how CML can be advantageous in clinical scenarios. Furthermore, we discuss important challenges present in healthcare applications, such as processing high-dimensional and unstructured data, generalization to out-of-distribution samples and temporal relationships, which, despite great effort from the research community, remain to be solved. Finally, we review lines of research within causal representation learning, causal discovery and causal reasoning which offer the potential to address the aforementioned challenges.

Keywords: causal machine learning; causal representation learning; precision medicine.


Conflict of interest statement

We declare that we have no competing interests.

Figures

Figure 1.
CML in healthcare helps with understanding biases and formalizing reasoning about the effect of interventions. We illustrate, with a hypothetical example, that high-level features (causal representations) can be extracted from low-level data (e.g. I1 might correspond to the brain volume derived from a medical image) into a graph corresponding to the data generation process. CML can be used to discover which relationships between variables are spurious and which are causal, illustrated with dashed and solid lines, respectively. Finally, CML offers tools for reasoning about the effect of interventions (shown with the do() operator). For instance, an intervention on D1 would only affect the downstream variables in the graph, while other relationships are either not relevant (due to graph mutilation) or remain unchanged.
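As a minimal sketch of the graph mutilation idea described in this caption (assuming a toy linear structural causal model with hypothetical variables D1 -> I1 -> O1, not the authors' actual model), an intervention do(D1 = d) replaces the mechanism that generates D1 with a constant while the downstream mechanisms are left untouched:

    import numpy as np

    rng = np.random.default_rng(0)

    def simulate(n, do_d1=None):
        # Toy linear SCM with D1 -> I1 -> O1 (variable names are illustrative only).
        # Under do(D1 = d), the mechanism for D1 is replaced by a constant
        # (graph mutilation); the mechanisms for I1 and O1 remain unchanged.
        d1 = rng.normal(size=n) if do_d1 is None else np.full(n, do_d1)
        i1 = 0.8 * d1 + rng.normal(scale=0.5, size=n)
        o1 = 1.5 * i1 + rng.normal(scale=0.5, size=n)
        return d1, i1, o1

    _, _, o_obs = simulate(10_000)             # observational distribution
    _, _, o_int = simulate(10_000, do_d1=2.0)  # interventional distribution under do(D1 = 2)
    print(o_obs.mean(), o_int.mean())          # only variables downstream of D1 shift under the intervention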
Figure 2.
Causal graph (left) and illustration of how the brain changes in MR images in response to interventions on 'Age' or 'Alzheimer's disease status'. The images are axial slices of a brain MR scan. The middle image, used as a baseline, is from a patient aged 64 years who is classified as cognitively normal (CN) within the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. All other images are synthesized with a conditional generative model [39]. The images with grey background are difference images obtained by subtracting the synthesized image from the baseline. The upper sequence of images is generated by fixing Alzheimer's status at CN and increasing age by 3 years. The bottom images are generated by fixing the age at 64 and progressing Alzheimer's status to MCI and AD, as discussed in the main text.
Figure 3.
We illustrate the difference between the individualized and the average treatment effect (ITE versus ATE). 'Feature' represents patient characteristics, which would be multi-dimensional in reality. 'Outcome' is some measure of response to the treatment, where a more positive value is preferable. The ITE for each patient is the difference between the actual and the counterfactual outcome. We show an example counterfactual to highlight that the ITE for some patients might differ from the average (ATE). By employing causal inference methods to estimate individualized treatment effects, we can understand which patients benefit from a certain medication and which patients do not, thus enabling us to make personalized treatment recommendations. Note that the patient data points are evenly distributed along the feature axis, which would indicate that these data come from an RCT (due to lack of selection bias). The estimation of treatment effect using observational data is subject to confounding, as patient characteristics affect both the selection of treatment and the outcome. Causal inference methods need to mitigate this.
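As a small, purely illustrative sketch of the ITE/ATE distinction in this caption (assuming, unrealistically, that both potential outcomes were known for every patient; in practice one of them is a counterfactual that must be estimated):

    import numpy as np

    # Hypothetical potential outcomes for five patients (numbers are illustrative only).
    y_treated = np.array([3.0, 2.5, 1.0, 0.5, -0.5])  # outcome if treated
    y_control = np.array([1.0, 1.5, 1.0, 1.5,  1.0])  # outcome if untreated

    ite = y_treated - y_control  # individualized treatment effect per patient
    ate = ite.mean()             # average treatment effect over the population

    print(ite)  # some patients benefit, others do not
    print(ate)  # the average can mask this heterogeneity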
Figure 4.
Reasoning about generalization of a prediction task with a causal graph. Anti-causal prediction and a spurious association that may lead to shortcut learning are illustrated.

References

    1. Litjens G, Kooi T, Bejnordi BE, Setio AAA, Ciompi F, Ghafoorian M, Van Der Laak JA, Van Ginneken B, Sánchez CI. 2017. A survey on deep learning in medical image analysis. Med. Image Anal. 42, 60-88. (10.1016/j.media.2017.07.005)
    2. Bica I, Alaa AM, Lambert C, Schaar M. 2021. From real-world patient data to individualized treatment effects using machine learning: current and future methods to address underlying challenges. Clin. Pharmacol. Ther. 109, 87-100. (10.1002/cpt.1907)
    3. Holland PW. 1986. Statistics and causal inference. J. Am. Stat. Assoc. 81, 945-960. (10.1080/01621459.1986.10478354)
    4. Pearl J. 2009. Causality. Cambridge, UK: Cambridge University Press.
    5. Imbens GW, Rubin DB. 2015. Causal inference for statistics, social, and biomedical sciences: an introduction. Cambridge, UK: Cambridge University Press.