Stat Methods Med Res. 2018 Dec;27(12):3814-3834.
doi: 10.1177/0962280217713347. Epub 2017 Jun 28.

Causality on longitudinal data: Stable specification search in constrained structural equation modeling

Ridho Rahmadi et al. Stat Methods Med Res. 2018 Dec.

Abstract

A typical problem in causal modeling is the instability of model structure learning, i.e., small changes in finite data can result in completely different optimal models. The present work introduces a novel causal modeling algorithm for longitudinal data that is robust for finite samples, building on recent advances in stability selection using subsampling and selection algorithms. Our approach uses exploratory search but allows the incorporation of prior knowledge, e.g., the absence of a particular causal relationship between two specific variables. We represent causal relationships using structural equation models. Models are scored on two objectives: model fit and model complexity. Since these objectives often conflict, we apply a multi-objective evolutionary algorithm to search for Pareto optimal models. To handle the instability of small finite data samples, we repeatedly subsample the data and select those substructures (from the optimal models) that are both stable and parsimonious. These substructures can be visualized through a causal graph. On a simulated data set with a known ground truth, our more exploratory approach performs at least comparably to, and often significantly better than, state-of-the-art alternative approaches. We also present the results of our method on three real-world longitudinal data sets on chronic fatigue syndrome, Alzheimer’s disease, and chronic kidney disease. The findings obtained with our approach are generally in line with results from more hypothesis-driven analyses in earlier studies and suggest some novel relationships that deserve further research.

Keywords: Alzheimer’s disease; Longitudinal data; causal modeling; chronic fatigue syndrome; chronic kidney disease; multi-objective evolutionary algorithm; stability selection; structural equation model.
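The two ideas the abstract combines, Pareto optimality over fit and complexity and stability selection over subsamples, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `pareto_front` and `selection_probability`, the toy scores, and the edge sets are hypothetical and serve only to show the mechanics. It assumes both objectives are minimized (lower misfit, fewer parameters).

```python
from typing import FrozenSet, Hashable, List, Sequence, Tuple


def pareto_front(scores: Sequence[Tuple[float, float]]) -> List[int]:
    """Return indices of models that no other model dominates.

    Each score is (misfit, complexity); both are assumed to be minimized.
    Model j dominates model i if j is no worse on both objectives and
    strictly better on at least one.
    """
    front = []
    for i, (fit_i, cx_i) in enumerate(scores):
        dominated = any(
            fit_j <= fit_i and cx_j <= cx_i and (fit_j < fit_i or cx_j < cx_i)
            for j, (fit_j, cx_j) in enumerate(scores)
            if j != i
        )
        if not dominated:
            front.append(i)
    return front


def selection_probability(
    selected_per_subsample: Sequence[FrozenSet[Hashable]],
    structure: Hashable,
) -> float:
    """Fraction of subsamples whose selected model contains `structure`."""
    hits = sum(1 for selected in selected_per_subsample if structure in selected)
    return hits / len(selected_per_subsample)


# Toy example (hypothetical numbers): four candidate models.
toy_scores = [(10.0, 1), (5.0, 2), (6.0, 3), (2.0, 4)]
front = pareto_front(toy_scores)

# Toy example: edges selected in each of four subsamples.
toy_runs = [
    frozenset({("A", "B")}),
    frozenset({("A", "B"), ("B", "C")}),
    frozenset(),
    frozenset({("A", "B")}),
]
p_ab = selection_probability(toy_runs, ("A", "B"))
p_bc = selection_probability(toy_runs, ("B", "C"))
```

In this toy run the third model is dominated by the second, so it drops off the front, and the edge A to B would count as stable under any selection threshold up to 0.75, whereas B to C would not pass a threshold above 0.25.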

Figures

Figure 1.
Given a longitudinal data set, S3L uses the baseline observations to infer a baseline model, and reshapes the whole data set to infer a transition model. Both baseline and transition models are annotated with a reliability score α and a standardized causal effect β. S3C: stable specification search for cross-sectional data; S3L: stable specification search for longitudinal data.
Figure 2.
(a) The baseline model which is used to capture causal relationships at the initial time slice, e.g., before medical treatment. (b) The transition model which is used to represent causal relationships within and between time slices, e.g., during medical treatment. (c) The corresponding “unrolled” longitudinal model.
Figure 3.
D is a matrix representing the original data shape which consists of s instances, p variables, and i time slices. D' is a matrix representing the corresponding reshaped data.
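The reshaping in Figure 3 can be sketched in Python with NumPy. The caption does not spell out the layout of D', so this sketch assumes one plausible reading: each row of D' pairs the p variables at one time slice with the same p variables at the next slice, which is the shape a transition model between consecutive slices would need. The function name `reshape_longitudinal` and the axis order (instances, time slices, variables) are assumptions, not taken from the paper.

```python
import numpy as np


def reshape_longitudinal(D: np.ndarray) -> np.ndarray:
    """Pair consecutive time slices (assumed layout, see lead-in).

    D has shape (s, i, p): s instances, i time slices, p variables.
    The result has shape (s * (i - 1), 2 * p): the first p columns hold
    the variables at slice t, the last p columns the variables at t + 1.
    """
    s, i, p = D.shape
    if i < 2:
        raise ValueError("need at least two time slices to form transitions")
    earlier = D[:, :-1, :].reshape(s * (i - 1), p)  # slices 0 .. i-2
    later = D[:, 1:, :].reshape(s * (i - 1), p)     # slices 1 .. i-1
    return np.hstack([earlier, later])


# Toy data: 2 instances, 3 time slices, 2 variables.
D = np.arange(12).reshape(2, 3, 2)
D_prime = reshape_longitudinal(D)
```

With these toy dimensions, D' has 2 × (3 − 1) = 4 rows and 2 × 2 = 4 columns, one row per instance and transition.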
Figure 4.
The longitudinal model with four variables and three time slices, used to generate simulated data.
Figure 5.
Results from simulation data with sample size 400: ROC curves for (a) the edge stability and (b) the causal path stability (without prior knowledge), and (c) the edge stability and (d) the causal path stability (with prior knowledge), for different values of πsel in the range of [0,1]. Table 1 lists the corresponding AUCs. CPC: conservative PC; FGES: fast greedy equivalence search; FPR: false positive rate; S3L: stable specification search for longitudinal data; TPR: true positive rate.
Figure 6.
Results from simulation data with sample size 2000: ROC curves for (a) the edge stability and (b) the causal path stability (without prior knowledge), and (c) the edge stability and (d) the causal path stability (with prior knowledge), for different values of πsel in the range of [0,1]. Table 4 lists the corresponding AUCs. CPC: conservative PC; FGES: fast greedy equivalence search; FPR: false positive rate; S3L: stable specification search for longitudinal data; TPR: true positive rate.
Figure 7.
The stability graphs of the baseline model in (a) and (b) and the transition model in (c) and (d) for chronic fatigue syndrome, with edge stability in (a) and (c), and causal path stability in (b) and (d). The relevant regions, above πsel and left of πbic, contain the relevant structures.
Figure 8.
(a) The baseline model and (b) the transition model of chronic fatigue syndrome. The dashed line represents a strong relation between two variables but the causal direction cannot be determined from the data. Each edge has a reliability score (the highest selection probability in the relevant region of the edge stability graph) and a standardized total causal effect estimation. For example, the annotation “1/0.71” represents a reliability score of 1 and a standardized total causal effect of 0.71. Note that the standardized total causal effect represents not just the direct causal effect corresponding to the edge, but the total causal effect also including indirect effects.
Figure 9.
The stability graphs of the baseline model in (a) and (b) and the transition model in (c) and (d) for Alzheimer’s disease, with edge stability in (a) and (c), and causal path stability in (b) and (d). The relevant regions, above πsel and left of πbic, contain the relevant structures.
Figure 10.
(a) The baseline model and (b) the transition model of Alzheimer’s disease. The dashed line represents a strong relation between two variables but the causal direction cannot be determined from the data. Each edge has a reliability score (the highest selection probability in the relevant region of the edge stability graph) and a standardized total causal effect estimation. For example, the annotation “1/0.81” represents a reliability score of 1 and a standardized total causal effect of 0.81. Note that the standardized total causal effect represents not just the direct causal effect corresponding to the edge, but the total causal effect also including indirect effects.
Figure 11.
The stability graphs of the baseline model in (a) and (b) and the transition model in (c) and (d) for chronic kidney disease, with edge stability in (a) and (c), and causal path stability in (b) and (d). The relevant regions, above πsel and left of πbic, contain the relevant structures.
Figure 12.
(a) The baseline model and (b) the transition model of chronic kidney disease. The dashed line represents a strong relation between two variables but the causal direction cannot be determined from the data. Each edge has a reliability score (the highest selection probability in the relevant region of the edge stability graph) and a standardized total causal effect estimation. For example, the annotation “1/0.88” represents a reliability score of 1 and a standardized total causal effect of 0.88. Note that the standardized total causal effect represents not just the direct causal effect corresponding to the edge, but the total causal effect also including indirect effects.
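The distinction between direct and total causal effects in the captions for Figures 8, 10, and 12 can be made concrete with a small Python sketch. It assumes a linear, acyclic structural equation model x = Bx + e, for which the matrix of total effects is (I − B)⁻¹ − I. The function name `total_effects` and the coefficients are hypothetical, and the sketch says nothing about how the paper estimates or standardizes its effects.

```python
import numpy as np


def total_effects(B: np.ndarray) -> np.ndarray:
    """Total effects in a linear acyclic SEM x = B x + e.

    B[j, k] is the direct effect of variable k on variable j.
    The returned matrix T has T[j, k] equal to the sum, over all directed
    paths from k to j, of the product of the coefficients along the path.
    """
    n = B.shape[0]
    identity = np.eye(n)
    return np.linalg.inv(identity - B) - identity


# Toy model (hypothetical coefficients): X0 -> X1 -> X2 and X0 -> X2.
B = np.zeros((3, 3))
B[1, 0] = 0.5  # direct effect of X0 on X1
B[2, 1] = 0.4  # direct effect of X1 on X2
B[2, 0] = 0.3  # direct effect of X0 on X2

T = total_effects(B)
```

In this toy model the direct effect of X0 on X2 is 0.3, while the total effect is 0.3 + 0.5 × 0.4 = 0.5, because the indirect path through X1 contributes as well.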
