Stat Methods Med Res. 2018 Dec;27(12):3814-3834.

doi: 10.1177/0962280217713347. Epub 2017 Jun 28.

Causality on longitudinal data: Stable specification search in constrained structural equation modeling

Ridho Rahmadi¹ ², Perry Groot², Marieke Hc van Rijn³, Jan Ajg van den Brand³, Marianne Heins⁴, Hans Knoop⁵, Tom Heskes²; Alzheimer’s Disease Neuroimaging Initiative; MASTERPLAN Study Group; OPTIMISTIC consortium

Affiliations

¹ Department of Informatics, Universitas Islam Indonesia, Sleman, Indonesia.
² Institute for Computing and Information Sciences, Radboud University Nijmegen, Nijmegen, The Netherlands.
³ Department of Nephrology, Radboud University Medical Center, Nijmegen, The Netherlands.
⁴ Netherlands Institute for Health Services Research, Utrecht, The Netherlands.
⁵ Department of Medical Psychology, Academic Medical Center, University of Amsterdam, Amsterdam, The Netherlands.

PMID: 28657454
PMCID: PMC6249641
DOI: 10.1177/0962280217713347

Ridho Rahmadi et al. Stat Methods Med Res. 2018 Dec.

Abstract

A typical problem in causal modeling is the instability of model structure learning, i.e., small changes in finite data can result in completely different optimal models. The present work introduces a novel causal modeling algorithm for longitudinal data that is robust for finite samples, based on recent advances in stability selection using subsampling and selection algorithms. Our approach uses exploratory search but allows incorporation of prior knowledge, e.g., the absence of a particular causal relationship between two specific variables. We represent causal relationships using structural equation models. Models are scored along two objectives: the model fit and the model complexity. Since these objectives often conflict, we apply a multi-objective evolutionary algorithm to search for Pareto optimal models. To handle the instability of small finite data samples, we repeatedly subsample the data and select those substructures (from the optimal models) that are both stable and parsimonious. These substructures can be visualized through a causal graph. On a simulated data set with a known ground truth, our more exploratory approach achieves performance at least comparable to, and often significantly better than, state-of-the-art alternative approaches. We also present the results of our method on three real-world longitudinal data sets on chronic fatigue syndrome, Alzheimer’s disease, and chronic kidney disease. The findings obtained with our approach are generally in line with results from more hypothesis-driven analyses in earlier studies and suggest some novel relationships that deserve further research.
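The Pareto and stability-selection steps described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation: the model search itself (the multi-objective evolutionary algorithm fitting SEMs) is not shown, each candidate model is assumed to arrive as a hypothetical `(misfit, complexity, edges)` tuple with both objectives minimized, subsamples are assumed to contain half of the instances, and the selection probability is taken here as the fraction of subsamples whose Pareto front contains an edge at a given complexity.

```python
import random
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

Edge = Tuple[str, str]
# Hypothetical candidate-model representation: (misfit, complexity, directed edges).
# Both misfit and complexity are minimized.
Model = Tuple[float, int, FrozenSet[Edge]]


def subsample_indices(n: int, rng: random.Random) -> List[int]:
    """Draw half of the instances without replacement (assumed subsample size)."""
    return sorted(rng.sample(range(n), n // 2))


def pareto_front(models: List[Model]) -> List[Model]:
    """Keep the models that no other model dominates on (misfit, complexity)."""
    front = []
    for m in models:
        dominated = any(
            o[0] <= m[0] and o[1] <= m[1] and (o[0] < m[0] or o[1] < m[1])
            for o in models
        )
        if not dominated:
            front.append(m)
    return front


def selection_probabilities(
    fronts: List[List[Model]],
) -> Dict[Tuple[Edge, int], float]:
    """Fraction of subsample Pareto fronts containing an edge at each complexity."""
    counts: Dict[Tuple[Edge, int], int] = defaultdict(int)
    for front in fronts:
        # Count each (edge, complexity) pair at most once per subsample.
        seen = {(e, complexity) for _, complexity, edges in front for e in edges}
        for key in seen:
            counts[key] += 1
    return {key: c / len(fronts) for key, c in counts.items()}
```

In a full pipeline, each subsample from `subsample_indices` would be passed to the model search and the resulting Pareto fronts to `selection_probabilities`; plotting these probabilities against complexity gives a stability graph of the kind shown in Figures 7, 9, and 11.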

Keywords: Alzheimer’s disease; Longitudinal data; causal modeling; chronic fatigue syndrome; chronic kidney disease; multi-objective evolutionary algorithm; stability selection; structural equation model.

Figures

**Figure 1.**
Given a longitudinal data set, S3L uses the baseline observations to infer a baseline model, and reshapes the whole data set to infer a transition model. Both baseline and transition models are annotated with a reliability score α and a standardized causal effect β. S3C: stable specification search for cross-sectional data; S3L: stable specification search for longitudinal data.

**Figure 2.**
(a) The baseline model which is used to capture causal relationships at the initial time slice, e.g., before medical treatment. (b) The transition model which is used to represent causal relationships within and between time slices, e.g., during medical treatment. (c) The corresponding “unrolled” longitudinal model.
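To make the relation between panels (a), (b), and (c) concrete, here is a small Python sketch that unrolls a baseline and a transition model into a longitudinal graph over (variable, time slice) nodes. It assumes, for illustration only, a first-order transition model (between-slice edges go from slice t-1 to slice t) that is reused unchanged for every later slice; the edge-list inputs are hypothetical.

```python
from typing import Iterable, Set, Tuple

Node = Tuple[str, int]  # (variable name, time slice)


def unroll(
    baseline_edges: Iterable[Tuple[str, str]],
    within_edges: Iterable[Tuple[str, str]],
    between_edges: Iterable[Tuple[str, str]],
    n_slices: int,
) -> Set[Tuple[Node, Node]]:
    """Unroll baseline + transition edges into a graph over n_slices time slices."""
    # Materialize inputs so generators can be iterated once per slice.
    baseline_edges = list(baseline_edges)
    within_edges = list(within_edges)
    between_edges = list(between_edges)
    edges: Set[Tuple[Node, Node]] = set()
    # Slice 0 follows the baseline model.
    for u, v in baseline_edges:
        edges.add(((u, 0), (v, 0)))
    # Every later slice reuses the same transition model.
    for t in range(1, n_slices):
        for u, v in within_edges:
            edges.add(((u, t), (v, t)))
        for u, v in between_edges:
            edges.add(((u, t - 1), (v, t)))
    return edges
```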

**Figure 3.**
$D$ is a matrix representing the original data, which consists of $s$ instances, $p$ variables, and $i$ time slices. $D'$ is a matrix representing the corresponding reshaped data.
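One plausible reading of this reshaping, sketched below in Python, places each pair of consecutive time slices side by side, so that every row of $D'$ holds the $p$ variables at slice $t$ followed by the same $p$ variables at slice $t+1$, giving $s(i-1)$ rows and $2p$ columns from which a transition model can be learned. The exact layout of $D'$ in the paper is an assumption here, and the nested-list representation `D[instance][variable][slice]` is chosen only for illustration.

```python
from typing import List

Matrix = List[List[float]]


def reshape_consecutive(D: List[List[List[float]]]) -> Matrix:
    """Turn D[instance][variable][slice] into rows [x_t, x_(t+1)] for each t."""
    n_vars = len(D[0])
    n_slices = len(D[0][0])
    rows: Matrix = []
    for t in range(n_slices - 1):
        for instance in D:
            current = [instance[v][t] for v in range(n_vars)]
            following = [instance[v][t + 1] for v in range(n_vars)]
            # One row per (instance, consecutive slice pair): 2p columns.
            rows.append(current + following)
    return rows
```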

**Figure 4.**
The longitudinal model with four variables and three time slices, used to generate simulated data.

**Figure 5.**
Results from simulation data with sample size 400: ROC curves for (a) the edge stability and (b) the causal path stability (without prior knowledge), and (c) the edge stability and (d) the causal path stability (with prior knowledge), for different values of $\pi_{sel}$ in the range $[0, 1]$. Table 1 lists the corresponding AUCs. AUC: area under the ROC curve; CPC: conservative PC; FGES: fast greedy equivalent search; FPR: false positive rate; ROC: receiver operating characteristic; S3L: stable specification search for longitudinal data; TPR: true positive rate.
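The ROC curves are traced by sweeping the threshold $\pi_{sel}$: every candidate structure whose selection probability reaches the threshold counts as a predicted positive and is compared with the ground truth. The Python sketch below computes the resulting (FPR, TPR) points and a trapezoidal AUC. It assumes a hypothetical dictionary `scores` holding a selection probability for every candidate structure (0 for never-selected ones) and a non-empty set `truth` of true structures, with at least one false candidate; it is an illustration, not the evaluation code behind Table 1.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Point = Tuple[float, float]  # (false positive rate, true positive rate)


def roc_points(
    scores: Dict[Hashable, float], truth: Set[Hashable], thresholds: Iterable[float]
) -> List[Point]:
    """ROC points for the rule 'selected if selection probability >= threshold'."""
    n_pos = len(truth)
    n_neg = len(set(scores) - truth)
    points = {(0.0, 0.0), (1.0, 1.0)}  # fixed curve endpoints
    for thr in thresholds:
        selected = {k for k, s in scores.items() if s >= thr}
        tp = len(selected & truth)
        fp = len(selected) - tp
        points.add((fp / n_neg, tp / n_pos))
    return sorted(points)


def auc(points: List[Point]) -> float:
    """Area under a sorted ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:])
    )
```

For example, with `scores = {"a": 0.9, "b": 0.2, "c": 0.6}` and `truth = {"a", "b"}`, sweeping thresholds 0, 0.5, 0.7, and 1 yields an AUC of 0.5, matching the fraction of (true, false) pairs ranked correctly.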

**Figure 6.**
Results from simulation data with sample size 2000: ROC curves for (a) the edge stability and (b) the causal path stability (without prior knowledge), and (c) the edge stability and (d) the causal path stability (with prior knowledge), for different values of $\pi_{sel}$ in the range $[0, 1]$. Table 4 lists the corresponding AUCs. AUC: area under the ROC curve; CPC: conservative PC; FGES: fast greedy equivalent search; FPR: false positive rate; ROC: receiver operating characteristic; S3L: stable specification search for longitudinal data; TPR: true positive rate.

**Figure 7.**
The stability graphs of the baseline model in (a) and (b) and the transition model in (c) and (d) for chronic fatigue syndrome, with edge stability in (a) and (c), and causal path stability in (b) and (d). The relevant regions, above $\pi_{sel}$ and left of $\pi_{bic}$, contain the relevant structures.
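Selecting the structures in the relevant region amounts to a simple filter over the stability graph, and the reliability score reported in Figure 8 is then the largest selection probability an edge attains inside that region. The Python sketch below assumes the stability graph is given as a hypothetical dictionary mapping `(structure, complexity)` to a selection probability, treats "above $\pi_{sel}$" as greater than or equal to the threshold, and takes $\pi_{bic}$ as a given complexity cutoff; how $\pi_{bic}$ is chosen is not shown.

```python
from typing import Dict, Hashable, Tuple


def relevant_structures(
    stability: Dict[Tuple[Hashable, int], float], pi_sel: float, pi_bic: int
) -> Dict[Hashable, float]:
    """Structures inside the relevant region, mapped to their reliability score.

    The region is: selection probability >= pi_sel and complexity <= pi_bic.
    The reliability score is the highest selection probability inside it.
    """
    reliability: Dict[Hashable, float] = {}
    for (structure, complexity), prob in stability.items():
        if complexity <= pi_bic and prob >= pi_sel:
            reliability[structure] = max(prob, reliability.get(structure, 0.0))
    return reliability
```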

**Figure 8.**
(a) The baseline model and (b) the transition model of chronic fatigue syndrome. A dashed line represents a strong relation between two variables whose causal direction cannot be determined from the data. Each edge has a reliability score (the highest selection probability in the relevant region of the edge stability graph) and a standardized total causal effect estimate. For example, the annotation “$1/0.71$” represents a reliability score of 1 and a standardized total causal effect of 0.71. Note that the standardized total causal effect includes not just the direct causal effect corresponding to the edge but also any indirect effects.
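For a linear SEM, the distinction between direct and total effects can be made explicit. Writing the model as $x = Bx + \varepsilon$, where $B_{ij}$ is the direct effect of $x_j$ on $x_i$ and the graph is acyclic, the total effects are $T = (I - B)^{-1} = I + B + B^2 + \dots$, and a standardized total effect rescales $T_{ij}$ by $\sigma_j / \sigma_i$. The Python sketch below computes this from a hypothetical coefficient matrix `B` and independent error variances `error_var`, using the model-implied standard deviations; it illustrates the quantity, not how the estimates in the figures were obtained.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(X: Matrix, Y: Matrix) -> Matrix:
    """Plain matrix product X @ Y."""
    return [
        [sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
        for i in range(len(X))
    ]


def total_effects(B: Matrix) -> Matrix:
    """T = I + B + B^2 + ...; the series is finite because B is nilpotent for a DAG."""
    p = len(B)
    total = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    power = [row[:] for row in total]
    for _ in range(p):  # paths in a DAG have at most p - 1 edges
        power = matmul(power, B)
        total = [[total[i][j] + power[i][j] for j in range(p)] for i in range(p)]
    return total


def standardized_total_effects(B: Matrix, error_var: List[float]) -> Matrix:
    """S[i][j]: standardized total effect of x_j on x_i under the implied variances."""
    T = total_effects(B)
    p = len(B)
    # Implied variance of x_i is sum_k T[i][k]^2 * error_var[k] (independent errors).
    sd = [math.sqrt(sum(T[i][k] ** 2 * error_var[k] for k in range(p))) for i in range(p)]
    return [[T[i][j] * sd[j] / sd[i] for j in range(p)] for i in range(p)]
```

For the chain $x_0 \to x_1 \to x_2$ with direct effects 0.5 and 2.0 and unit error variances, the total effect of $x_0$ on $x_2$ is $0.5 \times 2.0 = 1$ even though there is no direct edge, and its standardized value is $1/\sqrt{6}$.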

**Figure 9.**
The stability graphs of the baseline model in (a) and (b) and the transition model in (c) and (d) for Alzheimer’s disease, with edge stability in (a) and (c), and causal path stability in (b) and (d). The relevant regions, above $\pi_{sel}$ and left of $\pi_{bic}$, contain the relevant structures.

**Figure 10.**
(a) The baseline model and (b) the transition model of Alzheimer’s disease. A dashed line represents a strong relation between two variables whose causal direction cannot be determined from the data. Each edge has a reliability score (the highest selection probability in the relevant region of the edge stability graph) and a standardized total causal effect estimate. For example, the annotation “$1/0.81$” represents a reliability score of 1 and a standardized total causal effect of 0.81. Note that the standardized total causal effect includes not just the direct causal effect corresponding to the edge but also any indirect effects.

**Figure 11.**
The stability graphs of the baseline model in (a) and (b) and the transition model in (c) and (d) for chronic kidney disease, with edge stability in (a) and (c), and causal path stability in (b) and (d). The relevant regions, above $\pi_{sel}$ and left of $\pi_{bic}$, contain the relevant structures.

**Figure 12.**
(a) The baseline model and (b) the transition model of chronic kidney disease. A dashed line represents a strong relation between two variables whose causal direction cannot be determined from the data. Each edge has a reliability score (the highest selection probability in the relevant region of the edge stability graph) and a standardized total causal effect estimate. For example, the annotation “$1/0.88$” represents a reliability score of 1 and a standardized total causal effect of 0.88. Note that the standardized total causal effect includes not just the direct causal effect corresponding to the edge but also any indirect effects.

Cited by

Predicting kidney failure from longitudinal kidney function trajectory: A comparison of models.
van den Brand JAJG, Dijkstra TMH, Wetzels J, Stengel B, Metzger M, Blankestijn PJ, Lambers Heerspink HJ, Gansevoort RT. PLoS One. 2019 May 9;14(5):e0216559. doi: 10.1371/journal.pone.0216559. eCollection 2019. PMID: 31071186. Free PMC article.
Methodology of the DCCSS later fatigue study: a model to investigate chronic fatigue in long-term survivors of childhood cancer.
Penson A, van Deuren S, Bronkhorst E, Keizer E, Heskes T, Coenen MJH, Rosmalen JGM, Tissing WJE, van der Pal HJH, de Vries ACH, van den Heuvel-Eibrink MM, Neggers S, Versluys BAB, Louwerens M, van der Heiden-van der Loo M, Pluijm SMF, Grootenhuis M, Blijlevens N, Kremer LCM, van Dulmen-den Broeder E, Knoop H, Loonen J. BMC Med Res Methodol. 2021 May 16;21(1):106. doi: 10.1186/s12874-021-01298-7. PMID: 33993873. Free PMC article.
Longitudinal outcome monitoring in patients with chronic gastroduodenal symptoms investigated using the Gastric Alimetry system: study protocol.
Varghese C, Dachs N, Schamberg G, McCool K, Law M, Xu W, Calder S, Foong D, Ho V, Daker C, Andrews CN, Gharibans AA, O'Grady G. BMJ Open. 2023 Nov 27;13(11):e074462. doi: 10.1136/bmjopen-2023-074462. PMID: 38011983. Free PMC article.
From hype to reality: data science enabling personalized medicine.
Fröhlich H, Balling R, Beerenwinkel N, Kohlbacher O, Kumar S, Lengauer T, Maathuis MH, Moreau Y, Murphy SA, Przytycka TM, Rebhan M, Röst H, Schuppert A, Schwab M, Spang R, Stekhoven D, Sun J, Weber A, Ziemek D, Zupan B. BMC Med. 2018 Aug 27;16(1):150. doi: 10.1186/s12916-018-1122-7. PMID: 30145981. Free PMC article.
Potential mechanisms of the fatigue-reducing effect of cognitive-behavioral therapy in cancer survivors: Three randomized controlled trials.
Müller F, Wijayanto F, Abrahams H, Gielissen M, Prinsen H, Braamse A, van Laarhoven HWM, Groot P, Heskes T, Knoop H. Psychooncology. 2021 Sep;30(9):1476-1484. doi: 10.1002/pon.5710. Epub 2021 May 3. PMID: 33899978. Free PMC article.
