CPT Pharmacometrics Syst Pharmacol. 2021 Nov;10(11):1433-1443.
doi: 10.1002/psp4.12715. Epub 2021 Oct 30.

Heterogeneous treatment effect analysis based on machine-learning methodology

Xiajing Gong et al. CPT Pharmacometrics Syst Pharmacol. 2021 Nov.

Abstract

Heterogeneous treatment effect (HTE) analysis focuses on examining varying treatment effects for individuals or subgroups in a population. For example, an HTE-informed understanding can critically guide physicians to individualize the medical treatment for a certain disease. However, HTE analysis has not been widely recognized and used, even given the explosive increase of data availability attributed to the arrival of the Big Data era. Part of the reason behind its underuse is that data are often of high dimension and high complexity, which pose significant challenges for applying conventional HTE analysis methods. To meet these challenges, a newly developed causal forest HTE method has been derived from the random forest machine-learning algorithm. We conducted a systematic performance evaluation for the causal forest method against the conventional two-step method by simulating scenarios with different levels of complexity for the analysis. Our results show that causal forest outperforms the conventional HTE method in assessing treatment effect, especially when data are complex (e.g., nonlinear) and high dimensional, suggesting that causal forest is a promising tool for real-world applications of HTE analysis.

Conflict of interest statement

The authors declared no competing interests for this work.

Figures

FIGURE 1
(a) Homogeneous treatment effect (no treatment heterogeneity): the outcome varies across individuals and between treatment groups, but the treatment effect (i.e., the difference between the two treatment outcome curves, depicted by the dotted lines) is the same for every individual. (b) Heterogeneous treatment effect (HTE): the treatment effect varies among individuals. Some individuals benefit more, some less, and some might not benefit at all from the treatment.
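The distinction drawn in Figure 1 can be illustrated with a small simulation. This is only a sketch under stated assumptions: the covariate `x`, the outcome model, and both effect functions are hypothetical and are not taken from the paper's simulation models.

```python
import random
import statistics

random.seed(0)  # fixed seed so the toy data are reproducible


def simulate(effect_fn, n=1000):
    """Return individual treatment effects (y1 - y0) for n simulated individuals."""
    effects = []
    for _ in range(n):
        x = random.uniform(-1.0, 1.0)        # one hypothetical covariate
        y0 = 2.0 * x + random.gauss(0, 1)    # outcome without treatment (assumed model)
        y1 = y0 + effect_fn(x)               # outcome with treatment
        effects.append(y1 - y0)
    return effects


# (a) Homogeneous: every individual gains the same amount, although outcomes differ.
homogeneous = simulate(lambda x: 1.5)

# (b) Heterogeneous: the gain depends on x, and individuals with x <= 0 gain nothing.
heterogeneous = simulate(lambda x: max(0.0, 3.0 * x))

print("SD of effect, homogeneous:  ", statistics.pstdev(homogeneous))
print("SD of effect, heterogeneous:", statistics.pstdev(heterogeneous))
```

In the homogeneous case the spread of individual effects is zero up to floating-point rounding, whereas in the heterogeneous case it is clearly positive and a share of individuals has an effect of exactly zero.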
FIGURE 2
Treatment effect in a data set where the relationship between the two covariates (x1 and x2) and the treatment effect is nonlinear. Shown are (a) the “true” treatment effect for varying values of x1 and x2 and the predicted treatment effect using (b) causal forest and (c) the two‐step method. The treatment effect is denoted by color from blue (low) to red (high).
FIGURE 3
Comparison of the performance of the causal forest and two‐step methods. Results are based on 200 replicated simulations. Mean (bar height) and standard deviation (error bar) of the root mean square error (RMSE) are displayed.
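The summary statistic in Figure 3 can be sketched as follows: compute the RMSE between true and predicted treatment effects within each replicate, then report the mean and standard deviation across replicates. The toy numbers are invented for illustration, and the use of the sample standard deviation is an assumption, since the caption does not say which form was used.

```python
import math
import statistics


def rmse(true_effects, predicted_effects):
    """Root mean square error between true and predicted treatment effects."""
    squared = [(t - p) ** 2 for t, p in zip(true_effects, predicted_effects)]
    return math.sqrt(sum(squared) / len(squared))


def summarize(rmse_per_replicate):
    """Mean and sample standard deviation of RMSE across simulation replicates."""
    return statistics.mean(rmse_per_replicate), statistics.stdev(rmse_per_replicate)


# Toy example: three hypothetical replicates instead of the 200 used in the study.
true_tau = [0.0, 1.0, 2.0]
replicate_predictions = [
    [0.0, 1.0, 2.0],  # perfect predictions
    [1.0, 2.0, 3.0],  # every prediction off by 1
    [2.0, 3.0, 4.0],  # every prediction off by 2
]
errors = [rmse(true_tau, pred) for pred in replicate_predictions]
mean_rmse, sd_rmse = summarize(errors)
print(f"RMSE per replicate: {errors}; mean = {mean_rmse}, SD = {sd_rmse}")
```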
FIGURE 4
The variable importance determined by causal forest for high‐dimensional simulated data based on Model IV. The preset significant covariates are shown in orange. The five preset important covariates were identified, as their variable importance values are greater than the significance threshold (dashed). Please refer to Supplementary Information for a description of the statistical significance test.
FIGURE 5
Incremental gains curves (or Qini curves) from each model. This curve shows the cumulative number of incremental individuals with positive treatment effect relative to the cumulative number of the targeted population. The dashed diagonal line depicts the theoretical incremental individuals with positive treatment effect from random targeting, whereas the gray line refers to the true treatment effect. For each model, the incremental gain curves shown are the average of all the curves from 200 simulation replications. The Qini coefficients displayed on each panel are the average values from 200 simulation replications.
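A Qini curve of the kind shown in Figure 5 can be sketched as below. This follows one common convention from the uplift-modeling literature (rank individuals by predicted effect, then accumulate treated responders minus control responders rescaled to the treated group size) and may differ from the exact construction used in the paper. The unnormalized area difference used as the coefficient is likewise an assumption, and the four-person data set is invented.

```python
def qini_curve(scores, treated, outcome):
    """Cumulative incremental gains after targeting the top-k individuals by score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_t = n_c = 0   # treated / control individuals targeted so far
    y_t = y_c = 0   # positive outcomes among them
    gains = [0.0]
    for i in order:
        if treated[i]:
            n_t += 1
            y_t += outcome[i]
        else:
            n_c += 1
            y_c += outcome[i]
        # Rescale control responders to the current size of the treated group.
        scaled_control = y_c * n_t / n_c if n_c else 0.0
        gains.append(y_t - scaled_control)
    return gains


def qini_coefficient(gains):
    """Area between the model curve and the straight line for random targeting."""
    n = len(gains) - 1
    area_model = sum((gains[k] + gains[k + 1]) / 2 for k in range(n))  # trapezoids
    area_random = gains[-1] * n / 2
    return area_model - area_random


# Invented toy data: higher score means larger predicted treatment effect.
scores = [0.9, 0.8, 0.2, 0.1]
treated = [1, 0, 1, 0]
outcome = [1, 0, 0, 0]

gains = qini_curve(scores, treated, outcome)
coefficient = qini_coefficient(gains)
print(gains, coefficient)
```

A curve that rises above the diagonal early, as in this toy case, indicates that the ranking places the individuals who benefit from treatment near the top of the targeted list.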
