Ann Intern Med. 2020 Jan 7;172(1):W1-W25.
doi: 10.7326/M18-3668. Epub 2019 Nov 12.

The Predictive Approaches to Treatment effect Heterogeneity (PATH) Statement: Explanation and Elaboration

David M Kent et al. Ann Intern Med.

Abstract

The PATH (Predictive Approaches to Treatment effect Heterogeneity) Statement was developed to promote the conduct of, and provide guidance for, predictive analyses of heterogeneity of treatment effects (HTE) in clinical trials. The goal of predictive HTE analysis is to provide patient-centered estimates of outcome risk with versus without the intervention, taking into account all relevant patient attributes simultaneously, to support more personalized clinical decision making than can be made on the basis of only an overall average treatment effect. The authors distinguished 2 categories of predictive HTE approaches (a "risk-modeling" and an "effect-modeling" approach) and developed 4 sets of guidance statements: criteria to determine when risk-modeling approaches are likely to identify clinically meaningful HTE, methodological aspects of risk-modeling methods, considerations for translation to clinical practice, and considerations and caveats in the use of effect-modeling approaches. They discuss limitations of these methods and enumerate research priorities for advancing methods designed to generate more personalized evidence. This explanation and elaboration document describes the intent and rationale of each recommendation and discusses related analytic considerations, caveats, and reservations.

Figures

Figure 1. The Scale Dependence of Heterogeneity of Treatment Effect (HTE)
The plots above depict the scale dependence of effect heterogeneity. All 3 scenarios are drawn from hypothetical trials with the same overall results (outcome rates of 8.8% versus 6.6% in the control [open circles] versus treatment [closed circles] groups) and depict outcomes in low-risk (75% of patients, Q1–3) and high-risk (25% of patients, Q4) groups (where control event rates are 5% versus 20%, respectively). Plots in the left, middle, and right columns display outcome risks, relative effects, and absolute effects, respectively. In the first row, effect heterogeneity is absent on the relative scale but present on the absolute scale. In the second row, effect heterogeneity is present on the relative scale but absent on the absolute scale. In the third row, effect heterogeneity is present on both the relative and the absolute scale. Most typically, the statistical significance of HTE is tested on the relative scale (middle column), since regression analyses are often performed on relative scales. Provided sufficient statistical power, analyses 2 and 3 would show statistically significant HTE. However, regardless of the scale of the analysis, the clinical importance of HTE should generally be evaluated on the absolute scale. When absolute effects span a decisionally important threshold, which depends on the treatment burden (e.g., harms and costs), HTE is said to be clinically important. In this example, for illustrative purposes, we have arbitrarily set a decisionally relevant threshold at a 1 percentage point reduction in outcome risk. Here, while there is HTE on the absolute scale in both analyses 1 and 3, clinically important heterogeneity is present only in the third analysis, where the treatment that is beneficial on average may not be worth the treatment burden for many (indeed most) patients.
Note that the presence of a statistically significant interaction (on the relative scale) does not imply clinically important HTE, and that the absence of a statistically significant interaction does not imply the absence of clinically important HTE. It is also important to note that testing heterogeneity on the relative scale does not test a specific causal hypothesis regarding effect modification (regardless of the subgrouping variable), but merely tests the hypothesis that relative effects are the same in one group versus another group. Establishing causal interaction effects is not necessary to improve the targeting of therapy. We also note that this diagram makes the simplifying assumption of uniform treatment burdens across all levels of risk. In practice, adverse events may vary across risk groups, and the threshold is also sensitive to patient values and preferences.
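The arithmetic behind the first row of Figure 1 can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the group shares, control event rates, and the 1 percentage point threshold come from the caption, while the constant relative risk of 0.75 is an assumption inferred from the overall rates (8.8% versus 6.6%), and all names are hypothetical.

```python
# Scenario 1 of Figure 1: constant relative effect, varying absolute effect.
# ASSUMPTION: a relative risk of 0.75, inferred from the overall 8.8% -> 6.6%.
RELATIVE_RISK = 0.75
THRESHOLD = 0.01  # illustrative decision threshold from the caption (1 point)

# name -> (share of patients, control event rate), both from the caption
GROUPS = {
    "low risk (Q1-3)": (0.75, 0.05),
    "high risk (Q4)": (0.25, 0.20),
}


def absolute_risk_reduction(control_rate, relative_risk):
    """Control event rate minus the event rate under treatment."""
    return control_rate - control_rate * relative_risk


# Absolute benefit in each risk group under the same relative effect.
arr_by_group = {
    name: absolute_risk_reduction(rate, RELATIVE_RISK)
    for name, (_share, rate) in GROUPS.items()
}

# Overall rates are the share-weighted averages of the group rates.
overall_control = sum(share * rate for share, rate in GROUPS.values())
overall_treated = overall_control * RELATIVE_RISK

for name, arr in arr_by_group.items():
    side = "above" if arr > THRESHOLD else "below"
    print(f"{name}: absolute risk reduction {arr:.2%} ({side} threshold)")
print(f"overall: control {overall_control:.2%}, treated {overall_treated:.2%}")
```

Under these assumptions the absolute benefit differs fourfold between the groups (1.25 versus 5 percentage points) yet stays above the threshold in both, which matches the caption's point that analysis 1 shows absolute-scale HTE that is not clinically important.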
Figure 2. Effects of Lifestyle Modification and Metformin versus Usual Care in Patients with Prediabetes at Different Risks of Developing Diabetes
Figure 2 presents an HTE analysis of the Diabetes Prevention Program (DPP) Trial as a function of baseline risk. Event rates (top graph), hazard ratios (middle graph), and absolute effects (lower graph) are shown. Both lifestyle modification (left panel) and metformin (right panel) are compared to usual care as a function of baseline risk. For lifestyle modification, a consistent 58% reduction in the hazard of developing diabetes over three years was found across all levels of risk. This consistent relative effect yields HTE on the absolute scale of potential clinical importance. In contrast, the effects of metformin are heterogeneous on both the hazard ratio scale and the absolute scale. Penalized splines were used to model the relationship between the linear predictor of risk and the time-to-event outcome. Vertical lines denote 95% confidence intervals, and p-values are based on the null hypothesis of no effect modification, tested using the linear predictor of risk in a Cox model. The dashed lines show the average effects in the trial. Prediction of incident diabetes with an external model derived from the Framingham cohort yielded a similar pattern.
Figure 3. The Value of a Risk Modeling Approach is Likely to be Greater when the Average Treatment Effect in a Trial (Treatment A) is Near a Decision Threshold
Figure 3 depicts the anticipated influence of a risk modeling approach in two trials testing different treatments in the same population: one of a treatment (A) with a slightly favorable benefit-harm trade-off, and the other of a treatment (B) with an extremely favorable benefit-harm trade-off. Under both conditions, the control event rate is 25% and the minimal clinically significant difference (MCSD, i.e., the absolute benefit that would justify the experimental therapy) is 3 percentage points. (For simplicity, we display a single MCSD, with grey shading corresponding to portions of the population that should not be treated, but this value varies according to individual patient values and preferences.) A risk modeling approach would be of substantially greater value for the trial of therapy A, with the slightly favorable trade-off (relative risk reduction [RRR] of 0.15; absolute risk difference = 3.75%, just above the MCSD), than for the trial of therapy B, with the extremely favorable trade-off (RRR of 0.5; risk difference = 12.5%, substantially above the MCSD). The distributions show the anticipated risk differences that emerge with a constant RRR when the same moderately predictive risk prediction model (i.e., with a c-statistic of ~0.70) is applied to the population. In the slightly favorable treatment condition (A), harms outweigh benefits in almost half the trial population (43%), despite overall results showing benefit on average. In the extremely favorable treatment condition (B), treatment remains worthwhile in virtually the entire population (97%). Thus, applying the risk modeling approach is very valuable in the low-benefit condition, as it reclassifies many patients as treatment-unfavorable who would otherwise have been treated based on the overall result.
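The numbers quoted in Figure 3 follow from simple arithmetic on the constant RRR, which the Python sketch below reproduces. It is an illustration under the caption's stated values (25% control event rate, 3 percentage point MCSD, RRR of 0.15 and 0.5); the function names are hypothetical, and the sketch does not attempt to reproduce the 43% and 97% shares, which depend on the risk distribution shown in the figure.

```python
MCSD = 0.03          # minimal clinically significant difference (caption)
CONTROL_RATE = 0.25  # average control event rate (caption)
RRR = {"A": 0.15, "B": 0.50}  # relative risk reductions (caption)


def risk_difference(baseline_risk, rrr):
    """Absolute benefit for a patient at a given baseline risk, constant RRR."""
    return baseline_risk * rrr


def break_even_risk(rrr, mcsd=MCSD):
    """Baseline risk at which the absolute benefit exactly equals the MCSD."""
    return mcsd / rrr


# Average absolute benefit in each trial, and the baseline risk below which
# an individual patient's expected benefit falls short of the MCSD.
average_benefit = {t: risk_difference(CONTROL_RATE, r) for t, r in RRR.items()}
break_even = {t: break_even_risk(r) for t, r in RRR.items()}

for t in RRR:
    print(
        f"Treatment {t}: average benefit {average_benefit[t]:.2%}, "
        f"not worthwhile below a baseline risk of {break_even[t]:.0%}"
    )
```

With treatment A, every patient whose baseline risk is below about 20% falls under the MCSD, a cut-off close to the 25% average risk; with treatment B the cut-off is about 6%, far from the average, which is why risk stratification reclassifies few patients there.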
Figure 4. Schematized and Actual Risk-based Heterogeneous Treatment Effects
This figure schematically depicts outcome risks for a trial testing a hypothetical intervention with an odds ratio of 0.75 but with an absolute treatment-related harm of 1% (shown in the top panel). Observed odds ratios (middle panel) and risk differences (bottom panel) are shown. Overall trial results depend on the average risk of the enrolled trial population. When the average risk is ~7% (as above), a well-powered study would detect a positive overall treatment benefit (shown by the horizontal dashed line in the middle and bottom panels). However, a prediction model with a c-statistic of 0.75 generates the risk distribution at the top of the figure. A treatment-by-risk interaction emerges (middle panel). Whether or not this interaction is statistically significant, examination of treatment effects on the absolute risk difference scale (bottom panel) reveals harm in the low-risk group and very substantial benefit in the high-risk group, both of which are obscured by the overall summary results. Conventional one-variable-at-a-time subgroup analyses are typically inadequate to disaggregate patients into groups that are sufficiently heterogeneous for risk, so benefit-harm trade-offs can misleadingly appear to be consistent across the trial population. Baseline risk is logit-normally distributed with mu=−3 and sigma=1 (that is, the log odds are normally distributed). Figure adapted from Kent DM et al. JAMA 2007.
The RITA-3 trial (N=1810) tested early intervention versus conservative management of non-ST-elevation acute coronary syndrome. Results for the outcome of death or non-fatal myocardial infarction at 5 years are shown above, stratified into equal-sized risk quarters using an internally derived risk model; the highest risk quarter is sub-stratified in halves (groups 4a and 4b). Event rates with 95% confidence intervals (top panel), odds ratios (middle panel), and risk differences (bottom panel) are displayed. The risk model comprises the following easily obtainable clinical characteristics: age, sex, diabetes, prior MI, smoking status, heart rate, ST depression, angina severity, left bundle branch block, and treatment strategy. As in the schematic diagram to the left, the average treatment effect seen in the summary results (horizontal dashed line in middle and bottom panels) closely reflects the effect in patients in risk quarter 3, while fully half of patients (q1 and q2) receive no treatment benefit from early intervention. Absolute benefit (bottom panel) in the primary outcome was very pronounced in the eighth of patients at highest risk (4b). A statistically significant risk-by-treatment interaction* can be seen when results are expressed on the odds ratio scale (middle panel). Such a pattern can emerge if early intervention is associated with some procedure-related risks that are evenly distributed over all risk groups, eroding benefit in low-risk but not high-risk patients, as illustrated schematically in Figure 4A. *The interaction p value is from a likelihood ratio test for adding an interaction between the linear predictor of risk and treatment assignment (one degree of freedom).
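The schematic part of Figure 4 can be approximated numerically. The Python sketch below is an illustration, not the authors' simulation: it takes the odds ratio (0.75), the absolute harm (1%), and the logit-normal risk distribution (mu=−3, sigma=1) from the caption, and assumes that an evenly spaced grid of quantiles can stand in for a large trial population (the grid size and all names are hypothetical).

```python
from math import exp
from statistics import NormalDist, fmean

ODDS_RATIO = 0.75      # treatment effect on the odds scale (caption)
HARM = 0.01            # absolute treatment-related harm (caption)
MU, SIGMA = -3.0, 1.0  # log odds of baseline risk ~ Normal(MU, SIGMA) (caption)
N_GRID = 4000          # ASSUMPTION: grid size for the numerical approximation


def expit(x):
    """Inverse logit: convert log odds to a probability."""
    return 1.0 / (1.0 + exp(-x))


def treated_risk(p):
    """Risk under treatment: apply the odds ratio, then add the fixed harm."""
    odds = ODDS_RATIO * p / (1.0 - p)
    return odds / (1.0 + odds) + HARM


# Evenly spaced quantiles of the log odds, sorted from lowest to highest risk.
std_normal = NormalDist()
baseline = [
    expit(MU + SIGMA * std_normal.inv_cdf((i + 0.5) / N_GRID))
    for i in range(N_GRID)
]
# Positive values mean net benefit; negative values mean net harm.
benefit = [p - treated_risk(p) for p in baseline]

quarter = N_GRID // 4
quarter_benefit = [
    fmean(benefit[q * quarter:(q + 1) * quarter]) for q in range(4)
]

print(f"average baseline risk: {fmean(baseline):.3f}")
print(f"overall risk difference: {fmean(benefit):+.4f}")
for q, value in enumerate(quarter_benefit, start=1):
    print(f"risk quarter {q}: mean risk difference {value:+.4f}")
```

Under these assumptions the overall risk difference is positive while the lowest-risk quarter shows net harm, because the fixed 1% harm exceeds the small absolute benefit that an odds ratio of 0.75 produces at low baseline risk.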
Figure 5. Risk heterogeneity increases with higher discrimination – Extreme Quartile Risk Ratio Increases With Increasing C-Statistic, Especially at Low Outcome Rates
The curves above depict the relationship between the c-statistic and extreme quartile risk ratio (EQRR) – that is, the risk in the highest quartile compared to the risk in the lowest quartile – for different outcome rates across 32 trials. Unsurprisingly, the degree of risk heterogeneity (as represented by the EQRR) is strongly related to the discriminatory power of the prediction model. The relationship is strongest when the overall outcome rates are low. The c-statistic and EQRR both reflect how well the risk factors predict the outcome in a given population. For reference, in a trial with an outcome rate of 15%, a predictive model with a c-statistic of 0.80 is anticipated to yield an outcome rate that is 13-fold higher in the high-risk quartile compared to the low-risk quartile. When the outcome rate is lower (5%), this ratio is expected to be greater than 20-fold for a model with similar discrimination. Patient groups with such different outcome risks are unlikely to have similar benefit-harm trade-offs for most therapies, even though they may be included in the same trial.
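The link between discrimination and the EQRR can be illustrated with a small Python sketch. It does not reproduce the curves in Figure 5, which come from 32 trials. Instead it assumes a perfectly calibrated model whose predicted risks follow a logit-normal distribution (an assumption borrowed from the Figure 4 schematic), with hypothetical parameter values and function names, and shows that widening the spread of risk raises both the c-statistic and the EQRR.

```python
from math import exp
from statistics import NormalDist, fmean

MU = -3.0      # ASSUMPTION: mean log odds of risk (not taken from Figure 5)
N_GRID = 4000  # ASSUMPTION: grid size for the numerical approximation


def risk_grid(sigma):
    """Sorted predicted risks from a logit-normal(MU, sigma) distribution."""
    std_normal = NormalDist()
    return [
        1.0 / (1.0 + exp(-(MU + sigma * std_normal.inv_cdf((i + 0.5) / N_GRID))))
        for i in range(N_GRID)
    ]


def c_statistic(risks):
    """Expected c-statistic of a calibrated model; risks must be sorted ascending.

    Each grid point contributes an expected `p` events and `1 - p` non-events.
    A pair is concordant when the event has the higher predicted risk; a grid
    point paired with itself counts one half.
    """
    total_events = sum(risks)
    total_nonevents = len(risks) - total_events
    concordant = 0.0
    nonevents_below = 0.0
    for p in risks:
        concordant += p * (nonevents_below + 0.5 * (1.0 - p))
        nonevents_below += 1.0 - p
    return concordant / (total_events * total_nonevents)


def eqrr(risks):
    """Mean risk in the highest quarter divided by mean risk in the lowest."""
    quarter = len(risks) // 4
    return fmean(risks[-quarter:]) / fmean(risks[:quarter])


table = []
for sigma in (0.5, 1.0, 1.5):
    grid = risk_grid(sigma)
    table.append((sigma, fmean(grid), c_statistic(grid), eqrr(grid)))

for sigma, rate, c, ratio in table:
    print(f"sigma={sigma}: outcome rate {rate:.3f}, c-statistic {c:.2f}, EQRR {ratio:.1f}")
```

Because the sketch varies only the spread of the log odds, the outcome rate shifts slightly between rows; matching the specific outcome rates in the figure would require adjusting MU as well.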
Figure 6. Evaluating Model Performance: A Comparison of Conventional Outcome Risk Calibration versus Treatment Effect (Benefit) Calibration
The box plots show predicted and observed event rates by quarters of predicted risk in the control and treatment arms of a hypothetical RCT (500 simulations; panel A). These rates appear to demonstrate appropriate model calibration. However, examining the same data for predicted and observed benefit (differences in event rates) by quarters of predicted benefit (panel B) reveals very poor model calibration at the extreme quarters. This poor calibration occurs because miscalibration for the risk difference includes error from both the control and treatment arms, and because the scale of the risk difference is much smaller than that for the outcome risk. These data were generated from a simulation of a prediction model that included 12 treatment effect interactions, 6 of which represented true interactions. The boxes represent, in line with Tukey’s definition, the 25% quantile to the 75% quantile (with the median shown). The lower and upper whiskers include the most extreme observations within the range of 1.5 times the interquartile range, from the 25% and 75% quantiles, respectively.
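A benefit calibration check of the kind shown in panel B can be sketched as follows. This Python example is a simplified illustration, not the simulation behind Figure 6: it groups patients by quarters of predicted benefit and compares the mean predicted benefit with the observed difference in event rates between arms. The toy trial (uniform control risk, a constant 25% relative risk reduction, 1:1 randomization) and all names are assumptions made for the example.

```python
import random
from statistics import fmean


def benefit_calibration(records, n_groups=4):
    """Compare predicted and observed benefit within groups of predicted benefit.

    `records` holds (predicted_benefit, treated, event) tuples, with `event`
    coded 0/1. Returns one (mean predicted benefit, observed benefit) pair per
    group, where observed benefit is the control event rate minus the treated
    event rate. Every group must contain patients from both arms.
    """
    ordered = sorted(records, key=lambda r: r[0])
    size = len(ordered) // n_groups
    rows = []
    for g in range(n_groups):
        stop = (g + 1) * size if g < n_groups - 1 else len(ordered)
        group = ordered[g * size:stop]
        control_events = [event for _, treated, event in group if not treated]
        treated_events = [event for _, treated, event in group if treated]
        predicted = fmean([b for b, _, _ in group])
        observed = fmean(control_events) - fmean(treated_events)
        rows.append((predicted, observed))
    return rows


# Toy trial. ASSUMPTIONS: control risk uniform on [0.05, 0.40], a constant
# 25% relative risk reduction, and predictions equal to the true benefit.
rng = random.Random(0)
records = []
for _ in range(4000):
    control_risk = rng.uniform(0.05, 0.40)
    predicted_benefit = 0.25 * control_risk
    treated = rng.random() < 0.5
    risk = control_risk - predicted_benefit if treated else control_risk
    event = int(rng.random() < risk)
    records.append((predicted_benefit, treated, event))

rows = benefit_calibration(records)
for i, (predicted, observed) in enumerate(rows, start=1):
    print(f"quarter {i}: predicted benefit {predicted:.3f}, observed {observed:+.3f}")
```

Even with predictions that equal the true benefit, the observed values in each quarter are noisy because each one is a difference between two estimated event rates, which is one of the two reasons the caption gives for poor apparent benefit calibration.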

References

    1. Rothwell PM. Can overall results of clinical trials be applied to all patients? Lancet 1995; 345(8965):1616–1619.
    2. Rothwell PM, Mehta Z, Howard SC, Gutnikov SA, Warlow CP. Treating individuals 3: from subgroups to individuals: general principles and the example of carotid endarterectomy. Lancet 2005; 365(9455):256–265.
    3. Kent DM, Hayward RA. Limitations of applying summary results of clinical trials to individual patients: the need for risk stratification. JAMA 2007; 298(10):1209–1212.
    4. Kent DM, Steyerberg EW, van Klaveren D. Personalized evidence-based medicine: predictive approaches to heterogeneous treatment effects. BMJ 2018; 363:k4245.
    5. Kravitz RL, Duan N, Braslow J. Evidence-based medicine, heterogeneity of treatment effects, and the trouble with averages. Milbank Q 2004; 82(4):661–687.
