Nat Med. 2024 Oct;30(10):2914-2923.
doi: 10.1038/s41591-024-03172-7. Epub 2024 Aug 7.

AI-based automation of enrollment criteria and endpoint assessment in clinical trials in liver diseases

Janani S Iyer et al. Nat Med. 2024 Oct.

Abstract

Clinical trials in metabolic dysfunction-associated steatohepatitis (MASH, formerly known as nonalcoholic steatohepatitis) require histologic scoring for assessment of inclusion criteria and endpoints. However, variability in interpretation has impacted clinical trial outcomes. We developed an artificial intelligence-based measurement (AIM) tool for scoring MASH histology (AIM-MASH). AIM-MASH predictions for MASH Clinical Research Network necroinflammation grades and fibrosis stages were reproducible (κ = 1) and aligned with expert pathologist consensus scores (κ = 0.62-0.74). The AIM-MASH versus consensus agreements were comparable to average pathologists for MASH Clinical Research Network scores (82% versus 81%) and fibrosis (97% versus 96%). Continuous scores produced by AIM-MASH for key histological features of MASH correlated with mean pathologist scores and noninvasive biomarkers and strongly predicted progression-free survival in patients with stage 3 (P < 0.0001) and stage 4 (P = 0.03) fibrosis. In a retrospective analysis of the ATLAS trial (NCT03449446), responders receiving study treatment showed a greater continuous change in fibrosis compared with placebo (P = 0.02). Overall, these results suggest that AIM-MASH may assist pathologists in histologic review of MASH clinical trials, reducing inter-rater variability on trial outcomes and offering a more sensitive and reproducible measure of patient responses.
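The agreement statistics quoted above (κ = 1 for repeated model runs, κ = 0.62-0.74 against consensus) can be illustrated with a minimal sketch of unweighted Cohen's kappa. The abstract does not say which kappa variant was used, so the unweighted form is an assumption here, and the stage lists are hypothetical, not study data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of cases where both raters give the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the two raters' marginal label frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one and the same label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical fibrosis stages, not data from the study.
consensus = [1, 2, 3, 3, 4, 2, 3, 1]
model_run_1 = [1, 2, 3, 2, 4, 2, 3, 1]
model_run_2 = list(model_run_1)  # a repeated run that returns identical output

print(cohen_kappa(model_run_1, model_run_2))  # identical runs give kappa = 1
print(cohen_kappa(model_run_1, consensus))    # one disagreement in eight cases
```

A deterministic model that returns the same score on every run gives κ = 1 by construction, which is what the reproducibility figure in the abstract reflects.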

Conflict of interest statement

A.N.B. is an employee of and holds stock in Gilead Sciences, Inc., and received study materials from PathAI, Inc. in support of this manuscript. A.D.B. serves as a consultant to 23andMe, Alimentiv, Allergan, Dialectica, PathAI, Inc., Source Bioscience and Verily, and is on Scientific Advisory Boards with 3Helix, Avacta and GSK. His institution has received funding for educational programs from Eli Lilly. A.H.B. is an employee of and holds stock in PathAI, Inc. A.P. is a former employee of, holds stock in and owns patents with PathAI, Inc. A.T.-W. is a former employee of and owns stock in PathAI, Inc. C.B.-S. is a former employee of and holds stock in PathAI, Inc. C.C. is an employee of Inipharm, a former employee of Gilead Sciences, Inc., and owns stock in Gilead Sciences, Inc. and Inipharm. D.J. is an employee of, holds stock in and owns patents with PathAI, Inc. H.E. is a former employee of and holds stock in PathAI, Inc., and is named on a patent (US 11527319) held by PathAI, Inc. H.P. is an employee of, owns stock in and owns patents with PathAI, Inc. I.W. is a former employee of and owns stock in PathAI, Inc., and owns a patent (US 10650520). J.G. is a former employee of and owns stock in PathAI, Inc. J.S.I. is a former employee of and owns stock in PathAI, Inc., and owns a patent. K.L. is a former employee of and owns stock in PathAI, Inc. and received an ISO grant while employed at PathAI, Inc. K.W. is a former employee of, owns stock in, received support for meeting attendance from and receives consulting fees from PathAI, Inc. M.L. is a former employee of and owns stock in PathAI, Inc. M.C.M. is a former employee of, holds stock in and receives financial support to attend meetings from PathAI, Inc.; holds stock in Bristol Myers Squibb; and holds a leadership position with the Digital Pathology Association. M.R. is a former employee of, owns stock in and receives consulting fees from PathAI, Inc. M.P. is a former employee of and holds stock in PathAI, Inc.
O.C.-Z. is a former employee of and holds stock options in PathAI, Inc., and has a patent pending (US 20220245802A1). Q.L. is an employee of and owns stock in PathAI, Inc., and owns a patent. R.L. serves as a consultant to Aardvark Therapeutics, Altimmune, Alnylam/Regeneron, Amgen, Arrowhead Pharmaceuticals, Astra Zeneca, Bristol Myers Squibb, CohBar, Eli Lilly, Galmed, Gilead Sciences, Inc., Glympse bio, Hightide, Inipharma, Intercept, Inventiva, Ionis, Janssen, Inc., Madrigal, Metacrine, Inc., NGM Biopharmaceuticals, Novartis, Novo Nordisk, Merck, Pfizer, Sagimet, Theratechnologies, 89bio, Terns Pharmaceuticals and Viking Therapeutics. In addition, his institutions received research grants from Arrowhead Pharmaceuticals, Astra Zeneca, Boehringer Ingelheim, Bristol Myers Squibb, Eli Lilly, Galectin Therapeutics, Galmed Pharmaceuticals, Gilead Sciences, Inc., Intercept, Hanmi, Inventiva, Ionis, Janssen, Inc., Madrigal Pharmaceuticals, Merck, NGM Biopharmaceuticals, Novo Nordisk, Pfizer, Sonic Incytes and Terns Pharmaceuticals. He is a co-founder of LipoNexus, Inc. R.P.M. is an employee of OrsoBio, Inc., and owns stock in OrsoBio, Inc. and Gilead Sciences, Inc. S.A.S.-M. is an employee of and owns stock in PathAI, Inc. R.E. is an employee of and owns stock in PathAI, Inc. S.H. is a former employee of, owns stock in and received support for meeting attendance from PathAI, Inc. S.D.P. is an employee of and holds stock in Gilead Sciences. T.R.W. is an employee of and holds stock in Gilead Sciences, Inc. Z.S. is an employee of and holds stock in PathAI, Inc., and owns a patent with PathAI, Inc. B.G. is an employee of, holds stock in and receives support for meeting attendance from PathAI, Inc. A.J.S. holds stock options in Genfit, Akarna, Tiziana, Durect, Inversago, Hemoshear, Northsea, Diapin, Liponexus and Galmed.
In addition, he serves as a consultant to Astra Zeneca (<5 K), Terns (<5 K), Merck (<5 K), Boehringer Ingelheim (5–10 K), Lilly (5–10 K), Novartis (<5 K), Novo Nordisk (<5 K), Pfizer (<5 K), 89 Bio (<5 K), Regeneron (<5 K), Alnylam (<5 K), Akero (<5 K), Tern (<5 K), Histoindex (<5 K), Corcept (<5 K), PathAI (<5 K), Genfit (<5 K), Mediar (<5 K), Satellite Bio (<5 K), Echosens (<5 K), Abbott (<5 K), Promed (<5 K), Glaxo Smith Kline (∼11 K), Arrowhead (<5 K), Zydus (>60 K), Boston Pharmaceutical (<5 K), Myovent (<5 K), Variant (<5 K), Cascade (<5 K) and Northsea (<5 K), and his institution has received grant support from Gilead, Salix, Tobira, Bristol Myers, Shire, Intercept, Merck, Astra Zeneca, Mallinckrodt and Novartis. Lastly, he receives royalties from Elsevier and UpToDate.

Figures

Fig. 1. Pipeline for model deployment.
a, Input: separate CNN-based models trained with digitized H&E- and MT-stained images annotated by expert pathologists are deployed on H&E- or MT-stained WSIs, respectively, to identify histological features. b, Artifact detection and exclusion: an artifact model, also based on CNNs, detects image and tissue artifacts for both H&E and MT WSIs and excludes them before downstream analysis. c, Image segmentation: H&E and MT CNNs segment and generate pixel-level predictions of relevant histologic features. d, AI-based MASH CRN scoring: CNN pixel-level predictions for each histological feature (for example, fibrosis or steatosis) were clustered using GNN models and a score predicted based on the spatial organization of the cluster. To correct for pathologists’ bias, the GNN models were specified as ‘mixed effects’ models, biases were learned and the GNNs were deployed with predictions using only the unbiased estimate. GNN nodes and edges were built from CNN predictions of relevant histologic features derived from deployment of the H&E, MT and artifact models. e, Output: this two-stage ML approach produced patient-level predictions of MASH CRN MAS component scores and fibrosis stage.
Fig. 2. AI-based detection and scoring of MAS components and fibrosis.
The MASH algorithm can detect histopathologic features on WSIs across a range of MASH disease severity. a, Representative H&E-stained slides show AI overlays highlighting regions of steatosis, lobular inflammation and ballooning. Representative cases corresponding to MAS < 4 (total n = 148) and MAS ≥ 4 (total n = 483), according to both pathologist consensus scoring and AI in the test set, are shown. The inset is a magnified field showing the presence of the three MAS components. Scale bar, 0.2 mm. b, Representative MT-stained slides of each MASH CRN fibrosis stage show AI-generated overlays highlighting regions of fibrosis present on biopsies. Representative cases corresponding to MASH CRN fibrosis stages F1 (total n = 159), F2 (total n = 146), F3 (total n = 278) and F4 (total n = 23), according to both pathologist consensus scoring and AI in the test set, are shown. These AI-generated overlays allow for qualitative review of model performance. Scale bar, 0.5 mm.
Fig. 3. AI-based grading/staging of enrollment criteria and efficacy endpoints.
a, Model-derived scores distinguished fibrosis stages F1–F3 versus F4 and MAS ≥ 4 (with each component grade ≥1) versus MAS < 4, criteria used to determine trial enrollment, using biopsies from the STELLAR-3 and STELLAR-4 clinical trials (n = 605). AIM-MASH agreement with consensus was comparable to that of each pathologist. Bar plots represent the point estimate of each enrollment criteria endpoint, and whiskers represent the 95% CIs estimated using 10,000 bootstrap samples. b, For assessment of efficacy endpoints commonly used in phase 2b and phase 3 MASH clinical trials, AIM-MASH agreement with consensus was comparable to that of an average pathologist. Assessment was performed on an external held-out validation dataset from a phase 2b MASH clinical trial using biopsies of patients meeting the following endpoints: fibrosis improvement without MASH worsening (n = 279), MASH resolution without fibrosis worsening (n = 279) and MAS reduction ≥2 (n = 326). Bar plots represent the point estimate of each enrollment criteria endpoint, and whiskers represent the 95% CIs estimated using 10,000 bootstrap samples.
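The 95% CIs in this figure come from 10,000 bootstrap samples. A minimal sketch of a percentile bootstrap for an agreement rate follows. The caption does not state which bootstrap interval was used, so the percentile method is an assumption, and the 0/1 agreement flags are hypothetical.

```python
import random


def bootstrap_ci(matches, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of a list of 0/1 agreement flags."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(matches)
    stats = []
    for _ in range(n_resamples):
        # Resample cases with replacement and recompute the agreement rate.
        sample = rng.choices(matches, k=n)
        stats.append(sum(sample) / n)
    stats.sort()
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return sum(matches) / n, lower, upper


# Hypothetical flags: 1 = model agrees with consensus, 0 = it does not.
flags = [1] * 160 + [0] * 40
point, lower, upper = bootstrap_ci(flags)
print(f"agreement {point:.3f}, 95% CI {lower:.3f} to {upper:.3f}")
```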
Fig. 4. AIM-based retrospective drug efficacy assessment.
AIM-MASH models were deployed on WSIs from baseline and week 48 biopsies from patients enrolled in the phase 2b ATLAS trial, which evaluated combination therapies for individuals with advanced MASH fibrosis. a, For the trial endpoints of MAS ≥ 2-point improvement, fibrosis improvement without worsening of MASH and MASH resolution without worsening of fibrosis, AIM-MASH models showed a greater proportion of responders compared with that determined by the trial central reader. For MAS ≥ 2-point improvement, odds ratios (ORs) for AI and central reader were 5.1 (95% CI 2.0–13.1) and 5.7 (95% CI 1.6–20.2), respectively; Cochran–Mantel–Haenszel (CMH) test statistics were 11.9 (P = 0.0006) and 7.9 (P = 0.005), respectively. For fibrosis improvement without worsening of MASH, ORs for AI and central reader were 2.2 (95% CI 0.7–6.3) and 2.2 (95% CI 0.6–7.7), respectively; CMH test statistics were 2.1 (P = 0.152) and 1.7 (P = 0.196), respectively. For MASH resolution without worsening of fibrosis, OR for AI was 2.7 (95% CI 0.8–8.8); OR for central reader was undefined, as no placebo responders were identified. CMH test statistics were 2.7 (P = 0.101) for AI and 2.0 (P = 0.155) for central reader. Sample sizes varied depending on data availability. b, The placebo-adjusted response rate detected by AIM-MASH was greater than that detected by the central reader.
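To show how an odds ratio and its interval arise from responder counts, and why the OR is undefined when the placebo arm has no responders, here is a minimal sketch for a single 2×2 table with a Wald (log-scale) 95% CI. This is a simplification: the trial analysis used the stratified CMH approach, and the function name and all counts below are hypothetical.

```python
import math


def odds_ratio_ci(treated_resp, treated_nonresp, placebo_resp, placebo_nonresp, z=1.96):
    """Odds ratio and Wald CI for one 2x2 table; None if any cell is zero."""
    cells = (treated_resp, treated_nonresp, placebo_resp, placebo_nonresp)
    if any(c == 0 for c in cells):
        # With an empty cell (for example, no placebo responders) the ratio
        # or its log-scale standard error cannot be computed.
        return None
    odds_ratio = (treated_resp * placebo_nonresp) / (treated_nonresp * placebo_resp)
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se_log = math.sqrt(sum(1 / c for c in cells))
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from the ATLAS trial.
print(odds_ratio_ci(20, 30, 5, 35))   # defined: all four cells are non-zero
print(odds_ratio_ci(10, 40, 0, 45))   # None: no placebo responders
```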
Fig. 5. AI-based continuous MASH CRN scores.
a, Correlation of AI-based continuous scores with mean scores across three pathologists from EMMINENCE in the analytic performance test set. Results are shown for both AI-derived ordinal bins (blue) and pathologist-derived ordinal bins (gray). Plotted values were derived from Kendall’s tau (τ) rank correlation analysis. FDR correction of P values was performed using the Benjamini–Hochberg procedure. Filled circles indicate statistical significance, FDR-corrected P < 0.05. b, cFib versus CPA measurements in primary endpoint responders in the ATLAS clinical trial. cFib and CPA were compared between patients receiving treatment and placebo using two-sided Mann–Whitney U tests. In primary endpoint responders, continuous fibrosis scores were significantly reduced in treated patients (n = 17) versus placebo patients (n = 6; Mann–Whitney U = 20.0, P = 0.02), while proportionate area fibrosis measurements were not significantly reduced (Mann–Whitney U = 39.0, P = 0.21). cFib and CPA values for patients classified as nonresponders (n = 76), in the treatment (n = 45) or placebo (n = 31) group, are also shown. Boxes represent the 25th percentile, median and 75th percentile of the data. Whiskers extend to points that lie within 1.5-fold of the inter-quartile range of the 25th and 75th percentiles. c, Stratification of patients with BL F3 or F4 fibrosis from STELLAR-3 and STELLAR-4 trial cohorts into rapid (red) and slow (orange) progressors based on continuous score cutoffs of 3.6 and 4.6, respectively. Kaplan–Meier and Cox proportional hazards regression analyses are shown. F3: log-rank statistic = 31.0, P = 2.6 × 10⁻⁸; F4: log-rank statistic = 4.8, P = 0.028. Rounded cutoffs were chosen to maximize hazards. d, Discriminatory accuracy of AI-derived continuous scores versus ordinal scores to predict progression to cirrhosis (left) and LRE (right) in STELLAR-3 and STELLAR-4 trial cohorts.
In both cases, using receiver operating characteristic analysis, the continuous AUC was significantly greater (progression to cirrhosis: 0.66 (95% CI 0.60–0.71) versus 0.59 (95% CI 0.55–0.60); progression to LRE: 0.61 (95% CI 0.51–0.71) versus 0.54 (95% CI 0.47–0.59)). AUC, area under the receiver operating characteristic curve; BL, baseline; FDR, false discovery rate; FPR, false positive rate; τ, Kendall’s rank correlation coefficient for ordinal scores; TPR, true positive rate.
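The FDR correction named in panel a can be sketched as follows. This is a minimal implementation of the standard Benjamini–Hochberg adjusted P values; the input P values are hypothetical and are not taken from the figure.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw P values from four correlation tests.
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
significant = [q < 0.05 for q in adjusted]
print(adjusted)
print(significant)
```

In this hypothetical case only the smallest P value remains below 0.05 after correction, which corresponds to a filled circle in the plotting convention the caption describes.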
Extended Data Fig. 1. AIM-MASH CNN and GNN model training and predictions.
CNN, convolutional neural network; GNN, graph neural network; AIM, artificial intelligence-based measurement; MASH, metabolic dysfunction-associated steatohepatitis.
Extended Data Fig. 2. AIM-MASH H&E and Trichrome inference pipelines.
H&E, hematoxylin and eosin; TC, trichrome.
Extended Data Fig. 3. Segmentation model development process.
QC, quality control.
Extended Data Fig. 4. Mapping of continuous scores.
Continuous MAS (steatosis, ballooning and lobular inflammation) and CRN fibrosis scores were produced by mapping GNN-derived ordinal grades/stages to bins, such that ordinal scores are spread over a continuous range (unit distance of 1). This continuous scoring system allows a more granular measurement of histological changes occurring at the subordinal level, while maintaining fidelity to the accepted ordinal scoring system.
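One way to picture the relationship between continuous and ordinal scores is sketched below. It assumes that ordinal stage k occupies the half-open bin [k, k+1), which is consistent with the Fig. 5c cutoffs of 3.6 for F3 and 4.6 for F4 but is not stated in the caption. The function names and scores are hypothetical, and this is not the authors' mapping procedure.

```python
import math


def to_ordinal(continuous_score, max_stage=4):
    """Map a continuous score to its ordinal bin, assuming stage k spans [k, k+1)."""
    if continuous_score < 0:
        raise ValueError("continuous score must be non-negative")
    # Clip so that scores at the top of the range stay in the highest stage.
    return min(int(math.floor(continuous_score)), max_stage)


def score_change(baseline, follow_up):
    """Compare the change seen on the ordinal scale with the continuous change."""
    return {
        "ordinal_change": to_ordinal(follow_up) - to_ordinal(baseline),
        "continuous_change": follow_up - baseline,
    }


# Hypothetical fibrosis scores: improvement within stage 3.
change = score_change(baseline=3.8, follow_up=3.2)
print(change)
```

Under this assumption a patient moving from 3.8 to 3.2 shows no ordinal change, while the continuous score still records the improvement, which is the subordinal sensitivity the caption describes.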
