Q-Finder: An Algorithm for Credible Subgroup Discovery in Clinical Data Analysis - An Application to the International Diabetes Management Practice Study

Cyril Esnault¹, May-Line Gadonna¹, Maxence Queyrel¹,², Alexandre Templier¹, Jean-Daniel Zucker²,³

Affiliations

¹ Quinten France, Paris, France.
² Sorbonne University, IRD, UMMISCO, Bondy, France.
³ Sorbonne University, INSERM, NUTRIOMICS, Paris, France.

PMID: 33733209
PMCID: PMC7861304
DOI: 10.3389/frai.2020.559927

Cyril Esnault et al. Front Artif Intell. 2020 Dec 17;3:559927. doi: 10.3389/frai.2020.559927. eCollection 2020.

Abstract

Addressing the heterogeneity of both the outcome of a disease and the treatment response to an intervention is a mandatory pathway for regulatory approval of medicines. In randomized clinical trials (RCTs), confirmatory subgroup analyses focus on the assessment of drugs in predefined subgroups, while exploratory ones allow the a posteriori identification of subsets of patients who respond differently. Within the latter area, the subgroup discovery (SD) data mining approach is widely used, particularly in precision medicine, to evaluate treatment effect across different groups of patients from various data sources (be it from clinical trials or real-world data). However, two obstacles hinder the generation of hypotheses and their acceptance by healthcare authorities and practitioners: standard SD algorithms give limited consideration to the recommended criteria for defining credible subgroups, and the findings lack statistical power after correction for multiple testing. In this paper, we present the Q-Finder algorithm, which aims to generate statistically credible subgroups to answer clinical questions, such as finding drivers of natural disease progression or treatment response. It combines an exhaustive search with a cascade of filters based on metrics assessing key credibility criteria, including relative risk reduction assessment, adjustment for confounding factors, each individual feature's contribution to the subgroup's effect, interaction tests for assessing between-subgroup treatment effect interactions, and adjustment for multiple testing. This allows Q-Finder to directly target and assess subgroups on recommended credibility criteria. The top-k credible subgroups are then selected, while accounting for subgroups' diversity and, possibly, clinical relevance. Those subgroups are tested on independent data to assess their consistency across databases, while preserving statistical power by limiting the number of tests.
To illustrate this algorithm, we applied it to the database of the International Diabetes Management Practice Study (IDMPS) to better understand the drivers of improved glycemic control and of the rate of hypoglycemia episodes in patients with type 2 diabetes. We compared Q-Finder with state-of-the-art approaches from both the Subgroup Identification and the Knowledge Discovery in Databases literature. The results demonstrate its ability to identify and support a short list of highly credible and diverse data-driven subgroups for both prognostic and predictive tasks.
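The combination of an exhaustive search with a cascade of filters can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy records, the feature names, the thresholds `min_size` and `min_rr`, and the use of a plain risk ratio as the only effect metric are all assumptions made for illustration. The credibility metrics listed in the abstract (confounder adjustment, interaction tests, multiple-testing adjustment) are deliberately left out.

```python
from itertools import combinations

# Hypothetical toy records (not IDMPS data): two categorical features and
# a binary outcome (1 = target reached).
PATIENTS = [
    {"age": "old", "insulin": "yes", "outcome": 1},
    {"age": "old", "insulin": "yes", "outcome": 1},
    {"age": "old", "insulin": "yes", "outcome": 1},
    {"age": "old", "insulin": "no", "outcome": 0},
    {"age": "young", "insulin": "yes", "outcome": 1},
    {"age": "young", "insulin": "yes", "outcome": 0},
    {"age": "young", "insulin": "no", "outcome": 1},
    {"age": "young", "insulin": "no", "outcome": 0},
]


def candidates(rows, features, max_complexity=2):
    """Exhaustively enumerate conjunctions of up to max_complexity conditions."""
    conditions = sorted({(f, row[f]) for row in rows for f in features})
    for k in range(1, max_complexity + 1):
        for combo in combinations(conditions, k):
            # Skip conjunctions that constrain the same feature twice.
            if len({f for f, _ in combo}) == k:
                yield combo


def covers(combo, row):
    """True if the row satisfies every (feature, value) condition."""
    return all(row[f] == v for f, v in combo)


def risk_ratio(combo, rows):
    """Outcome rate inside the subgroup divided by the rate outside it."""
    inside = [r["outcome"] for r in rows if covers(combo, r)]
    outside = [r["outcome"] for r in rows if not covers(combo, r)]
    if not inside or not outside or sum(outside) == 0:
        return None  # ratio undefined
    return (sum(inside) / len(inside)) / (sum(outside) / len(outside))


def cascade(rows, features, min_size=3, min_rr=1.5):
    """Apply the filters in sequence; cheap checks run first."""
    kept = []
    for combo in candidates(rows, features):
        size = sum(covers(combo, r) for r in rows)
        if size < min_size:  # filter 1: minimum coverage
            continue
        rr = risk_ratio(combo, rows)
        if rr is None or rr < min_rr:  # filter 2: minimum effect size
            continue
        kept.append((combo, size, rr))
    return sorted(kept, key=lambda item: -item[2])


ranked = cascade(PATIENTS, ["age", "insulin"])
for combo, size, rr in ranked:
    print(combo, size, round(rr, 2))
```

On this toy data, eight candidate subgroups are generated and three survive both filters. Ordering the filters from cheapest to most expensive is a design choice of the sketch: it avoids computing effect metrics for subgroups that are already too small to matter.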

Keywords: IDMPS; credibility criteria; exploratory subgroup analysis; hypothesis generation; precision medicine; predictive factor; prognostic factor; subgroup discovery.

Conflict of interest statement

The authors declare the following competing interests. Employment: CE, MG, MQ, and AT are employed by Quinten. Financial support: The development of Q-Finder has been fully funded by Quinten. This article has been funded by Quinten with the help of Sanofi, which provided the dataset and contributed to the revisions.

Figures

**FIGURE 1**
A classification of subgroup analysis (SA) tasks distinguishing the confirmatory analyses **(left)** from the exploratory ones **(right)**.

**FIGURE 2**
Hierarchical tree representing the two-layer classification of SA tasks and the criteria used.

**FIGURE 3**
Hierarchical tree representing the SD approaches in both biomedical data analysis and data mining cultures. The references under the boxes correspond to representative algorithms of each kind.

**FIGURE 4**
Q-Finder works in 4 main stages: an exhaustive generation of candidate subgroups, a ranking of candidate subgroups via an evaluation of their empirical credibility, a selection of the best candidates (taking into account the redundancy between subgroups), and then an assessment of subgroups’ credibility on one or more test datasets.
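The selection stage and the reason for limiting the number of tests can be sketched as follows. This is an illustrative assumption rather than the procedure from the paper: redundancy is measured here with a Jaccard overlap between the sets of covered patients, the cutoff `max_overlap` is hypothetical, and a Bonferroni correction stands in for whichever multiple-testing adjustment is actually applied on the test data.

```python
def jaccard(a, b):
    """Overlap between two sets of patient identifiers (0 = disjoint, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_top_k(ranked, k, max_overlap=0.5):
    """Greedily keep the best-ranked subgroups, skipping near-duplicates.

    `ranked` is a list of (name, covered_patient_ids) pairs, best first.
    """
    selected = []
    for name, members in ranked:
        if all(jaccard(members, kept) <= max_overlap for _, kept in selected):
            selected.append((name, members))
        if len(selected) == k:
            break
    return selected


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold when n_tests hypotheses are tested."""
    return alpha / n_tests


# Hypothetical candidates, already ranked by empirical credibility.
ranked = [
    ("A", {1, 2, 3, 4}),
    ("B", {1, 2, 3, 5}),  # covers almost the same patients as A
    ("C", {6, 7, 8}),
    ("D", {4, 6, 9}),
]

top = select_top_k(ranked, k=2)
print([name for name, _ in top])
print(bonferroni_threshold(0.05, len(top)))
print(bonferroni_threshold(0.05, 1000))
```

Carrying only the selected subgroups to the test dataset keeps the per-test threshold far less strict than it would be if every generated candidate were re-tested, which is how limiting the number of tests preserves power.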
