J Biomed Inform. 2016 Apr:60:66-76.

doi: 10.1016/j.jbi.2016.01.007. Epub 2016 Jan 25.

Multivariate analysis of the population representativeness of related clinical studies

Zhe He¹, Patrick Ryan², Julia Hoxha³, Shuang Wang⁴, Simona Carini⁵, Ida Sim⁵, Chunhua Weng⁶

Affiliations

¹ Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA. Electronic address: zh2132@columbia.edu.
² Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA; Janssen Research and Development, Titusville, NJ 08560, USA; Observational Health Data Sciences and Informatics, New York, NY 10032, USA.
³ Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA.
⁴ Department of Biostatistics, Columbia University, New York, NY 10032, USA.
⁵ Department of Medicine, University of California, San Francisco, CA 94143, USA.
⁶ Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA; Observational Health Data Sciences and Informatics, New York, NY 10032, USA.

PMID: 26820188
PMCID: PMC4837055
DOI: 10.1016/j.jbi.2016.01.007

Zhe He et al. J Biomed Inform. 2016 Apr.

Abstract

Objective: To develop a multivariate method for quantifying population representativeness across related clinical studies and a computational method for identifying and characterizing underrepresented subgroups in clinical studies.

Methods: We extended a published metric, the Generalizability Index for Study Traits (GIST), to include multiple study traits for quantifying the population representativeness of a set of related studies, assuming independence and equal importance of all study traits. On this basis, we compared the effectiveness of GIST and multivariate GIST (mGIST) qualitatively. We further developed an algorithm called "Multivariate Underrepresented Subgroup Identification" (MAGIC) for constructing optimal combinations of distinct value intervals of multiple traits to define underrepresented subgroups in a set of related studies. Using Type 2 diabetes mellitus (T2DM) as an example, we identified and extracted frequently used quantitative eligibility criteria variables in a set of clinical studies. We profiled the T2DM target population using National Health and Nutrition Examination Survey (NHANES) data.

Results: According to the mGIST scores for four example variables (age, HbA1c, BMI, and gender), the included observational T2DM studies had better population representativeness than the interventional T2DM studies. Among the interventional T2DM studies, Phase I trials had better population representativeness than Phase III trials. People at least 65 years old with HbA1c values between 5.7% and 7.2% were particularly underrepresented in the included T2DM trials. These results confirmed existing knowledge and demonstrated the effectiveness of our methods in population representativeness assessment.

Conclusions: mGIST is effective at quantifying the population representativeness of related clinical studies using multiple numeric study traits. MAGIC identifies underrepresented subgroups in clinical studies. Both data-driven methods can be used to improve the transparency of design bias in participant selection at the research community level.

Keywords: Clinical trial; Knowledge representation; Selection bias.
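The idea behind a multivariate representativeness score can be illustrated with a small sketch. This is not the published mGIST formula, which the abstract does not state. It is a simplified, hypothetical stand-in that assumes each study is described by inclusive numeric ranges per trait and each target-population record carries a survey sample weight. The names `Study`, `Patient`, `eligible_fraction`, and `weighted_representativeness` are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Study:
    # Hypothetical representation: trait name -> (low, high), inclusive bounds.
    criteria: dict


@dataclass
class Patient:
    traits: dict   # trait name -> numeric value
    weight: float  # survey sample weight (assumed positive)


def is_eligible(patient, study):
    """True if the patient satisfies every range criterion of the study."""
    return all(
        low <= patient.traits[name] <= high
        for name, (low, high) in study.criteria.items()
    )


def eligible_fraction(patient, studies):
    """Fraction of studies for which the patient satisfies all criteria."""
    return sum(is_eligible(patient, s) for s in studies) / len(studies)


def weighted_representativeness(patients, studies):
    """Sample-weighted mean of the per-patient eligible fraction (0 to 1)."""
    total_weight = sum(p.weight for p in patients)
    weighted_sum = sum(p.weight * eligible_fraction(p, studies) for p in patients)
    return weighted_sum / total_weight


# Toy data, invented for illustration only.
studies = [
    Study({"age": (18, 65), "hba1c": (7.0, 10.0)}),
    Study({"age": (18, 80)}),
]
patients = [
    Patient({"age": 70, "hba1c": 6.5}, weight=3.0),
    Patient({"age": 50, "hba1c": 8.0}, weight=1.0),
]
score = weighted_representativeness(patients, studies)
```

A score near 1 would mean that most of the weighted target population is eligible for most studies under all traits jointly. The sketch assumes a non-empty study list and that every patient record has a value for every trait a study constrains.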

Conflict of interest statement

COMPETING INTERESTS

None.

Figures

**Figure 1**
The workflow for multivariate analysis of population representativeness of related clinical studies.

**Figure 2**
Pipeline of Multivariate Underrepresented Subgroup Identification (MAGIC).

**Figure 3**
(a) Visualization of the distribution of real-world T2DM patients and their eligibility for 3,158 T2DM studies with respect to age, HbA1c, BMI, and gender jointly. The x-axis represents HbA1c value intervals, the y-axis BMI value intervals, and the z-axis age value intervals. Each dot represents patients with the same set of characteristics. The size of each dot is proportional to the number of real-world patients (normalized sample weight “WTMEC10YR”) that it represents. The color of a dot represents the percentage of studies for which each sample satisfies all the variables, scaled such that red indicates the highest observed proportion of studies and blue the lowest. Regions in blue highlight target populations that are systematically underrepresented across all the studies. The six transparent boxes represent the top six underrepresented female subgroups identified by the MAGIC algorithm; (b) A different orientation of the figure showing age and BMI; (c) A different orientation of the figure showing age and HbA1c; (d) A different orientation of the figure showing BMI and HbA1c. We provide the MATLAB figure file as supplementary material. One can open the file in MATLAB, change the orientation of the figure, and view it from different angles.

**Figure 4**
Percentage of T2DM patients who satisfy four criteria of interventional T2DM studies.
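The quantity plotted in Figure 3, the weighted share of studies for which patients in each combination of value intervals are eligible, can be approximated with a short sketch. This is not the MAGIC algorithm, whose construction of optimal interval combinations is not described here. It is a simpler, hypothetical per-cell ranking, and the function names, bin edges, and data layout are assumptions made for illustration.

```python
from collections import defaultdict


def bin_label(value, edges):
    """Return the half-open interval [low, high) containing value, or None."""
    for low, high in zip(edges, edges[1:]):
        if low <= value < high:
            return (low, high)
    return None


def rank_cells(patients, studies, bin_edges):
    """Rank interval cells from least to most represented.

    patients:  list of (traits dict, sample weight) pairs
    studies:   non-empty list of {trait: (low, high)} inclusive ranges
    bin_edges: {trait: sorted list of interval edges}
    Returns a list of (weighted eligible fraction, total weight, cell).
    """
    trait_order = sorted(bin_edges)
    weight_sum = defaultdict(float)
    eligible_sum = defaultdict(float)
    for traits, weight in patients:
        if weight <= 0:
            continue  # skip records that carry no population weight
        cell = tuple(bin_label(traits[t], bin_edges[t]) for t in trait_order)
        if None in cell:
            continue  # value falls outside the binned range
        n_eligible = sum(
            all(low <= traits[t] <= high for t, (low, high) in study.items())
            for study in studies
        )
        weight_sum[cell] += weight
        eligible_sum[cell] += weight * n_eligible / len(studies)
    ranked = [(eligible_sum[c] / weight_sum[c], weight_sum[c], c) for c in weight_sum]
    # Lowest eligibility first; among ties, the larger population first.
    ranked.sort(key=lambda row: (row[0], -row[1]))
    return ranked


# Toy data, invented for illustration only.
bin_edges = {"age": [18, 65, 120], "hba1c": [5.7, 7.2, 14.0]}
studies = [{"age": (18, 64), "hba1c": (7.2, 10.0)}, {"age": (18, 75)}]
patients = [
    ({"age": 70, "hba1c": 6.0}, 2.0),
    ({"age": 50, "hba1c": 8.0}, 1.0),
    ({"age": 72, "hba1c": 6.5}, 1.0),
]
ranked = rank_cells(patients, studies, bin_edges)
```

Cells at the top of `ranked` combine a low eligible fraction with a large weighted population, which loosely corresponds to the large blue regions in the figure.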

Cited by

Assessing the use of prescription drugs and dietary supplements in obese respondents in the National Health and Nutrition Examination Survey.
Barrett LA, Xing A, Sheffler J, Steidley E, Adam TJ, Zhang R, He Z. PLoS One. 2022 Jun 3;17(6):e0269241. doi: 10.1371/journal.pone.0269241. eCollection 2022. PMID: 35657782 Free PMC article.
Clinical Trial Generalizability Assessment in the Big Data Era: A Review.
He Z, Tang X, Yang X, Guo Y, George TJ, Charness N, Quan Hem KB, Hogan W, Bian J. Clin Transl Sci. 2020 Jul;13(4):675-684. doi: 10.1111/cts.12764. Epub 2020 Apr 10. PMID: 32058639 Free PMC article.
Assessing the Validity of a a priori Patient-Trial Generalizability Score using Real-world Data from a Large Clinical Data Research Network: A Colorectal Cancer Clinical Trial Case Study.
Li Q, He Z, Guo Y, Zhang H, George TJ, Hogan W, Charness N, Bian J. AMIA Annu Symp Proc. 2020 Mar 4;2019:1101-1110. eCollection 2019. PMID: 32308907 Free PMC article.
Novel stepwise approach to assess representativeness of a large multicenter observational cohort of tuberculosis patients: The example of RePORT Brazil.
Arriaga MB, Amorim G, Queiroz ATL, Rodrigues MMS, Araújo-Pereira M, Nogueira BMF, Souza AB, Rocha MS, Benjamin A, Moreira ASR, de Oliveira JG, Figueiredo MC, Turner MM, Alves K, Durovni B, Lapa-E-Silva JR, Kritski AL, Cavalcante S, Rolla VC, Cordeiro-Santos M, Sterling TR, Andrade BB; RePORT Brazil consortium. Int J Infect Dis. 2021 Feb;103:110-118. doi: 10.1016/j.ijid.2020.11.140. Epub 2020 Nov 14. PMID: 33197582 Free PMC article.
Computational strategic recruitment for representation and coverage studied in the All of Us Research Program.
Borza VA, Chen Q, Clayton EW, Kantarcioglu M, Sulieman L, Vorobeychik Y, Malin BA. NPJ Digit Med. 2025 Jul 3;8(1):402. doi: 10.1038/s41746-025-01804-x. PMID: 40610586 Free PMC article.
