A metadata framework for computational phenotypes

Matthew Spotnitz¹, Nripendra Acharya¹, James J Cimino², Shawn Murphy³,⁴, Bahram Namjou⁵, Nancy Crimmins⁵, Theresa Walunas⁶, Cong Liu¹, David Crosslin⁷, Barbara Benoit⁸, Elisabeth Rosenthal⁹, Jennifer A Pacheco¹⁰, Anna Ostropolets¹, Harry Reyes Nieva¹, Jason S Patterson¹, Lauren R Richter¹, Tiffany J Callahan¹, Ahmed Elhussein¹, Chao Pang¹, Krzysztof Kiryluk¹¹, Jordan Nestor¹¹, Atlas Khan¹¹, Sumit Mohan¹¹,¹², Evan Minty¹³, Wendy Chung¹⁴, Wei-Qi Wei¹⁵, Karthik Natarajan¹, Chunhua Weng¹

Affiliations

¹ Department of Biomedical Informatics, Vagelos College of Physicians & Surgeons, Columbia University Irving Medical Center, New York, New York, USA.
² Informatics Institute, Heersink School of Medicine, University of Alabama at Birmingham, Birmingham, Alabama, USA.
³ Laboratory of Computer Science, Mass General Brigham, Boston, Massachusetts, USA.
⁴ Department of Neurology, Mass General Brigham, Boston, Massachusetts, USA.
⁵ Department of Pediatrics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA.
⁶ Department of Medicine, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA.
⁷ Division of Biomedical Informatics and Genomics, Tulane University School of Medicine, New Orleans, Louisiana, USA.
⁸ Department of Research Information Science & Computing, Mass General Brigham, Boston, Massachusetts, USA.
⁹ Division of Genetics, University of Washington, Seattle, Washington, USA.
¹⁰ Center for Genetic Medicine, Northwestern University, Chicago, Illinois, USA.
¹¹ Division of Nephrology, Department of Medicine, Vagelos College of Physicians & Surgeons, Columbia University Irving Medical Center, New York, New York, USA.
¹² Department of Epidemiology, Columbia University Mailman School of Public Health, New York, New York, USA.
¹³ Department of Medicine, University of Calgary, Calgary, Alberta, Canada.
¹⁴ Department of Pediatrics, Vagelos College of Physicians & Surgeons, Columbia University Irving Medical Center, New York, New York, USA.
¹⁵ Department of Biomedical Informatics, Vanderbilt University, Nashville, Tennessee, USA.

PMID: 37181728
PMCID: PMC10168627
DOI: 10.1093/jamiaopen/ooad032

Matthew Spotnitz et al. JAMIA Open. 2023 May 9;6(2):ooad032. doi: 10.1093/jamiaopen/ooad032. eCollection 2023 Jul.

Abstract

With the burgeoning development of computational phenotypes, it is increasingly difficult to identify the right phenotype for the right task. This study uses a mixed-methods approach to develop and evaluate a novel metadata framework for retrieving and reusing computational phenotypes. Twenty active phenotyping researchers from 2 large research networks, Electronic Medical Records and Genomics (eMERGE) and Observational Health Data Sciences and Informatics (OHDSI), were recruited to suggest metadata elements. Once consensus was reached on 39 metadata elements, 47 new researchers were surveyed to evaluate the utility of the metadata framework. The survey consisted of 5-point Likert-scale multiple-choice questions and open-ended questions. Two more researchers were asked to use the metadata framework to annotate 8 type-2 diabetes mellitus phenotypes. More than 90% of the survey respondents rated the metadata elements on phenotype definition and on validation methods and metrics positively, with a score of 4 or 5. Both researchers completed the annotation of each phenotype within 60 min. Our thematic analysis of the narrative feedback indicates that the metadata framework was effective in capturing rich and explicit descriptions and in enabling the search for phenotypes, compliance with data standards, and comprehensive validation metrics. Its current limitations were the complexity of data collection and the human costs this entails.

Keywords: electronic health records; metadata; phenotype.

Conflict of interest statement

None declared.

Figures

**Figure 1.**
(A) Box plot of background section metadata element ratings. a = What is the phenotype definition?; b = Does the phenotype definition specify what patients will be identified, anyone currently or previously with the phenotype or newly diagnosed with the phenotype?; c = Does the definition specify the clinical setting for phenotype diagnosis (ie, inpatient)?; d = How has the phenotype been adopted?; e = When was the phenotype last updated?; f = Corresponding author contact information; g = Prior publication PubMed ID; h = Has the phenotype been published in a phenotype library?; i = Did the investigators have clinical expertise?; j = Did the investigators have informatics expertise?
(B) Box plot of algorithm section metadata element ratings. a = What was the data source (ie, EHR, claims)?; b = Were the source data structured (ie, CDM)?; c = Were the source data semi-structured (ie, problem list)?; d = Were the source data unstructured (ie, free text)?; e = Were the source data grouped by terminologies (ie, ICD-9/10)?; f = What data domains were used in the phenotype (ie, conditions, procedures)?; g = Was the phenotype rule based?; h = Was the phenotype machine learning based?; i = Was the phenotype natural language processing based?; j = Did the algorithm identify subtypes of the phenotype?
(C) Box plot of performance section metadata element ratings. a = What method was used for validating the phenotype (ie, chart review)?; b = What was the validation population?; c = What was the phenotype prevalence?; d = What were the validation guidelines?; e = What was the definition of the validation phenotype?; f = sensitivity; g = specificity; h = negative predictive value (NPV); i = positive predictive value (PPV); j = Did most patients fulfill the phenotype criteria at similar points in their disease course?; k = Did most patients who fulfilled the phenotype criteria have similar disease presentations?; l = Did patients with new (incident) cases of the disease fulfill the phenotype criteria?; m = Did patients with chronic (prevalent) cases of the disease fulfill the phenotype criteria?; n = polygenic score (PGS); o = Other.
(D) Box plot of limitations section metadata element ratings. a = Did the phenotype lose a substantial amount of information from the source data?; b = Can the phenotype be generalized to many populations other than the source and/or validation populations?; c = How do you envision using the phenotype (ie, clinical trial recruitment, clinical or public health study, translational or genetic study, clinical decision support)?; d = Other.
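
The four sections in the figure (background, algorithm, performance, limitations) suggest how a single phenotype annotation might be stored. The following is a minimal sketch only: the class and field names are hypothetical, it covers a handful of the 39 elements, and it is not the authors' schema. It also shows how the four validation metrics named in panel C follow from the standard confusion-matrix definitions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ValidationCounts:
    """Confusion-matrix counts from a validation study (e.g., chart review)."""
    true_pos: int
    false_pos: int
    true_neg: int
    false_neg: int

    @staticmethod
    def _ratio(numerator: int, denominator: int) -> Optional[float]:
        # Return None instead of dividing by zero when a metric is undefined.
        return numerator / denominator if denominator else None

    def metrics(self) -> Dict[str, Optional[float]]:
        """Standard definitions of the four metrics listed in panel C."""
        return {
            "sensitivity": self._ratio(self.true_pos, self.true_pos + self.false_neg),
            "specificity": self._ratio(self.true_neg, self.true_neg + self.false_pos),
            "ppv": self._ratio(self.true_pos, self.true_pos + self.false_pos),
            "npv": self._ratio(self.true_neg, self.true_neg + self.false_neg),
        }


@dataclass
class PhenotypeMetadata:
    """Hypothetical record with a few elements from each framework section."""
    # Background section
    definition: str
    last_updated: str
    # Algorithm section
    data_source: str           # e.g., "EHR" or "claims"
    rule_based: bool
    # Performance section
    validation_method: str     # e.g., "chart review"
    counts: Optional[ValidationCounts] = None
    # Limitations section
    intended_uses: List[str] = field(default_factory=list)


# Invented counts, for illustration only; not data from the study.
record = PhenotypeMetadata(
    definition="Type-2 diabetes mellitus (illustrative)",
    last_updated="2023-01-01",
    data_source="EHR",
    rule_based=True,
    validation_method="chart review",
    counts=ValidationCounts(true_pos=90, false_pos=10, true_neg=880, false_neg=20),
    intended_uses=["clinical or public health study"],
)
print(record.counts.metrics())
```

Keeping the raw counts rather than only the derived metrics is a design choice in this sketch: it lets any of the four metrics be recomputed, and it makes an undefined metric (a zero denominator) explicit as `None`.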
