JAMIA Open. 2023 May 9;6(2):ooad032.
doi: 10.1093/jamiaopen/ooad032. eCollection 2023 Jul.

A metadata framework for computational phenotypes

Matthew Spotnitz et al. JAMIA Open. 2023.

Abstract

With the burgeoning development of computational phenotypes, it is increasingly difficult to identify the right phenotype for a given task. This study uses a mixed-methods approach to develop and evaluate a novel metadata framework for retrieving and reusing computational phenotypes. Twenty active phenotyping researchers from 2 large research networks, Electronic Medical Records and Genomics (eMERGE) and Observational Health Data Sciences and Informatics (OHDSI), were recruited to suggest metadata elements. Once consensus was reached on 39 metadata elements, 47 new researchers were surveyed to evaluate the utility of the metadata framework. The survey consisted of 5-point Likert-scale multiple-choice questions and open-ended questions. Two more researchers were asked to use the metadata framework to annotate 8 type-2 diabetes mellitus phenotypes. More than 90% of the survey respondents rated the metadata elements on phenotype definition and on validation methods and metrics positively, with a score of 4 or 5. Both researchers completed the annotation of each phenotype within 60 min. Our thematic analysis of the narrative feedback indicates that the metadata framework was effective in capturing rich and explicit descriptions and in enabling the search for phenotypes, compliance with data standards, and comprehensive validation metrics. Its current limitations are the complexity of data collection and the human costs this entails.

Keywords: electronic health records; metadata; phenotype.

Conflict of interest statement

None declared.

Figures

Figure 1.
(A) Box plot of background section metadata element ratings. a = What is the phenotype definition?; b = Does the phenotype definition specify which patients will be identified: anyone currently or previously with the phenotype, or only those newly diagnosed with the phenotype?; c = Does the definition specify the clinical setting for phenotype diagnosis (eg, inpatient)?; d = How has the phenotype been adopted?; e = When was the phenotype last updated?; f = Corresponding author contact information; g = Prior publication PubMed ID; h = Has the phenotype been published in a phenotype library?; i = Did the investigators have clinical expertise?; j = Did the investigators have informatics expertise?
(B) Box plot of algorithm section metadata element ratings. a = What was the data source (eg, EHR, claims)?; b = Were the source data structured (eg, CDM)?; c = Were the source data semi-structured (eg, problem list)?; d = Were the source data unstructured (eg, free text)?; e = Were the source data grouped by terminologies (eg, ICD-9/10)?; f = What data domains were used in the phenotype (eg, conditions, procedures)?; g = Was the phenotype rule based?; h = Was the phenotype machine learning based?; i = Was the phenotype natural language processing based?; j = Did the algorithm identify subtypes of the phenotype?
(C) Box plot of performance section metadata element ratings. a = What method was used for validating the phenotype (eg, chart review)?; b = What was the validation population?; c = What was the phenotype prevalence?; d = What were the validation guidelines?; e = What was the definition of the validation phenotype?; f = sensitivity; g = specificity; h = negative predictive value (NPV); i = positive predictive value (PPV); j = Did most patients fulfill the phenotype criteria at similar points in their disease course?; k = Did most patients who fulfilled the phenotype criteria have similar disease presentations?; l = Did patients with new (incident) cases of the disease fulfill the phenotype criteria?; m = Did patients with chronic (prevalent) cases of the disease fulfill the phenotype criteria?; n = polygenic score (PGS); o = Other.
(D) Box plot of limitations section metadata element ratings. a = Did the phenotype lose a substantial amount of information from the source data?; b = Can the phenotype be generalized to many populations other than the source and/or validation populations?; c = How do you envision using the phenotype (eg, clinical trial recruitment, clinical or public health study, translational or genetic study, clinical decision support)?; d = Other.
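
The performance section above lists sensitivity, specificity, NPV, and PPV as metadata elements. As a minimal sketch, these four values can be derived from chart-review counts using their standard definitions. The names ValidationCounts and validation_metrics and the example counts are hypothetical and chosen for illustration only; they are not part of the published framework or taken from the study.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ValidationCounts:
    """Chart-review tallies for one phenotype (hypothetical container)."""
    true_pos: int   # algorithm positive, chart review positive
    false_pos: int  # algorithm positive, chart review negative
    true_neg: int   # algorithm negative, chart review negative
    false_neg: int  # algorithm negative, chart review positive


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def validation_metrics(c: ValidationCounts) -> Dict[str, float]:
    """Compute the four validation metrics from their standard definitions."""
    return {
        "sensitivity": _ratio(c.true_pos, c.true_pos + c.false_neg),
        "specificity": _ratio(c.true_neg, c.true_neg + c.false_pos),
        "ppv": _ratio(c.true_pos, c.true_pos + c.false_pos),
        "npv": _ratio(c.true_neg, c.true_neg + c.false_neg),
    }


# Illustrative counts only; not data from the study.
example = ValidationCounts(true_pos=80, false_pos=20, true_neg=90, false_neg=10)
metrics = validation_metrics(example)
```

Recording the underlying counts alongside the derived values would let a reader recompute any metric, although the source does not say whether the framework stores counts.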
