Validating a strategy for psychosocial phenotyping using a large corpus of clinical text
- PMID: 24169276
- PMCID: PMC3861921
- DOI: 10.1136/amiajnl-2013-001946
Abstract
Objective: To develop algorithms to improve the efficiency of patient phenotyping using natural language processing (NLP) on text data. Of the many note titles available in our database, we sought to identify those with the highest yield and precision for psychosocial concepts.
Materials and methods: From a database of over 1 billion documents from US Department of Veterans Affairs medical facilities, a random sample of 1500 documents was chosen from each of 218 enterprise note titles. Psychosocial concepts were extracted using a UIMA-AS-based NLP pipeline (v3NLP) with a lexicon of relevant concepts and with negation and template format annotators. Human reviewers evaluated a subset of documents for false positives and sensitivity. High-yield documents were identified by hit rate and precision. Reasons for false positivity were characterized.
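To make the lexicon-plus-negation step concrete, here is a minimal sketch in plain Python. It is not v3NLP (a UIMA-AS pipeline whose annotators are not reproduced here). It assumes a NegEx-style rule: a concept is marked negated if a trigger word appears within a few tokens before it in the same sentence. The lexicon, trigger list, and window size are hypothetical, and template-format handling is not modeled.

```python
import re
from dataclasses import dataclass

# Hypothetical mini-lexicon: concept category -> surface terms (lowercase).
LEXICON = {
    "housing": ["homeless", "shelter"],
    "social_isolation": ["lives alone"],
    "employment": ["unemployed"],
}
# Hypothetical NegEx-style pre-negation triggers and look-back window (tokens).
NEGATION_TRIGGERS = {"no", "not", "denies", "without"}
NEGATION_WINDOW = 5


@dataclass
class Mention:
    category: str
    term: str
    negated: bool


def extract_concepts(text: str) -> list[Mention]:
    """Match lexicon terms sentence by sentence and flag negated mentions."""
    mentions = []
    # Naive sentence split so a negation trigger does not leak into the next sentence.
    for sentence in re.split(r"[.!?\n]+", text):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        for category, terms in LEXICON.items():
            for term in terms:
                term_tokens = term.split()
                n = len(term_tokens)
                for i in range(len(tokens) - n + 1):
                    if tokens[i:i + n] == term_tokens:
                        # Look back up to NEGATION_WINDOW tokens for a trigger.
                        window = tokens[max(0, i - NEGATION_WINDOW):i]
                        negated = any(t in NEGATION_TRIGGERS for t in window)
                        mentions.append(Mention(category, term, negated))
    return mentions


for m in extract_concepts("Patient denies being homeless. Lives alone with cat."):
    print(m.category, "negated" if m.negated else "affirmed")
# housing negated
# social_isolation affirmed
```

A rule this simple also shows why templating appears among the false-positive reasons: a checklist line such as "Homeless: yes / no" yields an affirmed "homeless" mention, because the "no" follows the term rather than preceding it.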
Results: A total of 58 707 psychosocial concepts were identified from 316 355 documents for an overall hit rate of 0.2 concepts per document (median 0.1, range 0-1.6). Of 6031 concepts reviewed from a high-yield set of note titles, the overall precision for all concept categories was 80%, with variability among note titles and concept categories. Reasons for false positivity included templating, negation, context, and alternate meanings of words. The sensitivity of the NLP system was 49% (95% CI 43% to 55%).
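The summary metrics above are simple ratios, and the overall hit rate can be checked directly from the reported counts. The sketch below assumes precision = true positives / reviewed concepts and sensitivity = true positives / (true positives + false negatives). The abstract does not say how its 95% CI was computed, so the Wilson score interval here is one common choice, not necessarily the authors' method.

```python
import math


def hit_rate(n_concepts: int, n_documents: int) -> float:
    """Concepts extracted per document."""
    return n_concepts / n_documents


def precision(true_pos: int, false_pos: int) -> float:
    """Share of extracted concepts judged correct by reviewers."""
    return true_pos / (true_pos + false_pos)


def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of reference-standard concepts the system found."""
    return true_pos / (true_pos + false_neg)


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (assumed CI method)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half


# Overall hit rate from the counts reported in the abstract.
print(round(hit_rate(58_707, 316_355), 1))  # 0.2
```

Compared with the simpler Wald interval, the Wilson interval stays inside [0, 1] and behaves better for small samples or proportions near 0 or 1.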
Conclusions: Phenotyping using NLP need not involve the entire document corpus. Our methods offer a generalizable strategy for scaling NLP pipelines to large free-text corpora with complex linguistic annotations when attempting to identify patients with a given phenotype.
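The strategy in the conclusion, running NLP only on high-yield note titles rather than on the whole corpus, amounts to filtering titles on hit rate and precision. A minimal sketch follows. The thresholds, the ordering rule, the `NoteTitleStats` record, and the note titles and counts are all hypothetical placeholders, not values or criteria from the study.

```python
from dataclasses import dataclass


@dataclass
class NoteTitleStats:
    title: str
    n_documents: int      # documents sampled for this note title
    n_concepts: int       # concepts extracted from those documents
    n_reviewed: int       # extracted concepts checked by human reviewers
    n_true_positive: int  # reviewed concepts judged correct

    @property
    def hit_rate(self) -> float:
        return self.n_concepts / self.n_documents

    @property
    def precision(self) -> float:
        return self.n_true_positive / self.n_reviewed if self.n_reviewed else 0.0


def select_high_yield(stats, min_hit_rate=0.5, min_precision=0.8):
    """Keep titles meeting both (hypothetical) thresholds, highest yield first."""
    keep = [s for s in stats
            if s.hit_rate >= min_hit_rate and s.precision >= min_precision]
    return sorted(keep, key=lambda s: (s.hit_rate, s.precision), reverse=True)


# Hypothetical per-title counts for illustration only.
titles = [
    NoteTitleStats("TITLE_A", 1500, 1200, 100, 90),
    NoteTitleStats("TITLE_B", 1500, 150, 100, 95),
    NoteTitleStats("TITLE_C", 1500, 900, 100, 60),
    NoteTitleStats("TITLE_D", 1500, 2400, 100, 85),
]
print([s.title for s in select_high_yield(titles)])  # ['TITLE_D', 'TITLE_A']
```

TITLE_B is dropped for low yield and TITLE_C for low precision; in practice the thresholds would trade corpus size against how many relevant concepts are missed.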
Keywords: clinical informatics; high-throughput; natural language processing; patient phenotype; psychosocial concepts.
Similar articles
- Recognizing Questions and Answers in EMR Templates Using Natural Language Processing. Stud Health Technol Inform. 2014;202:149-52. PMID: 25000038.
- Detecting the presence of an indwelling urinary catheter and urinary symptoms in hospitalized patients using natural language processing. J Biomed Inform. 2017 Jul;71S:S39-S45. doi: 10.1016/j.jbi.2016.07.012. Epub 2016 Jul 9. PMID: 27404849.
- Ensembles of natural language processing systems for portable phenotyping solutions. J Biomed Inform. 2019 Dec;100:103318. doi: 10.1016/j.jbi.2019.103318. Epub 2019 Oct 23. PMID: 31655273. Free PMC article.
- Natural language processing algorithms for mapping clinical text fragments onto ontology concepts: a systematic review and recommendations for future studies. J Biomed Semantics. 2020 Nov 16;11(1):14. doi: 10.1186/s13326-020-00231-z. PMID: 33198814. Free PMC article.
- Natural language processing of symptoms documented in free-text narratives of electronic health records: a systematic review. J Am Med Inform Assoc. 2019 Apr 1;26(4):364-379. doi: 10.1093/jamia/ocy173. PMID: 30726935. Free PMC article.
Cited by
- v3NLP Framework: Tools to Build Applications for Extracting Concepts from Clinical Text. EGEMS (Wash DC). 2016 Aug 11;4(3):1228. doi: 10.13063/2327-9214.1228. eCollection 2016. PMID: 27683667. Free PMC article.
- Bootstrapping semi-supervised annotation method for potential suicidal messages. Internet Interv. 2022 Feb 28;28:100519. doi: 10.1016/j.invent.2022.100519. eCollection 2022 Apr. PMID: 35281704. Free PMC article. Review.
- Using electronic health record metadata to predict housing instability amongst veterans. Prev Med Rep. 2023 Nov 24;37:102505. doi: 10.1016/j.pmedr.2023.102505. eCollection 2024 Jan. PMID: 38261912. Free PMC article.
- 'Big data' in mental health research: current status and emerging possibilities. Soc Psychiatry Psychiatr Epidemiol. 2016 Aug;51(8):1055-72. doi: 10.1007/s00127-016-1266-8. Epub 2016 Jul 27. PMID: 27465245. Free PMC article. Review.
- Recent Advances in Clinical Natural Language Processing in Support of Semantic Analysis. Yearb Med Inform. 2015 Aug 13;10(1):183-93. doi: 10.15265/IY-2015-009. PMID: 26293867. Free PMC article. Review.