Understanding Views Around the Creation of a Consented, Donated Databank of Clinical Free Text to Develop and Train Natural Language Processing Models for Research: Focus Group Interviews With Stakeholders

Natalie K Fitzpatrick¹, Richard Dobson², Angus Roberts², Kerina Jones³, Anoop D Shah¹,⁴, Goran Nenadic⁵, Elizabeth Ford⁶

Affiliations

¹ Institute of Health Informatics, University College London, London, United Kingdom.
² Department of Biostatistics and Health Informatics, King's College London, London, United Kingdom.
³ Department of Population Data Science, Swansea University Medical School, Swansea, United Kingdom.
⁴ University College London Hospitals NHS Foundation Trust, London, United Kingdom.
⁵ Department of Computer Science, University of Manchester, Manchester, United Kingdom.
⁶ Brighton and Sussex Medical School, Brighton, United Kingdom.

PMID: 37133927
PMCID: PMC10193205
DOI: 10.2196/45534

Natalie K Fitzpatrick et al. JMIR Med Inform. 2023 May 3;11:e45534. doi: 10.2196/45534.

Abstract

Background: Information stored within electronic health records is often recorded as unstructured text. Special computerized natural language processing (NLP) tools are needed to process this text; however, complex governance arrangements make such data in the National Health Service hard to access, and therefore, it is difficult to use for research in improving NLP methods. The creation of a donated databank of clinical free text could provide an important opportunity for researchers to develop NLP methods and tools and may circumvent delays in accessing the data needed to train the models. However, to date, there has been little or no engagement with stakeholders on the acceptability and design considerations of establishing a free-text databank for this purpose.

Objective: This study aimed to ascertain stakeholder views around the creation of a consented, donated databank of clinical free text to help create, train, and evaluate NLP for clinical research and to inform the potential next steps for adopting a partner-led approach to establish a national, funded databank of free text for use by the research community.

Methods: Web-based in-depth focus group interviews were conducted with 4 stakeholder groups (patients and members of the public, clinicians, information governance leads and research ethics members, and NLP researchers).

Results: All stakeholder groups were strongly in favor of the databank and saw great value in creating an environment where NLP tools can be tested and trained to improve their accuracy. Participants highlighted a range of complex issues for consideration as the databank is developed, including communicating the intended purpose, the approach to access and safeguarding the data, who should have access, and how to fund the databank. Participants recommended that a small-scale, gradual approach be adopted to start to gather donations and encouraged further engagement with stakeholders to develop a road map and set of standards for the databank.

Conclusions: These findings provide a clear mandate to begin developing the databank and a framework for stakeholder expectations, which we would aim to meet with the databank delivery.

Keywords: consent; databank; electronic health records; free text; governance; natural language processing; public involvement; unstructured text.

©Natalie K Fitzpatrick, Richard Dobson, Angus Roberts, Kerina Jones, Anoop D Shah, Goran Nenadic, Elizabeth Ford. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 03.05.2023.

Conflict of interest statement

Conflicts of Interest: None declared.
