BMC Med Inform Decis Mak. 2018 Jun 25;18(1):47.

doi: 10.1186/s12911-018-0623-9.

CogStack - experiences of deploying integrated information retrieval and extraction services in a large National Health Service Foundation Trust hospital

Richard Jackson¹,², Ismail Kartoglu³, Clive Stringer⁴, Genevieve Gorrell⁵, Angus Roberts⁵, Xingyi Song⁵, Honghan Wu⁶,⁷, Asha Agrawal⁴, Kenneth Lui⁸, Tudor Groza⁹, Damian Lewsley⁴, Doug Northwood⁴, Amos Folarin⁶,⁸, Robert Stewart⁶,¹⁰, Richard Dobson⁶,⁸

Affiliations

¹ Institute of Psychiatry, Psychology and Neuroscience, King's College London, 16 De Crespigne Park, London, SE5 8AF, UK. richgjackson@gmail.com.
² South London and Maudsley NHS Foundation Trust, Denmark Hill, London, SE5 8AZ, UK. richgjackson@gmail.com.
³ InterDigital Communications, 64 Great Eastern Street, 1st Floor, London, EC2A 3QR, UK.
⁴ King's College Hospital, Denmark Hill, London, SE5 9RS, UK.
⁵ University of Sheffield, Western Bank, Sheffield, S10 2TN, UK.
⁶ Institute of Psychiatry, Psychology and Neuroscience, King's College London, 16 De Crespigne Park, London, SE5 8AF, UK.
⁷ Usher Institute of Population Health Sciences and Informatics, University of Edinburgh, Edinburgh, EH16 4UX, UK.
⁸ Farr Institute of Health Informatics Research, UCL Institute of Health Informatics, University College London, London, WC1E 6BT, UK.
⁹ Garvan Institute of Medical Research, Sydney, NSW 2010, Australia.
¹⁰ South London and Maudsley NHS Foundation Trust, Denmark Hill, London, SE5 8AZ, UK.

PMID: 29941004
PMCID: PMC6020175
DOI: 10.1186/s12911-018-0623-9

Richard Jackson et al. BMC Med Inform Decis Mak. 2018.

Abstract

Background: Traditional health information systems are generally devised to support clinical data collection at the point of care. However, as the significance of the modern information economy expands in scope and permeates the healthcare domain, there is an increasing urgency for healthcare organisations to offer information systems that address the expectations of clinicians, researchers and the business intelligence community alike. Amongst other emergent requirements, the principal unmet need might be defined as the 3R principle (right data, right place, right time) to address deficiencies in organisational data flow while retaining the strict information governance policies that apply within the UK National Health Service (NHS). Here, we describe our work on creating and deploying a low-cost structured and unstructured information retrieval and extraction architecture within King's College Hospital, the management of governance concerns and the associated use cases and cost-saving opportunities that such components present.

Results: To date, our CogStack architecture has processed over 300 million lines of clinical data, making it available for internal service improvement projects at King's College London. On generated data designed to simulate real-world clinical text, our de-identification algorithm achieved up to 94% precision and up to 96% recall.
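For reference, precision is TP / (TP + FP) and recall is TP / (TP + FN), where TP, FP and FN count correctly masked identifiers, wrongly masked non-identifiers and missed identifiers respectively. The abstract does not state how matches were counted, so the sketch below assumes exact-match comparison of character-offset spans; the function name and the toy spans are hypothetical, and partial-overlap or token-level scoring would give different figures.

```python
def span_precision_recall(gold, predicted):
    """Exact-match, span-level precision and recall (an assumed scoring scheme).

    gold, predicted: sets of (start, end) character offsets of identifier spans.
    """
    tp = len(gold & predicted)   # identifier spans found and correct
    fp = len(predicted - gold)   # spans masked that were not identifiers
    fn = len(gold - predicted)   # identifier spans that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy spans invented for illustration; not the paper's evaluation data.
    gold = {(0, 4), (5, 8), (10, 15), (20, 27)}
    predicted = {(0, 4), (5, 8), (10, 15)}
    print(span_precision_recall(gold, predicted))  # (1.0, 0.75)
```

Here every masked span is correct (precision 1.0) but one of four identifiers is missed (recall 0.75), which shows why the two figures are reported separately.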

Conclusion: We describe a toolkit which we feel is of huge value to the UK (and beyond) healthcare community. It is the only open-source, easily deployable solution designed for the UK healthcare environment, in a landscape populated by expensive proprietary systems. Solutions such as these provide a crucial foundation for the genomic revolution in medicine.

Keywords: Clinical informatics; Elasticsearch; Electronic health records; Information extraction; Natural language processing.

Conflict of interest statement

Ethics approval and consent to participate

The creation of the CogStack software was an internal service development project for King’s College Hospital NHS Foundation Trust, and thus did not require ethical approval. As no patient identifiable data was required for the development of the software, no approval was sought from the Health Research Authority according to Confidentiality Advisory Group guidelines (http://www.hra.nhs.uk/resources/confidentiality-advisory-group/determining-need-cag-application/). The validation of the Bio-YODIE software made use of the CRIS dataset, which is approved as an anonymised data resource for secondary analysis by Oxfordshire Research Ethics Committee C (08/H0606/71) and governance is provided for all projects and dissemination through a patient-led oversight committee.

Consent for publication

Not applicable: no individual person's data is presented in this manuscript.

Competing interests

RJ and RS have received research funding from Roche, Pfizer, J&J and Lundbeck.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

**Fig. 1**
CogStack architecture and dataflow. All components can be deployed via the Docker containerisation software. **1. New job execution** The master instance of CogStack periodically checks Trust data sources for new data. **2. Partitioning** The job is partitioned into a user-definable number of work units. **3a. Derive the free-text content** Extract plain and/or formatted text from common proprietary binary document formats (performing OCR where necessary) using the Apache Tika library, enabling downstream processing of high-value unstructured data elements. **3b. Supplement the text content with metadata** Filter and de-normalise a subset of the structured clinical data to provide a patient-orientated, transparent representation of high-value metadata concepts. For example, this might include calculated fields representing patient age at document date, the first part of the postcode, ethnicity and lab results. **3c. De-identification** Transform the resulting text documents into de-identified documents by masking personal health identifiers with the Cognition de-identification algorithms. This is necessary to address governance concerns associated with the secondary use of patient data. Identifiers in structured data can be excluded via SQL query, according to business requirements. **4. Information extraction** Apply generic clinical information extraction (IE) pipelines to derive additional structured data from free text, increasing the amount of structured data available at the point of query. **5. Indexing** Build a JSON object from the resulting structured and unstructured data, which can then readily be indexed into an Elasticsearch cluster. **6. Visualisation** The Kibana suite provides a range of options for viewing, aggregating and building dashboards from the loaded data.

**Fig. 2**
Kibana interface loaded with pseudo-data
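Steps 2, 3b, 3c and 5 of the Fig. 1 dataflow can be sketched as below. This is a minimal illustration under stated assumptions, not CogStack's implementation: the `PatientRecord` fields, every helper name, the index name `cogstack_docs` and the pseudo-data are hypothetical, and the naive exact-match masking is a stand-in, not the Cognition de-identification algorithm. Only the request-body layout (one action line followed by one source line per document, newline-delimited and newline-terminated) follows Elasticsearch's documented `_bulk` format.

```python
import json
import re
from dataclasses import dataclass
from datetime import date


@dataclass
class PatientRecord:
    # Hypothetical structured fields; real Trust schemas will differ.
    patient_id: str
    first_name: str
    last_name: str
    date_of_birth: date
    postcode: str
    ethnicity: str


def partition(ids, n_units):
    """Step 2: split source row IDs into n_units contiguous work units."""
    size, rem = divmod(len(ids), n_units)
    units, start = [], 0
    for i in range(n_units):
        end = start + size + (1 if i < rem else 0)  # spread the remainder
        units.append(ids[start:end])
        start = end
    return units


def age_at(dob, when):
    """Step 3b: patient age in whole years at the document date."""
    return when.year - dob.year - ((when.month, when.day) < (dob.month, dob.day))


def outward_code(postcode):
    """Step 3b: first part of a UK postcode, e.g. 'SE5 9RS' -> 'SE5'."""
    return postcode.split()[0].upper()


def mask_identifiers(text, rec, mask="***"):
    """Step 3c (naive stand-in, NOT the Cognition algorithm): replace
    whole-word, case-insensitive matches of identifiers from the record."""
    for value in (rec.first_name, rec.last_name, rec.postcode, rec.patient_id):
        if value:  # an empty pattern would match everywhere
            pattern = r"\b" + re.escape(value) + r"\b"
            text = re.sub(pattern, mask, text, flags=re.IGNORECASE)
    return text


def build_document(doc_id, text, doc_date, rec):
    """Step 5: combine de-identified text with derived metadata (JSON-ready)."""
    return {
        "doc_id": doc_id,
        "document_date": doc_date.isoformat(),
        "patient_age_at_doc": age_at(rec.date_of_birth, doc_date),
        "postcode_outward": outward_code(rec.postcode),
        "ethnicity": rec.ethnicity,
        "body": mask_identifiers(text, rec),
    }


def to_bulk_ndjson(index, docs):
    """Step 5: serialise docs as an Elasticsearch _bulk request body."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc["doc_id"]}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # _bulk bodies must end with a newline


if __name__ == "__main__":
    # Invented pseudo-data for illustration only.
    rec = PatientRecord("K1234", "Jane", "Doe", date(1950, 6, 30), "SE5 9RS", "White British")
    doc = build_document("d1", "Jane Doe (K1234) of SE5 9RS reviewed today.", date(2018, 6, 25), rec)
    print(doc["body"])  # *** *** (***) of *** reviewed today.
    print(partition(list(range(10)), 3))  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(to_bulk_ndjson("cogstack_docs", [doc]), end="")
```

Note that the patient identifier itself never reaches the indexed document; only derived, lower-risk fields do. Exact-match masking misses misspellings and abbreviations of names, which is why a dedicated de-identification algorithm is used at step 3c in practice.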
