BMJ Health Care Inform. 2022 Jun;29(1):e100535.
doi: 10.1136/bmjhci-2021-100535.

Establishing a colorectal cancer research database from routinely collected health data: the process and potential from a pilot study

Andres Tamm et al. BMJ Health Care Inform. 2022 Jun.

Abstract

Objective: Colorectal cancer is a common cause of death and morbidity. A significant amount of data are routinely collected during patient treatment, but they are not generally available for research. The National Institute for Health Research Health Informatics Collaborative in the UK is developing infrastructure to enable routinely collected data to be used for collaborative, cross-centre research. This paper presents an overview of the process for collating colorectal cancer data and explores the potential of using this data source.

Methods: Clinical data were collected from three pilot Trusts, standardised and collated. Not all data were collected in a readily extractable format for research. Natural language processing (NLP) was used to extract relevant information from pseudonymised imaging and histopathology reports. Combining data from many sources allowed reconstruction of longitudinal histories for each patient that could be presented graphically.
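The abstract does not say which NLP approach was used on the imaging and histopathology reports. As a minimal illustration only, assuming reports state staging in TNM notation (for example "ypT3 N1 M0"), a rule-based extractor for the T category might look like this. The pattern, `STAGE_ORDER` and `extract_t_stage` are hypothetical names, not part of the study's pipeline.

```python
import re
from typing import Optional

# Hypothetical pattern, not the authors' NLP: an optional lower-case prefix
# (y = after neoadjuvant therapy; c, p or r = clinical, pathological, recurrence)
# followed by "T" and a stage such as is, 0-4, 4a, 4b or X.
T_STAGE_PATTERN = re.compile(r"\by?[cpr]?T(is|[0-4][ab]?|X)\b")

# Ordering used to pick the most advanced stage when several are mentioned.
STAGE_ORDER = ["X", "0", "is", "1", "2", "3", "4", "4a", "4b"]


def extract_t_stage(report_text: str) -> Optional[str]:
    """Return the most advanced T stage mentioned in a report, or None.

    Simplification: negations ("no evidence of T4") and historical
    mentions are not handled; every match counts.
    """
    stages = T_STAGE_PATTERN.findall(report_text)  # captured stage suffixes only
    if not stages:
        return None
    return "T" + max(stages, key=STAGE_ORDER.index)
```

A rule set this small would miss negation and context, which is one reason free-text extraction usually needs validation against manually reviewed reports.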

Results: Three pilot Trusts submitted data, covering 12 903 patients with a diagnosis of colorectal cancer since 2012, with NLP implemented for 4150 patients. Timelines showing individual patient longitudinal history can be grouped into common treatment patterns, visually presenting clusters and outliers for analysis. Difficulties and gaps in data sources have been identified and addressed.
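Grouping timelines into common treatment patterns can be illustrated with exact matching of event sequences. The sketch assumes each record is a hypothetical (patient, date, event type) tuple and treats patterns shared by fewer than `min_count` patients as outliers. The study's actual grouping method is not described in the abstract and may differ; `event_sequences` and `group_patterns` are illustrative names.

```python
from collections import Counter
from datetime import date

# Hypothetical event records: (patient_id, event_date, event_type).
events = [
    ("p1", date(2015, 1, 10), "diagnosis"),
    ("p1", date(2015, 1, 20), "scan"),
    ("p1", date(2015, 2, 15), "surgery"),
    ("p1", date(2015, 8, 1), "scan"),
    ("p2", date(2016, 3, 5), "diagnosis"),
    ("p2", date(2016, 3, 12), "scan"),
    ("p2", date(2016, 4, 20), "surgery"),
    ("p2", date(2016, 10, 2), "scan"),
    ("p3", date(2017, 6, 1), "diagnosis"),
    ("p3", date(2017, 6, 9), "scan"),
    ("p3", date(2017, 7, 1), "local excision"),
]


def event_sequences(records):
    """Build each patient's chronologically ordered tuple of event types."""
    by_patient = {}
    for patient_id, _event_date, event_type in sorted(records, key=lambda r: (r[0], r[1])):
        by_patient.setdefault(patient_id, []).append(event_type)
    return {pid: tuple(seq) for pid, seq in by_patient.items()}


def group_patterns(sequences, min_count=2):
    """Split sequences into common patterns (>= min_count patients) and outlier patients."""
    counts = Counter(sequences.values())
    common = {pattern: n for pattern, n in counts.items() if n >= min_count}
    outliers = [pid for pid, seq in sequences.items() if counts[seq] < min_count]
    return common, outliers


common, outliers = group_patterns(event_sequences(events))
print(common)    # the 'diagnosis, scan, surgery, scan' pattern, shared by 2 patients
print(outliers)  # patients whose pathway matches no common pattern
```

Exact matching is the simplest choice; real pathways would likely need a looser comparison (for example, tolerating an extra surveillance scan) before similar patients fall into the same group.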

Discussion: Algorithms for analysing routinely collected data from a wide range of sites and sources have been developed and refined to provide a rich data set that will be used to better understand the natural history, treatment variation and optimal management of colorectal cancer.

Conclusion: The data set has great potential to facilitate research into colorectal cancer.

Keywords: Database Management Systems; Electronic Health Records; Health Information Systems; Hospital Records; Informatics.


Conflict of interest statement

Competing interests: None declared.

Figures

Figure 1
Hypothetical patient timelines that show specific treatment and surveillance patterns. Group A: Timelines of patients with colon cancer that follow the pattern ‘diagnosis, scan, surgery, scan’. Group B: Patients with rectal cancer with ‘diagnosis, scan, chemoradiotherapy, radical resection, chemo(radio)therapy, scan’. Group C: Patients with colorectal cancer with ‘diagnosis, treatment, scan, recurrence, treatment, death’. Group D: Patients with rectal cancer with local excision. Timelines for 10 patients were created to illustrate each group. TNM, tumour, node, metastases.
Figure 2
Presurgery and postsurgery T staging for patients with colon cancer (C18) who had a major resection, determined by natural language processing (NLP) of imaging reports (presurgery) and histopathology reports (postsurgery). Number of patients is given in brackets.
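A comparison like the one in Figure 2 amounts to cross-tabulating the T stage extracted before and after surgery for each patient. This minimal sketch uses invented staging values and hypothetical helpers `stage_transitions` and `agreement_rate`; patients missing either stage are skipped.

```python
from collections import Counter

# Hypothetical per-patient NLP output: patient_id -> (presurgery T, postsurgery T).
staging = {
    "p1": ("T3", "T3"),
    "p2": ("T3", "T2"),
    "p3": ("T2", "T3"),
    "p4": ("T4", "T4"),
    "p5": ("T3", "T3"),
    "p6": (None, "T1"),  # no T stage extracted from imaging
}


def stage_transitions(pairs):
    """Count (presurgery, postsurgery) T-stage pairs, skipping incomplete ones."""
    return Counter((pre, post) for pre, post in pairs.values() if pre and post)


def agreement_rate(transitions):
    """Fraction of patients whose imaging and histopathology T stages agree."""
    total = sum(transitions.values())
    agree = sum(n for (pre, post), n in transitions.items() if pre == post)
    return agree / total if total else 0.0


transitions = stage_transitions(staging)
for (pre, post), n in sorted(transitions.items()):
    print(f"{pre} -> {post} ({n})")  # number of patients in brackets, as in Figure 2
print(f"agreement: {agreement_rate(transitions):.2f}")
```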
Figure 3
Longitudinal pathway plot of a hypothetical patient with rectal cancer treated with neoadjuvant therapy then radical resection. After a colonoscopy, and around the time of diagnosis, the patient had neoadjuvant radiotherapy and chemotherapy, shown by the green and blue circles. They then proceeded to surgery, after which TNM staging was available (small pink circles). The next time point for this patient (light grey line) shows a scan done as part of the follow-up regimen, with several further scans thereafter. Nearly 300 days after diagnosis, a scan and colonoscopy led to the diagnosis of recurrence and further radiotherapy and chemotherapy. The final ‘X’ signifies death, although it does not indicate whether the death was related to the cancer. TNM, tumour, node, metastases.
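Pathway plots such as this one place events on an axis measured in days from diagnosis. Assuming each event is a hypothetical (date, event type) pair and that the earliest diagnosis event is the anchor, the offsets could be computed as follows. The dates and the `days_since_diagnosis` helper are invented for illustration, and plotting is left out.

```python
from datetime import date

# Hypothetical events for one patient: (event_date, event_type).
patient_events = [
    (date(2018, 3, 1), "colonoscopy"),
    (date(2018, 3, 8), "diagnosis"),
    (date(2018, 4, 2), "radiotherapy"),
    (date(2018, 4, 2), "chemotherapy"),
    (date(2018, 7, 16), "surgery"),
    (date(2018, 10, 1), "scan"),
    (date(2018, 12, 20), "recurrence"),
    (date(2019, 5, 30), "death"),
]


def days_since_diagnosis(events, diagnosis_label="diagnosis"):
    """Return sorted (offset_in_days, event_type) pairs relative to the first diagnosis."""
    diagnosis_dates = [d for d, kind in events if kind == diagnosis_label]
    if not diagnosis_dates:
        raise ValueError("patient has no diagnosis event")
    anchor = min(diagnosis_dates)
    # Events before diagnosis (e.g. the colonoscopy) get negative offsets.
    return sorted(((d - anchor).days, kind) for d, kind in events)


for offset, kind in days_since_diagnosis(patient_events):
    print(f"day {offset:>4}: {kind}")
```

Anchoring on diagnosis rather than calendar date is what lets timelines from different patients be stacked and compared, as in the grouped timelines of Figure 1.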
Figure 4
Longitudinal pathway plot of a hypothetical patient with rectal cancer who underwent local excision. Rectal cancer was detected on colonoscopy, as indicated by the dark grey line, and treated by local excision, as indicated by the orange circle. After a disease-free surveillance period of approximately 18 months, the patient had a recurrence, shown by the first red arrow. This was followed by radiotherapy and chemotherapy before death. TNM, tumour, node, metastases.
