Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2017 May 25;7(1):2427.
doi: 10.1038/s41598-017-02606-2.

Classification of Paediatric Inflammatory Bowel Disease using Machine Learning

Affiliations

Classification of Paediatric Inflammatory Bowel Disease using Machine Learning

E Mossotto et al. Sci Rep. .

Abstract

Paediatric inflammatory bowel disease (PIBD), comprising Crohn's disease (CD), ulcerative colitis (UC) and inflammatory bowel disease unclassified (IBDU) is a complex and multifactorial condition with increasing incidence. An accurate diagnosis of PIBD is necessary for a prompt and effective treatment. This study utilises machine learning (ML) to classify disease using endoscopic and histological data for 287 children diagnosed with PIBD. Data were used to develop, train, test and validate a ML model to classify disease subtype. Unsupervised models revealed overlap of CD/UC with broad clustering but no clear subtype delineation, whereas hierarchical clustering identified four novel subgroups characterised by differing colonic involvement. Three supervised ML models were developed utilising endoscopic data only, histological only and combined endoscopic/histological data yielding classification accuracy of 71.0%, 76.9% and 82.7% respectively. The optimal combined model was tested on a statistically independent cohort of 48 PIBD patients from the same clinic, accurately classifying 83.3% of patients. This study employs mathematical modelling of endoscopic and histological data to aid diagnostic accuracy. While unsupervised modelling categorises patients into four subgroups, supervised approaches confirm the need of both endoscopic and histological evidence for an accurate diagnosis. Overall, this paper provides a blueprint for ML use with clinical data.

PubMed Disclaimer

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Figure 1
Figure 1
Model and data processing. Schematic representation of the model construction (blue section), validation (green section) and IBDU reclassification (red section) phases. Solid arrows represent data stream while dashed arrows represent parameters or metrics stream. The discovery set was used to identify the optimal penalty parameter (C) and number of features using the recursive feature elimination with cross validation algorithm (RFE-CV). These two elements were then passed to the training and testing set which was then modelled using a support vector machine (SVM). Three metrics were collected: area under the ROC curve (AUC); accuracy over the 5 folds and; a permutation-generated p-value.
Figure 2
Figure 2
Dimensionality reduction approaches and hierarchical clustering of PIBD data. (A,B) Principal component analysis (A) and multidimensional scaling (B) of clinical data from 239 PIBD patients. The first three PCA components account for 52.2% of the total variance. Important note – UC/CD/IBDU diagnoses were used only to retrospectively colour data points and were not included in actual modelling. (C) Heatmap of endoscopic and histological tissue abnormalities in PIBD patients. Abnormal manifestations are shown in orange, normal in light blue and missing data in white. Asterisks indicate histology features. Ascending colon, transverse colon and descending colon labels were shortened to A-Colon, T-Colon and D-Colon respectively. Left hand side bar shows the referred diagnosis: CD in red, UC in blue, IBDU in yellow. Again, UC/CD/IBDU diagnoses were not used to model data but only to retrospectively colour each element. The top bar shows the type of investigation: histology in white, endoscopy in black. Identified colorectal groups are shown by dashed boxes and labelled from one (i) to four (iv). (D) Box and whisker plot depicting C-reactive protein (CRP) levels recorded at diagnosis across the four identified groups. Each box represents data from the first (bottom edge) and the third (top edge) quartile. Red bars and numbers are the median CRP level. Dashed whiskers show the lowest and highest CRP within each group. Black circles are outlier data points.
Figure 3
Figure 3
Supervised classification performance and metrics. (A) Receiver operating characteristic of the combined (light blue), histology (purple) and endoscopy (green) models. The grey dashed line represents the expected performance of a random model. (B) Permutation tests of models: dashed lines represent the observed accuracy of the combined (light blue), histology (purple) and endoscopy (green) models. The endoscopic, histological and combined models have a p-value of p = 3 × 10−3, p = 5 × 10−6 and p = 1 × 10−6 respectively. The grey dashed line represents the average expected performance of random model. Solid coloured lines show the distribution of random permutations for each model. (C) Classification of IBDU patients with the combined model in Crohn’s disease (red) or ulcerative colitis (blue) subtypes. The classification posterior probability indicates the confidence of the model in assigning UC or CD labels. (D) Cumulative confidence in IBDU reclassification represented as cumulative density function (red line) of posterior probabilities for 29 IBDU patients. Each dot represents an IBDU patient.

Similar articles

Cited by

References

    1. Henderson P, et al. Rising incidence of pediatric inflammatory bowel disease in Scotland. Inflamm. Bowel Dis. 2012;18:999–1005. doi: 10.1002/ibd.21797. - DOI - PubMed
    1. Ashton JJ, et al. Rising incidence of paediatric inflammatory bowel disease (PIBD) in Wessex, Southern England. Arch. Dis. Child. 2014;99:659–664. doi: 10.1136/archdischild-2013-305419. - DOI - PubMed
    1. Podolsky DK. Inflammatory Bowel Disease. N. E. J. Med. 1991;325:928–937. doi: 10.1056/NEJM199109263251306. - DOI - PubMed
    1. Fernandes MA, Verstraete SG, Garnett EA, Heyman MB. Addition of Histology to the Paris Classification of Pediatric Crohn Disease Alters Classification of Disease Location. J. Pediatr. Gastroenterol. Nutr. 2016;62:242–245. doi: 10.1097/MPG.0000000000000967. - DOI - PMC - PubMed
    1. Ashton JJ, et al. Endoscopic Versus Histological Disease Extent at Presentation of Paediatric Inflammatory Bowel Disease. J. Pediatr. Gastroenterol. Nutr. 2016;62:246–251. doi: 10.1097/MPG.0000000000001032. - DOI - PubMed

Publication types

LinkOut - more resources