EBioMedicine. 2023 Jul:93:104657.
doi: 10.1016/j.ebiom.2023.104657. Epub 2023 Jun 21.

DNA methylation-based classifier differentiates intrahepatic pancreato-biliary tumours

Mihnea P Dragomir et al. EBioMedicine. 2023 Jul.

Abstract

Background: Differentiating intrahepatic cholangiocarcinomas (iCCA) from hepatic metastases of pancreatic ductal adenocarcinoma (PAAD) is challenging. The two tumours have similar morphological and immunohistochemical patterns and share multiple driver mutations. We hypothesised that DNA methylation-based machine-learning algorithms may help perform this task.

Methods: We assembled genome-wide DNA methylation data for iCCA (n = 259), PAAD (n = 431), and normal bile duct (n = 70) from publicly available sources. We split this cohort into a reference set (n = 399) and a validation set (n = 361). Using the reference cohort, we trained three machine learning models to differentiate between these entities. We then validated the classifiers on the technical validation set and used an internal cohort (n = 72) to test our classifier.

Findings: On the validation cohort, the neural network, support vector machine, and random forest classifiers reached accuracies of 97.68%, 95.62%, and 96.5%, respectively. Filtering by anomaly detection and thresholds improved the accuracy to 99.07% (37 samples excluded by filtering), 96.22% (17 samples excluded), and 100% (44 samples excluded) for the neural network, support vector machine, and random forest, respectively. Because it offered the best balance between accuracy and number of predictable cases, we tested the neural network with filters applied on the in-house cohort, obtaining an accuracy of 95.45%.
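
The trade-off reported in the Findings, where a stricter filter raises accuracy but leaves fewer predictable cases, can be illustrated with a minimal sketch. It covers only the probability-threshold part of the filtering; the anomaly-detection step is not modelled. The function names, the class labels used as dictionary keys, and the toy probabilities are all hypothetical and are not taken from the study.

```python
from typing import Dict, List, Optional, Tuple

Sample = Tuple[str, Dict[str, float]]  # (true label, class probabilities)


def predict_with_threshold(probs: Dict[str, float], threshold: float) -> Optional[str]:
    """Return the most probable class, or None if it falls below the threshold."""
    label = max(probs, key=probs.get)
    return label if probs[label] >= threshold else None


def evaluate(samples: List[Sample], threshold: float) -> Tuple[float, int]:
    """Return (accuracy on retained samples, number of excluded samples)."""
    kept = 0
    correct = 0
    for truth, probs in samples:
        pred = predict_with_threshold(probs, threshold)
        if pred is None:
            continue  # sample is filtered out as not predictable
        kept += 1
        correct += int(pred == truth)
    excluded = len(samples) - kept
    accuracy = correct / kept if kept else float("nan")
    return accuracy, excluded


# Toy data for illustration only (hypothetical values, not study data).
toy: List[Sample] = [
    ("iCCA", {"iCCA": 0.97, "PAAD": 0.02, "normal bile duct": 0.01}),
    ("PAAD", {"iCCA": 0.10, "PAAD": 0.88, "normal bile duct": 0.02}),
    ("PAAD", {"iCCA": 0.55, "PAAD": 0.40, "normal bile duct": 0.05}),  # low-confidence error
    ("normal bile duct", {"iCCA": 0.05, "PAAD": 0.05, "normal bile duct": 0.90}),
]

if __name__ == "__main__":
    for t in (0.5, 0.8, 0.95):
        acc, excluded = evaluate(toy, t)
        print(f"threshold={t:.2f} accuracy={acc:.2%} excluded={excluded}")
```

In this toy example, raising the threshold from 0.5 to 0.8 removes the single low-confidence misclassification, so accuracy on the retained samples rises while one sample becomes unpredictable. Raising it further to 0.95 excludes more samples without any gain in accuracy.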

Interpretation: We developed a classifier that can differentiate between iCCAs, intrahepatic metastases of a PAAD, and normal bile duct tissue with high accuracy. This tool can be used for improving the diagnosis of pancreato-biliary cancers of the liver.

Funding: This work was supported by Berlin Institute of Health (JCS Program), DKTK Berlin (Young Investigator Grant 2022), German Research Foundation (493697503 and 314905040 - SFB/TRR 209 Liver Cancer B01), and German Cancer Aid (70113922).

Keywords: Epigenetic; Machine learning; Molecular diagnosis; Oncology; Pathology.

Conflict of interest statement

Declaration of interests: The authors declare no conflicts of interest.

Figures

Fig. 1
The methylation landscape of pancreato-biliary tumours. (a) Graphical representation of the pancreato-biliary tumours and normal tissues represented in the t-SNE plot. (b) The two-dimensional representation of pancreato-biliary tumours analysed using the t-SNE method based on DNA methylation profiles. The colour of the samples represents the tumour type and their origin: iCCA: intrahepatic cholangiocarcinoma; iIPNB: intrahepatic intraductal papillary neoplasia of the bile duct; iITPN: intrahepatic intraductal tubulopapillary neoplasia of the bile duct; pCCA: perihilar cholangiocarcinoma; pIPNB: perihilar intraductal papillary neoplasia of the bile duct; pITPN: perihilar intraductal tubulopapillary neoplasia of the bile duct; eCCA: extrahepatic cholangiocarcinoma (samples that are pCCA/dCCA but no further details were available); dCCA: distal cholangiocarcinoma; dIPNB: distal intraductal papillary neoplasia of the bile duct; PAAD: pancreatic adenocarcinoma; ITPN-P: intraductal tubulopapillary neoplasia of the pancreas; normal bile; and normal pancreas. (c) The design of the methylation-based classifier that can distinguish between iCCA, PAAD liver metastases and normal bile duct samples. (d) Representative H&E images for an iCCA and a PAAD liver metastasis.
Fig. 2
An in-depth biological analysis of the reference cohort. (a) A detailed overview of the samples composing the reference cohort including the study name, number of samples, material, array, and data type. (b) The two-dimensional representation of the reference cohort samples (n = 399) using the t-SNE method based on DNA methylation profiles. The colour of the samples represents their tissue of origin. (c-i) The same t-SNE in which the colour of the samples represents: (c) tumour purity, (d) study sets, (e) IDH1/2 status, (f) KRAS status, (g) TP53 status, (h) SMAD4 status, and (i) Fluke status.
Fig. 3
Classification results of three machine learning models (random forest, support vector machine, and neural networks) on an independent validation cohort. (a) A detailed overview of the samples composing the validation cohort (n = 361) including the study name, number of samples, material, array, data type, and processing method. (b–d) Accuracy and predictable cases for different thresholds (0.5–0.95) for the random forest, support vector machine and neural networks classifiers. (e–g) Confusion matrices without filters (upper panel) and with filters (lower panel) for the three classifiers: random forest (e), support vector machine (f), and neural networks (g). (h–j) The probability score of the correct class for the classifiers (random forest (h), support vector machine (i), and neural network (j)) for three different variables: tissue of origin (iCCA, normal bile, and PAAD, left), material type (FFPE versus frozen, middle), and study set (right).
Fig. 4
Testing the classifier on our in-house clinical samples. (a) The two-dimensional representation of the reference cohort samples (n = 399) using the t-SNE method based on DNA methylation profiles to which we added the samples from the clinical test cohort (n = 72). (b) Confusion matrices with no filters (upper panel) and with filters (lower panel) for the clinical test cohort. (c) A heatmap overview of the neural networks, support vector machine, and random forest results corroborated with clinical, pathological and molecular data.
