Identifying Symptoms Prior to Pancreatic Ductal Adenocarcinoma Diagnosis in Real-World Care Settings: Natural Language Processing Approach
- PMID: 38875566
- PMCID: PMC11041417
- DOI: 10.2196/51240
Abstract
Background: Pancreatic cancer is the third leading cause of cancer deaths in the United States. Pancreatic ductal adenocarcinoma (PDAC) is the most common form of pancreatic cancer, accounting for up to 90% of all cases. Patient-reported symptoms often trigger the cancer diagnosis; therefore, understanding PDAC-associated symptoms and the timing of their onset could facilitate early detection of PDAC.
Objective: This paper aims to develop a natural language processing (NLP) algorithm to capture symptoms associated with PDAC from clinical notes within a large integrated health care system.
Methods: We used unstructured data from the 2 years prior to PDAC diagnosis among patients diagnosed between 2010 and 2019, and from matched patients without PDAC, to identify 17 PDAC-related symptoms. Related terms and phrases were first compiled from publicly available resources and then recursively reviewed and enriched with input from clinicians and chart review. A computerized NLP algorithm was iteratively developed and fine-tuned via multiple rounds of chart review followed by adjudication. Finally, the developed algorithm was applied to the validation data set to assess performance and to the study implementation notes.
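The abstract does not describe the algorithm's internals, so the following is only a minimal sketch of one common way a term lexicon can be matched against note text. The lexicon entries, the negation cues, and the function name `find_symptoms` are all hypothetical illustrations and are not taken from the study.

```python
import re

# Hypothetical mini-lexicon mapping a symptom to surface terms.
# These entries are illustrative only, not the study's compiled term list.
LEXICON = {
    "jaundice": ["jaundice", "icterus", "yellowing of the skin"],
    "weight loss": ["weight loss", "lost weight", "losing weight"],
    "back pain": ["back pain", "backache"],
}

# Hypothetical negation cues, checked in the text that precedes a matched
# term within the same sentence (at most 40 characters back).
NEGATION = re.compile(
    r"\b(no|not|denies|denied|without|negative for)\b[^.;]{0,40}$",
    re.IGNORECASE,
)


def find_symptoms(note: str) -> set[str]:
    """Return the symptoms mentioned, and not negated, in a clinical note."""
    found = set()
    # Naive sentence splitting on periods, semicolons, and newlines.
    for sentence in re.split(r"[.;\n]+", note):
        for symptom, terms in LEXICON.items():
            for term in terms:
                pattern = r"\b" + re.escape(term) + r"\b"
                for match in re.finditer(pattern, sentence, re.IGNORECASE):
                    # Keep the mention only if no negation cue precedes it.
                    if not NEGATION.search(sentence[: match.start()]):
                        found.add(symptom)
    return found


if __name__ == "__main__":
    print(find_symptoms("Patient reports weight loss and backache. Denies jaundice."))
```

A real system would need a much larger lexicon and more careful handling of negation, history, and family-history mentions; the iterative chart review described in the Methods is what such refinements would be checked against.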
Results: A total of 408,147 and 709,789 notes were retrieved from 2611 patients with PDAC and 10,085 matched patients without PDAC, respectively. In descending order, the symptom distribution of the study implementation notes ranged from 4.98% for abdominal or epigastric pain to 0.05% for upper extremity deep vein thrombosis in the PDAC group, and from 1.75% for back pain to 0.01% for pale stool in the non-PDAC group. Validation of the NLP algorithm against adjudicated chart review results of 1000 notes showed that precision ranged from 98.9% (jaundice) to 84% (upper extremity deep vein thrombosis), recall ranged from 98.1% (weight loss) to 82.8% (epigastric bloating), and F1-scores ranged from 0.97 (jaundice) to 0.86 (depression).
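For readers unfamiliar with the validation metrics reported above, the sketch below shows the standard definitions of precision, recall, and F1-score computed from true-positive, false-positive, and false-negative counts. The function name and the example counts are hypothetical and are not the study's data.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 from confusion counts.

    tp: mentions flagged by the algorithm and confirmed by chart review
    fp: mentions flagged by the algorithm but not confirmed
    fn: confirmed mentions the algorithm missed
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical counts for a single symptom, for illustration only.
    p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
    print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

With these illustrative counts, precision is 0.90, recall is 0.75, and F1 is about 0.82. Note that the abstract reports precision and recall as percentages and F1 as a proportion.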
Conclusions: The developed and validated NLP algorithm could be used for the early detection of PDAC.
Keywords: abdominal pain; cancer; cancer death; clinical note; computerized algorithm; detection; electronic health record; natural language processing; pain; pancreas; pancreatic cancer; pancreatic ductal adenocarcinoma; symptom; validation.
©Fagen Xie, Jenny Chang, Tiffany Luong, Bechien Wu, Eva Lustigova, Eva Shrader, Wansu Chen. Originally published in JMIR AI (https://ai.jmir.org), 15.01.2024.
Conflict of interest statement
Conflicts of Interest: None declared.