Comput Biol Med. 2024 Jan:168:107757.
doi: 10.1016/j.compbiomed.2023.107757. Epub 2023 Nov 25.

Using an artificial intelligence tool incorporating natural language processing to identify patients with a diagnosis of ANCA-associated vasculitis in electronic health records

Free article

Jolijn R van Leeuwen et al. Comput Biol Med. 2024 Jan.
Abstract

Background: Anti-neutrophil cytoplasmic antibody (ANCA)-associated vasculitis (AAV) is a rare, life-threatening autoimmune disease, which makes research difficult but essential. A long-standing challenge is identifying the few AAV patients within an electronic health record (EHR) system to facilitate real-world research. Artificial intelligence (AI) search tools that use natural language processing (NLP) for text-mining are increasingly proposed as a solution.

Methods: We employed an AI tool that combines text-mining with NLP-based exclusion to accurately identify rare AAV patients within large EHR systems (>2,000,000 records). We developed the identification method in an academic center with an established AAV training set (n = 203) and validated it in a non-academic center with an AAV validation set (n = 84). To assess accuracy, anonymized patient records were manually reviewed.
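The two-stage idea described in the Methods (a broad text-mining search followed by NLP-based exclusion) can be illustrated with a minimal Python sketch. This is not the AI tool used in the study: the search terms, the negation cues, the sentence splitter and the function names are all hypothetical, and simple rule-based negation matching stands in for whatever NLP the real tool applies.

```python
import re

# Hypothetical search terms and negation cues; the abstract does not list
# the ones actually used in the study.
SEARCH_TERMS = [
    "anca-associated vasculitis",
    "granulomatosis with polyangiitis",
    "microscopic polyangiitis",
]
NEGATION_CUES = ["no evidence of", "ruled out", "not consistent with"]


def mentions(text):
    """Return the lower-cased sentences that contain at least one search term."""
    sentences = re.split(r"(?<=[.!?])\s+", text.lower())
    return [s for s in sentences if any(term in s for term in SEARCH_TERMS)]


def text_mining_hit(text):
    """Stage 1: broad search, any mention of a term counts as a hit."""
    return bool(mentions(text))


def survives_exclusion(text):
    """Stage 2: keep the record if at least one mention is not negated."""
    return any(
        not any(cue in sentence for cue in NEGATION_CUES)
        for sentence in mentions(text)
    )


def identify(records):
    """Return (stage-1 hits, hits remaining after exclusion) as sets of ids."""
    hits = {pid for pid, text in records.items() if text_mining_hit(text)}
    kept = {pid for pid in hits if survives_exclusion(records[pid])}
    return hits, kept


# Invented example notes, for illustration only.
records = {
    "p1": "Biopsy confirms microscopic polyangiitis.",
    "p2": "No evidence of ANCA-associated vasculitis. Follow-up in 6 months.",
    "p3": "Hypertension, well controlled.",
}
hits, kept = identify(records)
```

In this toy example the broad search flags both records that mention a term, and the exclusion step drops the one whose only mention is negated. That mirrors the intended effect of the second stage: fewer false positives at little cost in sensitivity.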

Results: Through an iterative process, we developed a text-mining search based on disease description, laboratory measurements, medication and specialisms. In the training center, the search identified 608 patients with a sensitivity of 97.0% (95% CI [93.7, 98.9]) and a positive predictive value (PPV) of 56.9% (95% CI [52.9, 60.1]). NLP-based exclusion reduced this to 444 patients, increasing PPV to 77.9% (95% CI [73.7, 81.7]) while sensitivity remained high at 96.3% (95% CI [93.8, 98.0]). In the validation center, text-mining identified 333 patients (sensitivity 97.6%, 95% CI [91.6, 99.7]; PPV 58.2%, 95% CI [52.8, 63.6]) and NLP-based exclusion reduced this to 223 patients, increasing PPV to 86.1% (95% CI [80.9, 90.4]) with a sensitivity of 98.0% (95% CI [94.9, 99.4]). Our identification method outperformed ICD-10 coding, predominantly in identifying MPO+ and organ-limited AAV patients.
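For readers less familiar with the metrics reported in the Results, the sketch below shows how sensitivity, PPV and a 95% confidence interval can be computed from manual-review counts. The abstract does not state which interval method the authors used, so the Wilson score interval here is an assumption, and the counts in the usage lines are invented for illustration rather than taken from the study.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, total, confidence=0.95):
    """Wilson score interval for a binomial proportion (assumed method)."""
    if total <= 0:
        raise ValueError("total must be positive")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


def sensitivity(tp, fn):
    """Share of true AAV patients that the search found."""
    return tp / (tp + fn)


def ppv(tp, fp):
    """Share of identified patients who truly have AAV."""
    return tp / (tp + fp)


# Hypothetical review counts, NOT the study's data.
tp, fp, fn = 90, 10, 5
sens = sensitivity(tp, fn)
sens_ci = wilson_interval(tp, tp + fn)
precision = ppv(tp, fp)
precision_ci = wilson_interval(tp, tp + fp)
```

Sensitivity is measured against the known patient set, while PPV is measured against everyone the search flagged. That is why removing false positives through exclusion can raise PPV substantially while leaving sensitivity almost unchanged.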

Conclusions: Our study highlights the advantages of using AI, notably NLP, to accurately identify rare AAV patients within large EHR systems, and demonstrates the applicability and transportability of the method. This method can therefore reduce the effort needed to identify AAV patients and accelerate real-world research, while avoiding the bias introduced by ICD-10 coding.

Keywords: ANCA-Associated vasculitis; Artificial intelligence; Electronic-health-records; Natural language processing; Pauci-immune glomerulonephritis.

Conflict of interest statement

Declaration of competing interest: The work of YKOT is supported by the Dutch Kidney Foundation (17OKG04) and by the Arthritis Research and Collaboration Hub (ARCH) foundation. ARCH is funded by the Dutch Arthritis Foundation (ReumaNederland). YKOT received an unrestricted research grant from GlaxoSmithKline, Aurinia Pharmaceuticals and Vifor Pharma. The LUMC received consulting fees from Aurinia Pharmaceuticals, Novartis, GSK, KezarBio, Vifor Pharma and Otsuka Pharmaceuticals for consultancies delivered by YKOT.
