Validation of psoriatic arthritis diagnoses in electronic medical records using natural language processing

Thorvardur Jon Love¹, Tianxi Cai, Elizabeth W Karlson

PMID: 20701955
PMCID: PMC3691811
DOI: 10.1016/j.semarthrit.2010.05.002

Thorvardur Jon Love et al. Semin Arthritis Rheum. 2011 Apr;40(5):413-20. doi: 10.1016/j.semarthrit.2010.05.002. Epub 2010 Aug 10.

Affiliation

¹ Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA. tlove2@partners.org

Abstract

Objectives: To test whether data extracted from full-text patient visit notes in an electronic medical record would improve the classification of psoriatic arthritis (PsA) compared with an algorithm based on codified data alone.

Methods: From the more than 1,350,000 adults in a large academic electronic medical record, all 2318 patients with a billing code for PsA were extracted, and 550 were randomly selected for chart review and algorithm training. Using codified data and phrases extracted from narrative data with natural language processing, 31 predictors were extracted, and 3 random forest algorithms were trained using coded, narrative, and combined predictors. The receiver operating characteristic (ROC) curve was used to identify the optimal algorithm, and a cut-point was chosen to achieve the maximum sensitivity possible at a 90% positive predictive value (PPV). The algorithm was then used to classify the remaining 1768 charts and was finally validated in a random sample of 300 cases predicted to have PsA.
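The cut-point rule described in the Methods (maximum sensitivity subject to a 90% PPV) can be sketched in Python as below. This is a minimal illustration, not the authors' code: it assumes the random forest has already produced a score for each chart-reviewed case and that a case is predicted to have PsA when its score is at or above the threshold. The function name `choose_cut_point` and the example scores are hypothetical.

```python
from typing import Sequence, Tuple


def choose_cut_point(scores: Sequence[float], labels: Sequence[int],
                     target_ppv: float = 0.90) -> Tuple[float, float, float]:
    """Return (threshold, sensitivity, ppv) with the highest sensitivity whose PPV >= target.

    Hypothetical helper: labels are 1 for confirmed PsA on chart review, 0 otherwise,
    and a case is predicted positive when its score >= threshold.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one confirmed case")
    best = None
    # Ascending thresholds: sensitivity can only fall as the threshold rises.
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        ppv = tp / (tp + fp)  # tp + fp >= 1 because t is an observed score
        sens = tp / n_pos
        if ppv >= target_ppv and (best is None or sens > best[1]):
            best = (t, sens, ppv)
    if best is None:
        raise ValueError("no threshold reaches the target PPV")
    return best


if __name__ == "__main__":
    # Hypothetical scores for eight chart-reviewed cases.
    print(choose_cut_point([0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2],
                           [1, 1, 1, 1, 0, 1, 0, 0]))  # -> (0.7, 0.8, 1.0)
```

Because sensitivity never increases as the threshold rises, the lowest threshold that meets the PPV target is the one with the highest sensitivity; scanning in ascending order and keeping only strict improvements returns exactly that threshold.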

Results: The PPV of a single PsA code was 57% (95% CI 55%-58%). Using a combination of coded data and natural language processing (NLP), the random forest algorithm reached a PPV of 90% (95% CI 86%-93%) at a sensitivity of 87% (95% CI 83%-91%) in the training data. The PPV was 93% (95% CI 89%-96%) in the validation set. Adding NLP predictors to codified data increased the area under the receiver operating characteristic curve (P < 0.001).

Conclusions: Using NLP with text notes from electronic medical records significantly improved the performance of the prediction algorithm. Random forests were a useful tool for accurately classifying psoriatic arthritis cases to enable epidemiological research.

Figures

**Figure 1**
The process of algorithm training, validation and use.

**Figure 2**
Multidimensional scaling (MDS) plots showing distances between the 550 training cases reduced to one and two dimensions. The MDS axes have no units and are therefore labeled only as dimensions 1 and 2. On chart review, cases were classified as definite (crosses), possible (grey circles), or not (black circles) psoriatic arthritis. The algorithm was trained to separate the cases into psoriatic arthritis (crosses) or not (circles). Panel A shows the separation of cases in one dimension, illustrating how the lack of separation between possible (grey circles) and definite (crosses) psoriatic arthritis may have limited the positive predictive value of the prediction rule. Panel B shows in two dimensions at least two separate clusters of patients classified as not having PsA, suggesting that the algorithm used more than one route to determine case status. Both panels illustrate how the possible (grey circles) PsA cases failed to separate from the definite (crosses) PsA cases.
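Classical (Torgerson) MDS, the usual basis for plots like these, can be sketched in a few lines. The caption does not say which distance was used; a common choice for random forests is 1 − proximity, where proximity is the fraction of trees in which two cases land in the same terminal node, and R's randomForest package provides MDSplot for this purpose. The pure-Python function below is a hypothetical one-dimensional sketch (as in panel A), assuming a symmetric distance matrix whose leading eigenvalue after double centering is positive; it uses power iteration in place of a full eigendecomposition.

```python
import math
from typing import List, Sequence


def classical_mds_1d(dist: Sequence[Sequence[float]], iters: int = 200) -> List[float]:
    """One-dimensional classical (Torgerson) MDS from a symmetric distance matrix."""
    n = len(dist)
    # Squared distances and their row means (row and column means coincide for a symmetric matrix).
    d2 = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in d2]
    grand_mean = sum(row_mean) / n
    # Double centering: B = -1/2 * J D^2 J, which turns distances into a Gram matrix.
    b = [[-0.5 * (d2[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    # Power iteration for the leading eigenpair of B (assumed to have a positive eigenvalue).
    v = [1.0 + 0.01 * i for i in range(n)]  # arbitrary, non-constant start vector
    eigval = 0.0
    for _ in range(iters):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return [0.0] * n  # all points coincide
        v = [x / norm for x in w]
        eigval = norm
    # Coordinates are the unit eigenvector scaled by sqrt(eigenvalue); the sign is arbitrary.
    return [math.sqrt(eigval) * x for x in v]
```

For points that truly lie on a line, the function recovers their centered positions up to a sign flip. For a two-dimensional view like panel B, one would deflate B by the first component and repeat, or use a linear-algebra library's symmetric eigensolver.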

**Figure 3**
ROC curves for the three algorithms trained. The dotted line represents coded predictors (AUC = 0.9254), the dashed line is based on natural language processing of electronic notes (AUC = 0.9376), and the solid line is based on coded and natural language processing predictors combined (AUC = 0.9500). The straight line represents the specificity needed to achieve a 90% positive predictive value at any given sensitivity. Panel B shows a magnification of the area in panel A indicated by the box, focusing on where the ROC curve for the combined algorithm intersects the 90% PPV line, which represents the optimal cut point of the prediction rule.
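Why the 90% PPV constraint is a straight line in ROC space follows from Bayes' rule. In this worked derivation, Se is sensitivity, Sp is specificity, and π is assumed to be the proportion of true PsA cases among the code-positive charts (the caption does not define it):

```latex
\mathrm{PPV} = \frac{\mathrm{Se}\,\pi}{\mathrm{Se}\,\pi + (1-\mathrm{Sp})(1-\pi)}
\quad\Longrightarrow\quad
1-\mathrm{Sp} = \mathrm{Se}\cdot\frac{\pi}{1-\pi}\cdot\frac{1-\mathrm{PPV}}{\mathrm{PPV}}
```

With PPV fixed at 0.90, the false-positive rate 1 − Sp is proportional to Se, so the constraint is a line through the origin. As an illustration only, taking π ≈ 0.57 (the PPV of a single PsA code, used here as an assumed prevalence) gives a slope of (0.57/0.43)(0.10/0.90) ≈ 0.147, so at Se = 0.87 the rule would need 1 − Sp ≈ 0.128, i.e. a specificity of about 0.87.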

**Figure 4**
Importance of the individual predictors. Importance is measured as the mean decrease in accuracy after randomly permuting the predictor values, with a higher mean decrease in accuracy suggesting greater importance for the predictor. The importance measure is scaled by dividing the mean decrease in accuracy by its standard deviation (SD).
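The scaled importance measure can be sketched as follows. In R's randomForest the accuracy decrease is computed tree by tree on out-of-bag cases; as a simplification, this hypothetical `permutation_importance` function instead permutes one predictor repeatedly on a fixed dataset, records the drop in accuracy each time, and divides the mean drop by its standard deviation. The `predict` callable is an assumed stand-in for any fitted classifier.

```python
import math
import random
import statistics
from typing import Callable, List, Sequence

Row = List[float]


def permutation_importance(predict: Callable[[Row], int], X: Sequence[Row],
                           y: Sequence[int], feature: int,
                           n_repeats: int = 20, seed: int = 0) -> float:
    """Mean accuracy drop from permuting one predictor, scaled by the SD of the drops."""
    if n_repeats < 2:
        raise ValueError("need at least two repeats to estimate an SD")
    rng = random.Random(seed)

    def accuracy(rows: Sequence[Row]) -> float:
        return sum(predict(r) == t for r, t in zip(rows, y)) / len(y)

    base = accuracy(X)
    drops = []
    for _ in range(n_repeats):
        # Shuffle this predictor's column, leaving every other predictor intact.
        column = [row[feature] for row in X]
        rng.shuffle(column)
        permuted = [list(row) for row in X]
        for row, value in zip(permuted, column):
            row[feature] = value
        drops.append(base - accuracy(permuted))
    mean = statistics.mean(drops)
    sd = statistics.stdev(drops)
    if sd == 0.0:
        # Identical drops: zero importance if nothing changed, otherwise unbounded.
        return 0.0 if mean == 0.0 else math.copysign(math.inf, mean)
    return mean / sd
```

A predictor the classifier never uses yields identical accuracy under every permutation and therefore a score of zero, while a predictor the classifier depends on yields a positive score.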
