J Pain Res. 2015 Jun 10;8:277-88.
doi: 10.2147/jpr.s8256. eCollection 2015.

Identification of a potential fibromyalgia diagnosis using random forest modeling applied to electronic medical records

Birol Emir et al. J Pain Res.

Abstract

Background: Diagnosis of fibromyalgia (FM), a chronic musculoskeletal condition characterized by widespread pain and a constellation of symptoms, remains challenging and is often delayed.

Methods: Random forest modeling of electronic medical records was used to identify variables that may facilitate earlier FM identification and diagnosis. The study population comprised subjects associated with an integrated delivery network who had one or more health care provider encounters in the Humedica database in calendar years 2011 and 2012. Within this population, cases were subjects aged ≥18 years with two or more listings of the International Classification of Diseases, Ninth Revision (ICD-9) code for FM (ICD-9 729.1) ≥30 days apart during the 2012 calendar year; controls had no FM ICD-9 code. Seventy-two demographic, clinical, and health care resource utilization variables were entered into a random forest model, with downsampling to account for cohort imbalance (<1% of subjects had FM). The top ten variables were ranked by importance, defined as the loss in predictive performance caused by omitting each variable from the model and normalized so that the variable with the largest loss scored 100%. Because random forest is a complex prediction method, a set of simple rules was derived to help explain which factors drive individual predictions.
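The case definition and class balancing described in the Methods can be illustrated with a minimal Python sketch. It assumes each subject's diagnosis listings are available as (ICD-9 code, date) pairs and that age is a single number; the function names, the one-off (rather than per-tree) downsampling, and the fixed random seed are illustrative assumptions, not details reported by the authors.

```python
import random
from datetime import date

FM_ICD9 = "729.1"  # ICD-9 code for fibromyalgia, as stated in the Methods


def is_fm_case(age: int, listings: list[tuple[str, date]],
               year: int = 2012, min_gap_days: int = 30) -> bool:
    """Hypothetical case check: aged >= 18 and at least two FM code
    listings within `year` that are at least `min_gap_days` apart."""
    if age < 18:
        return False
    fm_dates = sorted(d for code, d in listings
                      if code == FM_ICD9 and d.year == year)
    if len(fm_dates) < 2:
        return False
    # The first and last listings are the farthest-apart pair, so if any
    # pair is at least min_gap_days apart, this pair is too.
    return (fm_dates[-1] - fm_dates[0]).days >= min_gap_days


def downsample_controls(cases: list, controls: list, seed: int = 0) -> list:
    """Keep all cases and draw an equal number of controls at random
    (without replacement) so both classes end up the same size."""
    rng = random.Random(seed)
    k = min(len(cases), len(controls))
    return list(cases) + rng.sample(controls, k)
```

For example, a 40-year-old with FM listings on 5 January and 1 March 2012 qualifies as a case, while listings only 15 days apart do not. Random forest implementations often apply downsampling per tree instead; the abstract does not say which variant was used.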

Results: The ten variables identified by the model were: number of visits where laboratory/non-imaging diagnostic tests were ordered; number of outpatient visits excluding office visits; age; number of office visits; number of opioid prescriptions; number of medications prescribed; number of pain medications excluding opioids; number of medications administered/ordered; number of emergency room visits; and number of musculoskeletal conditions. A receiver operating characteristic curve computed on an independent test set confirmed the model's predictive accuracy (area under the curve, 0.810). To enhance interpretability, nine rules were developed that could identify subjects with a good predictive probability of an FM diagnosis, as well as subjects without FM.
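The reported area under the curve has a simple probabilistic reading: it is the chance that a randomly chosen FM case receives a higher predicted probability than a randomly chosen control, with ties counted as one half. The sketch below computes it that way on made-up scores; it is O(n²), meant only for illustration, and is not the authors' evaluation code.

```python
def roc_auc(scores: list[float], labels: list[int]) -> float:
    """AUC as P(score of a random case > score of a random control),
    counting ties as 0.5. Labels: 1 = FM case, 0 = control."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy scores (hypothetical): 5 of the 6 case/control pairs are ranked correctly.
auc = roc_auc([0.9, 0.8, 0.4, 0.3, 0.2], [1, 0, 1, 0, 0])
```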

Conclusion: Random forest modeling may help to quantify the predictive probability of an FM diagnosis. Rules can be derived to simplify interpretation. Further validation of these models may facilitate earlier diagnosis and enhance management.

Keywords: electronic medical records; fibromyalgia; health care resource utilization; predictive modeling; random forest; real-world data.


Figures

Figure 1
The ten most important variables for predicting a diagnosis of fibromyalgia, identified from random forest models. Notes: The level of importance (x-axis) is ranked for all identified variables and normalized so that the variable whose omission from the model caused the largest loss in predictive performance scores 100%. Abbreviation: ER, emergency room.
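The normalization used for Figure 1 can be sketched as follows: each variable's loss in predictive performance when omitted is divided by the largest such loss and scaled to 100. The loss values below are invented for illustration, and the performance metric behind the losses is not stated in the abstract; the function simply takes the losses as given.

```python
def normalize_importance(loss_by_variable: dict[str, float]) -> list[tuple[str, float]]:
    """Scale each variable's performance loss so the largest equals 100,
    then return (variable, score) pairs sorted from most to least important."""
    top = max(loss_by_variable.values())
    if top <= 0:
        raise ValueError("the largest loss must be positive")
    scaled = {name: 100.0 * loss / top for name, loss in loss_by_variable.items()}
    return sorted(scaled.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical losses, not values from the study.
ranking = normalize_importance({"Age": 2.0, "Office visits": 4.0, "ER visits": 1.0})
```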
Figure 2
Receiver operating characteristic curve modeled using the test dataset. Notes: Receiver operating characteristic curve of the sensitivity and specificity for predicting the probability of a fibromyalgia diagnosis, modeled using the test dataset from the ten most important variables identified from the random forest model. Point A, which denotes a probability value of 0.500, has a sensitivity of 0.641 and a specificity of 0.794. In contrast, point B marks the probability value (0.446) that balances sensitivity (0.721) and specificity (0.740).
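Points A and B on Figure 2 correspond to two choices of probability cutoff. A minimal sketch of how sensitivity and specificity follow from a cutoff, and of one way to pick a "balanced" cutoff (minimizing the gap between the two), is given below; the balancing criterion and the toy data are assumptions, since the abstract does not state how point B was chosen.

```python
def sens_spec(scores: list[float], labels: list[int], threshold: float) -> tuple[float, float]:
    """Sensitivity and specificity when FM is predicted for score >= threshold.
    Labels: 1 = FM case, 0 = control; both classes must be present."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def balanced_threshold(scores: list[float], labels: list[int]) -> float:
    """Among the observed scores, return the cutoff that minimizes
    |sensitivity - specificity| (ties go to the lowest cutoff)."""
    def gap(t: float) -> float:
        sens, spec = sens_spec(scores, labels, t)
        return abs(sens - spec)
    return min(sorted(set(scores)), key=gap)


# Toy data (hypothetical), not the study's test set.
scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
```

With these toy values, a cutoff of 0.5 gives sensitivity and specificity of 2/3 each, and the balanced cutoff is 0.6. Lowering the cutoff trades specificity for sensitivity, which is the same trade-off seen between points A and B.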
Figure 3
Cumulative distribution functions for the variables identified in the random forest model. Notes: (A) Number of visits during which diagnostic/laboratory tests were ordered. (B) Number of outpatient visits (excluding office visits). (C) Age. (D) Number of office visits. (E) Number of opioid prescriptions. (F) Number of prescriptions written. (G) Number of pain medication prescriptions (excluding opioids). (H) Number of prescriptions administered (ordered). (I) Number of emergency department visits. (J) Number of musculoskeletal pain conditions.
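Each panel of Figure 3 is a cumulative distribution function: for a value x, the fraction of subjects whose variable is at most x. A minimal empirical version is sketched below. The assumption that curves are drawn separately for FM and non-FM subjects is ours (the caption does not say), and the visit counts are invented.

```python
from bisect import bisect_right
from typing import Callable


def ecdf(values: list[float]) -> Callable[[float], float]:
    """Return F where F(x) is the fraction of observations <= x."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("need at least one observation")

    def F(x: float) -> float:
        # bisect_right counts how many sorted observations are <= x.
        return bisect_right(data, x) / n

    return F


# Hypothetical office-visit counts for two groups.
F_cases = ecdf([0, 2, 2, 5])
F_controls = ecdf([0, 0, 1, 1])
```

A curve that rises more slowly (here, the cases) indicates a group with generally higher counts.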
