JMIR Med Inform. 2021 May 11;9(5):e24205.
doi: 10.2196/24205.

Transforming a Patient Registry Into a Customized Data Set for the Advanced Statistical Analysis of Health Risk Factors and for Medication-Related Hospitalization Research: Retrospective Hospital Patient Registry Study


Zhivko Taushanov et al. JMIR Med Inform.

Abstract

Background: Hospital patient registries provide substantial longitudinal data sets describing the clinical and medical health statuses of inpatients and their pharmacological prescriptions. Despite the multiple advantages of routinely collected multidimensional longitudinal data, these data sets are rarely suitable for advanced statistical analysis in their raw form and require customization and synthesis.

Objective: The aim of this study was to describe the methods used to transform and synthesize a raw, multidimensional hospital patient registry data set into an exploitable database for further investigation of risk profiles and predictive and survival health outcomes among polymorbid, polymedicated, older inpatients in relation to their medicine prescriptions at hospital discharge.

Methods: A raw, multidimensional data set from a public hospital was extracted from the hospital registry as a CSV file and imported into the R statistical package for cleaning, customization, and synthesis. Patients were included if they were home-dwelling, polymedicated older adults aged ≥65 years with multiple chronic conditions who were hospitalized. The patient data set covered 140 variables from 20,422 hospitalizations of polymedicated, home-dwelling older adults from 2015 to 2018. Each variable was explored according to its type, and its distribution, missing values, and associations with other variables were computed. Different clustering methods, expert opinion, recoding, and missing-value techniques were used to customize and synthesize these multidimensional data sets.
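The inclusion step and the variable-by-variable exploration of missing values can be illustrated with a minimal Python sketch using only the standard library (the authors worked in R, so this is not their code). The column names (`age`, `living_situation`, `n_drugs_discharge`), the tokens treated as missing, and the polymedication threshold of five or more drugs are all hypothetical assumptions, not details taken from the registry.

```python
import csv
import io

# Hypothetical column names: the registry's actual variable names are not given.
AGE_COL = "age"
LIVING_COL = "living_situation"
DRUG_COUNT_COL = "n_drugs_discharge"
# Assumption: polymedication = 5 or more drugs (a common convention, not from the study).
POLYMEDICATION_THRESHOLD = 5
# Assumption: how missing values are encoded in the CSV export.
MISSING_TOKENS = {"", "NA", "N/A"}


def is_missing(value):
    """Treat absent fields, empty cells, and common NA tokens as missing."""
    return value is None or value.strip() in MISSING_TOKENS


def load_rows(csv_text):
    """Parse a CSV export into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def meets_inclusion(row):
    """Aged >= 65, home-dwelling, and polymedicated; rows missing any criterion are excluded."""
    if any(is_missing(row.get(col)) for col in (AGE_COL, LIVING_COL, DRUG_COUNT_COL)):
        return False
    return (
        int(row[AGE_COL]) >= 65
        and row[LIVING_COL].strip() == "home"
        and int(row[DRUG_COUNT_COL]) >= POLYMEDICATION_THRESHOLD
    )


def missing_profile(rows):
    """Proportion of missing values for each variable."""
    if not rows:
        return {}
    n = len(rows)
    return {col: sum(is_missing(r.get(col)) for r in rows) / n for col in rows[0]}


if __name__ == "__main__":
    raw = (
        "hosp_id,age,living_situation,n_drugs_discharge,creatinine\n"
        "1,82,home,7,95\n"
        "2,71,nursing_home,9,NA\n"
        "3,64,home,6,\n"
        "4,77,home,5,\n"
    )
    included = [r for r in load_rows(raw) if meets_inclusion(r)]
    print([r["hosp_id"] for r in included])  # ['1', '4']
    print(missing_profile(included)["creatinine"])  # 0.5
```

Rows with a missing inclusion criterion are excluded rather than imputed here; that is a design choice of the sketch, and a real pipeline might instead flag them for review.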

Results: Sociodemographic data showed no missing values. Average age, hospital length of stay, and frequency of hospitalization were computed. Discharge details were recoded and summarized. Clinical data were cleaned, and best practices for managing missing values were applied. Seven clusters covering medical diagnoses, surgical interventions, somatic status, cognitive status, and medicines were extracted using empirical and statistical best practices, each describing the health status of the patients it contains as accurately as possible. Medical, comorbidity, and drug data were recoded and summarized.
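The descriptive summaries reported in the Results (average age, length of stay, hospitalization frequency, and recoded discharge details) can be sketched as follows. This is a hypothetical Python illustration, not the authors' R code: the `Stay` record fields and the discharge grouping in `DISCHARGE_GROUPS` are assumptions, and length of stay is taken here as the discharge date minus the admission date in whole days.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from statistics import mean

# Hypothetical recoding of detailed discharge destinations into broader groups.
DISCHARGE_GROUPS = {
    "home": "home",
    "home_with_care": "home",
    "nursing_home": "institution",
    "rehabilitation": "institution",
    "deceased": "death",
}


@dataclass
class Stay:
    """One hospitalization record (hypothetical fields)."""
    patient_id: str
    age: int
    admission: date
    discharge: date
    destination: str


def length_of_stay(stay: Stay) -> int:
    """Length of stay in days: discharge date minus admission date."""
    return (stay.discharge - stay.admission).days


def recode_discharge(destination: str) -> str:
    """Map a detailed destination to a broad group; unknown codes become 'other'."""
    return DISCHARGE_GROUPS.get(destination, "other")


def summarize(stays: list[Stay]) -> dict:
    """Averages per hospitalization, hospitalizations per patient, and discharge groups."""
    per_patient = Counter(s.patient_id for s in stays)
    return {
        "mean_age": mean(s.age for s in stays),
        "mean_los_days": mean(length_of_stay(s) for s in stays),
        "mean_hospitalizations_per_patient": mean(per_patient.values()),
        "discharge_groups": Counter(recode_discharge(s.destination) for s in stays),
    }


if __name__ == "__main__":
    stays = [
        Stay("A", 80, date(2016, 1, 4), date(2016, 1, 14), "home_with_care"),
        Stay("A", 81, date(2016, 9, 1), date(2016, 9, 5), "home"),
        Stay("B", 70, date(2017, 3, 10), date(2017, 3, 17), "nursing_home"),
    ]
    print(summarize(stays))
```

Because one patient can be admitted several times, age and length of stay are averaged over hospitalizations here, while frequency is counted per patient; which unit of analysis to use is a choice the real analysis would need to make explicit.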

Conclusions: Combining empirical and best-practice statistical approaches produced a cleaner, better-structured data set. The overall strategy delivered an exploitable, population-based database suitable for advanced descriptive, predictive, and survival analyses of polymedicated, home-dwelling older adults admitted as inpatients. More research is needed to develop best practices for customizing and synthesizing large, multidimensional, population-based registries.

International Registered Report Identifier (IRRID): RR2-10.1136/bmjopen-2019-030030.

Keywords: cluster analysis; hierarchical 2-step clustering; hospital; multidimensional; population based; raw data; registry; retrospective.


Conflict of interest statement

Conflicts of Interest: None declared.

Figures

Figure 1. Structure and content of the data set clusters.
Figure 2. Dendrogram of cognitive status variables.
Figure 3. Silhouette statistics for choosing the optimal number of clusters; the two- or four-cluster solutions were suggested.
Figure 4. Dendrogram of the somatic health status variables.
Figure 5. Average silhouette width for each number of sub-clusters in the mobility sub-cluster.
Figure 6. Silhouette statistics for choosing the number of groupings in the health impairments sub-cluster; the four-cluster solution was suggested.
Figure 7. Silhouette statistics for the sub-cluster of capacities for the activities of daily living.
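The figures pair dendrograms with silhouette statistics to choose the number of clusters. A minimal Python sketch of that general idea: average-linkage agglomerative clustering over a precomputed dissimilarity matrix, with the number of clusters chosen by maximizing the average silhouette width. This is not the authors' hierarchical 2-step procedure, which the abstract does not detail; the linkage choice and the toy data are assumptions. For clustering variables rather than patients, the dissimilarity could be something like 1 minus the absolute correlation, which is also an assumption.

```python
from itertools import combinations


def average_linkage(dist, k):
    """Agglomerative clustering with average linkage on a symmetric distance
    matrix; merges the closest pair of clusters until k clusters remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Average of all pairwise distances between the two clusters.
            d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
            d /= len(clusters[a]) * len(clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters[b]
        del clusters[b]  # safe because b > a
    return clusters


def average_silhouette(dist, clusters):
    """Mean silhouette width s(i) = (b - a) / max(a, b) over all items (needs >= 2 clusters)."""
    label = {i: c for c, members in enumerate(clusters) for i in members}
    scores = []
    for i, c in label.items():
        own = clusters[c]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        b = min(
            sum(dist[i][j] for j in other) / len(other)
            for c2, other in enumerate(clusters)
            if c2 != c
        )
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)


def choose_k(dist, k_values):
    """Return the k with the largest average silhouette width, plus all widths."""
    widths = {k: average_silhouette(dist, average_linkage(dist, k)) for k in k_values}
    return max(widths, key=widths.get), widths


if __name__ == "__main__":
    # Toy one-dimensional data with two obvious groups (illustrative only).
    xs = [0, 1, 2, 10, 11, 12]
    dist = [[abs(x - y) for y in xs] for x in xs]
    best_k, widths = choose_k(dist, [2, 3, 4])
    print(best_k)  # 2
    print(average_linkage(dist, best_k))  # [[0, 1, 2], [3, 4, 5]]
```

Working from a distance matrix rather than raw coordinates keeps the sketch agnostic about whether items are patients or variables; the naive merge loop is cubic in the number of items, so it suits a few dozen variables, not 20,422 hospitalizations.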
