Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
Review
. 2021 Aug 11;8(1):44.
doi: 10.1186/s40779-021-00338-z.

Data mining in clinical big data: the frequently used databases, steps, and methodological models

Affiliations
Review

Data mining in clinical big data: the frequently used databases, steps, and methodological models

Wen-Tao Wu et al. Mil Med Res. .

Abstract

Many high quality studies have emerged from public databases, such as Surveillance, Epidemiology, and End Results (SEER), National Health and Nutrition Examination Survey (NHANES), The Cancer Genome Atlas (TCGA), and Medical Information Mart for Intensive Care (MIMIC); however, these data are often characterized by a high degree of dimensional heterogeneity, timeliness, scarcity, irregularity, and other characteristics, resulting in the value of these data not being fully utilized. Data-mining technology has been a frontier field in medical research, as it demonstrates excellent performance in evaluating patient risks and assisting clinical decision-making in building disease-prediction models. Therefore, data mining has unique advantages in clinical big-data research, especially in large-scale medical public databases. This article introduced the main medical public database and described the steps, tasks, and models of data mining in simple language. Additionally, we described data-mining methods along with their practical applications. The goal of this work was to aid clinical researchers in gaining a clear and intuitive understanding of the application of data-mining technology on clinical big-data in order to promote the production of research results that are beneficial to doctors and patients.

Keywords: Clinical big data; Data mining; MIMIC; Machine learning; Medical public database; NHANES; SEER; TCGA.

PubMed Disclaimer

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Fig. 1
The steps of data mining in medical public database

References

    1. Herland M, Khoshgoftaar TM, Wald R. A review of data mining using big data in health informatics. J Big Data. 2014;1(1):1–35. doi: 10.1186/2196-1115-1-2. - DOI
    1. Wang F, Zhang P, Wang X, Hu J. Clinical risk prediction by exploring high-order feature correlations. AMIA Annu Symp Proc. 2014;2014:1170–1179. - PMC - PubMed
    1. Xu R, Li L, Wang Q. dRiskKB: a large-scale disease-disease risk relationship knowledge base constructed from biomedical text. BMC Bioinform. 2014;15:105. doi: 10.1186/1471-2105-15-105. - DOI - PMC - PubMed
    1. Ramachandran S, Erraguntla M, Mayer R, Benjamin P, Editors. Data mining in military health systems-clinical and administrative applications. In: 2007 IEEE international conference on automation science and engineering; 2007. 10.1109/COASE.2007.4341764.
    1. Vie LL, Scheier LM, Lester PB, Ho TE, Labarthe DR, Seligman MEP. The US army person-event data environment: a military-civilian big data enterprise. Big Data. 2015;3(2):67–79. doi: 10.1089/big.2014.0055. - DOI - PubMed

Publication types

LinkOut - more resources