Review
Korean J Anesthesiol. 2017 Aug;70(4):407-411.
doi: 10.4097/kjae.2017.70.4.407. Epub 2017 Jul 27.

Statistical data preparation: management of missing values and outliers

Sang Kyu Kwak et al. Korean J Anesthesiol. 2017 Aug.

Abstract

Missing values and outliers are frequently encountered during data collection. Missing values reduce the amount of data available for analysis, which lowers the statistical power of a study and, ultimately, the reliability of its results. They can also introduce substantial bias and reduce the efficiency of estimation. Outliers can strongly distort sample statistics such as the mean and standard deviation, leading to overestimated or underestimated values. Consequently, the results of a data analysis can depend considerably on how missing values and outliers are handled. This review therefore discusses the types of missing values, methods for identifying outliers, and strategies for dealing with both.
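
Two claims in the abstract can be illustrated with a small sketch on hypothetical numbers: dropping incomplete records shrinks the analyzable sample, and a single extreme value can pull the sample mean and standard deviation well away from their values without it. The record fields (`age`, `sbp`), the helper names `complete_cases` and `summarize`, and all data are invented for illustration. Complete-case (listwise) deletion is shown only because it is the simplest way to make the loss of data visible; it is not presented as the handling strategy the review recommends.

```python
import statistics

# Hypothetical records; None marks a missing value.
records = [
    {"age": 34, "sbp": 118},
    {"age": 41, "sbp": None},
    {"age": None, "sbp": 131},
    {"age": 52, "sbp": 125},
    {"age": 47, "sbp": 122},
]


def complete_cases(rows):
    """Listwise deletion: keep only rows in which no field is missing."""
    return [r for r in rows if all(v is not None for v in r.values())]


def summarize(xs):
    """Return the sample mean and sample standard deviation (n - 1 denominator)."""
    return statistics.mean(xs), statistics.stdev(xs)


kept = complete_cases(records)
print(len(records), "->", len(kept))  # prints: 5 -> 3

# One extreme value added to otherwise similar measurements.
clean = [118, 122, 125, 131, 127]
with_outlier = clean + [250]
for label, xs in (("without outlier", clean), ("with outlier", with_outlier)):
    mean, sd = summarize(xs)
    print(f"{label}: mean={mean:.1f}, sd={sd:.1f}")
```

With these made-up values, two of the five records are lost to deletion, and the single value of 250 raises the mean from 124.6 to 145.5 while inflating the standard deviation roughly tenfold.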

Keywords: Bias; Data collection; Data interpretation; Statistics.

Figures

Fig. 1. Boxplot with outliers. The upper fence lies above the 75th percentile (3rd quartile), and the lower fence lies below the 25th percentile (1st quartile), by 1.5 times the interquartile range (the difference between the 3rd and 1st quartiles). An outlier is defined as any value above the upper fence or below the lower fence.
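
The fence rule in the caption can be sketched in Python. The lower fence is Q1 − 1.5 × IQR and the upper fence is Q3 + 1.5 × IQR, where IQR = Q3 − Q1. This is a minimal sketch assuming Python 3.8+ (for `statistics.quantiles`). The function names `tukey_fences` and `find_outliers`, the multiplier parameter `k`, and the sample data are hypothetical. The quartile convention (`method="inclusive"`) is also an assumption: other quartile definitions, possibly including the one used by the authors' software, can shift the fences slightly.

```python
import statistics


def tukey_fences(values, k=1.5):
    """Return (lower_fence, upper_fence) using the boxplot rule Q1 - k*IQR, Q3 + k*IQR.

    Quartiles come from statistics.quantiles(method="inclusive"); this is an
    assumed convention, and other quartile definitions give slightly different fences.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, k=1.5):
    """Return the values lying strictly below the lower fence or above the upper fence."""
    lower, upper = tukey_fences(values, k)
    return [v for v in values if v < lower or v > upper]


# Hypothetical sample with one extreme value.
data = [12, 14, 15, 15, 16, 17, 18, 19, 20, 45]
print(tukey_fences(data))
print(find_outliers(data))
```

For this sample the inclusive quartiles are 15.0 and 18.75, so the IQR is 3.75 and the fences fall at 9.375 and 24.375. Only 45 lies outside them. Passing a larger `k` (3.0 is a common alternative) widens the fences so that only more extreme values are flagged.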