Ann Transl Med. 2016 Mar;4(5):91.
doi: 10.21037/atm.2016.02.11.

Univariate description and bivariate statistical inference: the first step delving into data

Zhongheng Zhang. Ann Transl Med. 2016 Mar.

Abstract

In observational studies, the first step is usually to explore the distribution of the data and the baseline differences between groups. Data description covers central tendency (e.g., mean, median, and mode) and dispersion (e.g., standard deviation, range, and interquartile range). A variety of bivariate statistical inference methods are available, such as Student's t-test, the Mann-Whitney U test, and the Chi-square test, for normally distributed, skewed, and categorical data, respectively. The article shows how to perform these analyses with R code. Furthermore, I believe that automating the whole workflow is of paramount importance because (I) it allows others to reproduce your results; (II) you can easily check how you performed the analysis during revision; (III) it avoids entering data by hand and is therefore less error-prone; and (IV) when you correct your original dataset, the final results can be updated automatically by re-executing the code. Therefore, the process of making a publication-quality table that incorporates all of the abovementioned statistics and P values is provided, so that readers can customize the code to their own needs.
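
As a rough illustration of the workflow summarized above (not the article's own code), the following minimal R sketch describes one roughly normal and one skewed variable, compares two groups with the three tests named in the abstract, and collects the results in a small table. The data are simulated, and the names dat, group, age, wbc, sex, and baseline are assumptions made for this example only; age and wbc simply echo the variables shown in Figure 1.

```r
# Illustrative sketch only: simulated data, hypothetical variable names.
set.seed(42)
n <- 200
dat <- data.frame(
  group = factor(rep(c("A", "B"), each = n / 2)),           # two comparison groups
  age   = rnorm(n, mean = 60, sd = 12),                     # roughly symmetric
  wbc   = rlnorm(n, meanlog = 2.2, sdlog = 0.5),            # right-skewed
  sex   = factor(sample(c("female", "male"), n, replace = TRUE))
)

# Univariate description: mean and SD for the symmetric variable,
# median and interquartile range for the skewed one.
age_summary <- c(mean = mean(dat$age), sd = sd(dat$age))
wbc_summary <- c(median = median(dat$wbc), iqr = IQR(dat$wbc))

# Bivariate inference between the two groups.
p_age <- t.test(age ~ group, data = dat)$p.value        # Welch t-test (R's default)
p_wbc <- wilcox.test(wbc ~ group, data = dat)$p.value   # Mann-Whitney U test
p_sex <- chisq.test(table(dat$sex, dat$group))$p.value  # Chi-square test

# Assemble a small baseline table programmatically, so that it is
# regenerated whenever the underlying data change.
baseline <- data.frame(
  variable = c("age", "wbc", "sex"),
  test     = c("t-test", "Mann-Whitney U", "Chi-square"),
  p_value  = round(c(p_age, p_wbc, p_sex), 3)
)
print(baseline)
```

Because every number in baseline is computed from dat, correcting the dataset and re-running the script refreshes the whole table, which is the point the abstract makes about automation. Note that t.test() performs Welch's unequal-variance test unless var.equal = TRUE is set, and chisq.test() applies a continuity correction to 2 x 2 tables by default.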

Keywords: R; Univariate description; automation; baseline characteristics; bivariate statistical inference; table.

Conflict of interest statement

Conflicts of Interest: The author has no conflicts of interest to declare.

Figures

Author photograph: Zhongheng Zhang, MMed.
Figure 1. Histograms of the variables age and wbc. The distribution of age appears symmetrical, while that of wbc is skewed.
Figure 2. Settings in Microsoft Word to convert text to a table. Note that the double quote mark is used to separate text.
