Teach Stat. 2018 Summer;40(2):64-73.
doi: 10.1111/test.12156. Epub 2018 Apr 11.

Randomization-Based Statistical Inference: A Resampling and Simulation Infrastructure

Ivo D Dinov et al. Teach Stat. 2018 Summer.

Abstract

Statistical inference involves drawing scientifically based conclusions describing natural processes or observable phenomena from datasets with intrinsic random variation. There are parametric and non-parametric approaches for studying the data or sampling distributions, yet few resources are available to provide integrated views of data (observed or simulated), theoretical concepts, computational mechanisms and hands-on utilization via flexible graphical user interfaces. We designed, implemented and validated a new portable randomization-based statistical inference infrastructure (http://socr.umich.edu/HTML5/Resampling_Webapp) that blends research-driven data analytics and interactive learning, and provides a backend computational library for managing large amounts of simulated or user-provided data. The core of this framework is a modern randomization webapp, which may be invoked on any device supporting a JavaScript-enabled web browser. We demonstrate the use of these resources to analyze proportions, means, and other statistics using simulated (virtual experiments) and observed (e.g., Acute Myocardial Infarction, Job Rankings) data. Finally, we draw parallels between parametric inference methods and their distribution-free alternatives. The Randomization and Resampling webapp can be used for data analytics, as well as for formal, in-class and informal, out-of-the-classroom learning and teaching of different scientific concepts. Such concepts include sampling, random variation, computational statistical inference and data-driven analytics. The entire scientific community may utilize, test, expand, modify or embed these resources (data, source-code, learning activity, webapp) without any restrictions.

Keywords: Statistics Online Computational Resource (SOCR); bootstrapping; randomization; resampling; simulation; statistical inference.
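Bootstrapping appears among the keywords as one of the resampling techniques behind the webapp. As a rough illustration of the idea, and not the webapp's own JavaScript code, the Python sketch below computes a percentile bootstrap confidence interval for a mean. The function name `bootstrap_mean_ci`, the default of 5,000 resamples and the sample values are hypothetical choices made only for this example.

```python
import random
import statistics


def bootstrap_mean_ci(data, n_resamples=5000, alpha=0.05, seed=None):
    """Percentile bootstrap confidence interval for the mean of `data`.

    Hypothetical helper for illustration; not part of the SOCR webapp.
    """
    rng = random.Random(seed)
    n = len(data)
    # Draw n values with replacement from the observed sample, many times,
    # and record the mean of each bootstrap resample.
    boot_means = sorted(
        statistics.fmean(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    # Use the empirical alpha/2 and 1 - alpha/2 quantiles as interval bounds.
    lo_idx = max(0, round(alpha / 2 * n_resamples))
    hi_idx = min(n_resamples - 1, round((1 - alpha / 2) * n_resamples) - 1)
    return boot_means[lo_idx], boot_means[hi_idx]


# Hypothetical measurements, used only to exercise the function.
sample = [64.2, 71.5, 68.0, 59.8, 73.1, 66.4, 70.2, 62.7, 69.9, 65.3]
ci_low, ci_high = bootstrap_mean_ci(sample, seed=1)
print(f"mean = {statistics.fmean(sample):.2f}, 95% CI ≈ ({ci_low:.2f}, {ci_high:.2f})")
```

The same resampling loop works for a proportion (resample 0/1 outcomes) or any other statistic by swapping `statistics.fmean` for the statistic of interest.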

Figures

Figure 1
Binomial simulation estimating the probability that a 20-trial dichotomous experiment, in which success and failure are equally likely, produces 15 or more successes: P(X ≥ 15) ≈ 0.02.
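The estimate in the Figure 1 caption can be checked against the exact binomial tail. The sketch below assumes only the setup stated in the caption (20 trials, success probability 0.5, threshold 15): it repeats the virtual experiment many times and compares the empirical frequency with the exact value, 21,700 / 2^20 ≈ 0.0207. The function names and the 100,000-simulation default are illustrative choices, not the webapp's settings.

```python
import math
import random


def simulate_tail_prob(n_trials=20, p=0.5, threshold=15, n_sims=100_000, seed=None):
    """Monte Carlo estimate of P(X >= threshold) for X ~ Binomial(n_trials, p)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # One virtual experiment: count successes in n_trials independent trials.
        successes = sum(rng.random() < p for _ in range(n_trials))
        if successes >= threshold:
            hits += 1
    return hits / n_sims


def exact_tail_prob(n_trials=20, p=0.5, threshold=15):
    """Exact binomial upper-tail probability, for comparison with the simulation."""
    return sum(
        math.comb(n_trials, k) * p**k * (1 - p) ** (n_trials - k)
        for k in range(threshold, n_trials + 1)
    )


estimate = simulate_tail_prob(seed=2018)
exact = exact_tail_prob()
print(f"simulated ≈ {estimate:.4f}, exact = {exact:.4f}")  # exact prints 0.0207
```

With 100,000 simulated experiments the Monte Carlo standard error is roughly 0.0004, so the simulated value should agree with the exact one to about two decimal places.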
Figure 2
Resampling-based inference results from K = 5,000 simulations. The p-value of the randomization test is approximately zero ($F_{2,K} = 150.18$, $p \approx 0$), which indicates significant differences between the cardiovascular mortality rates of males and females in this population.
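A randomization test of this kind can be sketched by repeatedly reshuffling the group labels and recomputing a test statistic. The caption does not spell out how the webapp defines its F statistic or degrees of freedom, so the sketch below assumes the standard one-way ANOVA F (degrees of freedom k − 1 and n − k) as the statistic. The group values are hypothetical, not the study's mortality data, and `f_statistic` and `permutation_f_test` are illustrative names.

```python
import random
import statistics


def f_statistic(groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.fmean(all_values)
    means = [statistics.fmean(g) for g in groups]
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def permutation_f_test(groups, n_perm=5000, seed=None):
    """Randomization test: reshuffle group labels and recompute F each time."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # Cut the shuffled pool back into groups with the original sizes.
        perm_groups, start = [], 0
        for size in sizes:
            perm_groups.append(pooled[start:start + size])
            start += size
        if f_statistic(perm_groups) >= observed:
            at_least_as_large += 1
    # Fraction of randomized F values at least as large as the observed one.
    return observed, at_least_as_large / n_perm


# Hypothetical rates for two groups (e.g., males vs. females); not the study's data.
group_a = [12.1, 13.4, 11.8, 12.9, 13.0, 12.5, 11.9, 12.7]
group_b = [15.2, 16.1, 14.8, 15.9, 16.4, 15.5, 15.0, 16.0]
f_obs, p_value = permutation_f_test([group_a, group_b], seed=7)
print(f"F = {f_obs:.2f}, randomization p ≈ {p_value:.4f}")
```

Reporting the plain proportion can yield exactly 0 when no reshuffle matches the observed F, in line with the "p ≈ 0" in the caption; a common, more conservative alternative is (count + 1) / (n_perm + 1).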
Figure 3
Experiment 1 (exploratory use-case): generating data, performing simulations and completing statistical inference.
Figure 4
Experiment 2 (explanatory use-case): statistical inference on observed data. This case study is based on the SOCR human weight and height dataset (SOCR, 2014f). Once the data were copy-pasted into the webapp data table, we selected 2 random groups of weight measurements (n1 = 20 and n2 = 37); these settings can be changed to suit the needs of a particular data-driven study. The resampling-based inference indicates that the 2 groups do not differ significantly in their mean weights. The orange bar in the inset image marks the difference between the mean weights of the original samples, shown relative to the resampling distribution of differences between randomized group mean weights (blue histogram).
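The two-group comparison in Experiment 2 corresponds to a standard permutation test for a difference in means. The sketch below reproduces the group sizes from the caption (20 and 37), but because the SOCR weight values are not reproduced here it draws a synthetic weight column from a normal distribution with hypothetical parameters (mean 127, SD 12). The name `perm_test_mean_diff` is illustrative and is not the webapp's API.

```python
import random
import statistics


def perm_test_mean_diff(x, y, n_perm=5000, seed=None):
    """Two-sided randomization test for a difference in two group means."""
    rng = random.Random(seed)
    observed = statistics.fmean(x) - statistics.fmean(y)  # the "orange bar"
    pooled = list(x) + list(y)
    n_x = len(x)
    null_diffs = []  # the randomization distribution (the "blue histogram")
    for _ in range(n_perm):
        rng.shuffle(pooled)
        null_diffs.append(statistics.fmean(pooled[:n_x]) - statistics.fmean(pooled[n_x:]))
    # Share of randomized differences at least as extreme as the observed one.
    extreme = sum(abs(d) >= abs(observed) for d in null_diffs)
    return observed, extreme / n_perm, null_diffs


# Synthetic stand-in for a weight column (hypothetical parameters); the real
# SOCR values would be pasted in instead.
data_rng = random.Random(2014)
weights = [data_rng.gauss(127.0, 12.0) for _ in range(500)]
picked = data_rng.sample(weights, 20 + 37)
group1, group2 = picked[:20], picked[20:]

diff, p_value, null_diffs = perm_test_mean_diff(group1, group2, seed=1)
print(f"observed difference = {diff:.2f}, p ≈ {p_value:.3f}")
```

Plotting `null_diffs` as a histogram and marking `diff` on it gives the same kind of picture as the inset described in the caption.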

References

    1. Al-Aziz J, Christou N, Dinov I. SOCR Motion Charts: An Efficient, Open-Source, Interactive and Dynamic Applet for Visualizing Longitudinal Multivariate Data. JSE. 2010;18(3):1–29.
    1. Aronow P, Samii C. RI: R package for performing randomization-based inference for experiments. 2014. Retrieved from http://cran.r-project.org/web/packages/ri/ri.pdf.
    1. Barber JA, Thompson SG. Analysis of cost data in randomized trials: an application of the non-parametric bootstrap. Statistics in Medicine. 2000;19(23):3219–3236.
    1. Barker T. Pro Data Visualization using R and JavaScript. Springer; 2013. Data Visualization with D3; pp. 65–84.
    1. Budgett S, Pfannkuch M, Regan M, Wild CJ. Dynamic visualizations and the randomization test. Technology Innovations in Statistics Education. 2013;7(2).