Descriptive statistics and visualization of data from the R datasets package with implications for clusterability
- PMID: 31317060
- PMCID: PMC6612012
- DOI: 10.1016/j.dib.2019.104004
Abstract
This manuscript describes and visualizes datasets from the datasets package in the R statistical software, focusing on descriptive statistics and visualizations that provide insight into the clusterability of these datasets. The datasets are publicly available as part of the R software system, which can be downloaded at https://www.r-project.org/, with documentation provided at https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html. Further information on clusterability is found in the companion to this article, "To cluster, or not to cluster: an analysis of clusterability methods" (https://doi.org/10.1016/j.patcog.2018.10.026). Brief descriptions of the variables contained in each dataset are provided, together with summaries in the form of means, extrema, quartiles, standard deviations and standard errors. Two-dimensional plots for each pair of variables are provided. Original references to the datasets are included when available. Further, each dataset is reduced to a single dimension by each of two methods: pairwise distances and principal component analysis. For the latter, only the first component is used. Histograms of the reduced data are included for every dataset under both methods.
Keywords: Datasets; Dimension reduction; Histograms; Pairwise distances; Principal component analysis.
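The summaries and the two one-dimensional reductions described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the article works in R, whereas this sketch uses Python with NumPy, and the function names (`describe`, `pairwise_distances`, `first_pc_scores`), the choice of Euclidean distance, the centering without scaling before principal component analysis, the bin count, and the synthetic two-group data are all assumptions made for the example.

```python
import numpy as np


def describe(x):
    """Mean, extrema, quartiles, sample SD and standard error of a 1-D array."""
    x = np.asarray(x, dtype=float)
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    sd = x.std(ddof=1)  # sample standard deviation
    return {
        "mean": x.mean(), "min": x.min(), "q1": q1, "median": median,
        "q3": q3, "max": x.max(), "sd": sd, "se": sd / np.sqrt(x.size),
    }


def pairwise_distances(X):
    """Reduce an (n, p) dataset to the n*(n-1)/2 distances between rows.

    Euclidean distance is an assumption of this sketch.
    """
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    i, j = np.triu_indices(X.shape[0], k=1)  # each unordered pair once
    return dist[i, j]


def first_pc_scores(X):
    """Reduce an (n, p) dataset to its scores on the first principal component.

    Columns are centered but not scaled (an assumption of this sketch).
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[0]


# Hypothetical data: two well-separated groups in three dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(6, 1, (50, 3))])

d = pairwise_distances(X)
pc1 = first_pc_scores(X)

# Histogram counts of each reduced dataset (20 bins is an arbitrary choice).
d_counts, d_edges = np.histogram(d, bins=20)
pc_counts, pc_edges = np.histogram(pc1, bins=20)

summary = describe(X[:, 0])
```

With clearly separated groups such as these, both histograms would be expected to show more than one mode, which is the kind of visual cue the reduced-data histograms are meant to expose.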
References
- Adolfsson A., Ackerman M., Brownstein N.C. To cluster, or not to cluster: an analysis of clusterability methods. Pattern Recogn. 2018;88:13–26. doi: 10.1016/j.patcog.2018.10.026.
- Azzalini A., Bowman A.W. A look at some data on the old faithful geyser. Appl. Stat. 1990:357–365.
- Chatterjee S., Price B. Regression Analysis by Example. John Wiley & Sons; 1991.
- Ezekiel M. Methods of Correlation Analysis. vol. 427. New York and London; 1930.
- Fisher R.A. The use of multiple measurements in taxonomic problems. Ann. Eugen. 1936;7(2):179–188.