PLoS One. 2008;3(11):e3696.
doi: 10.1371/journal.pone.0003696. Epub 2008 Nov 11.

A confidence interval for the Wallace coefficient of concordance and its application to microbial typing methods

Francisco R Pinto et al. PLoS One. 2008.

Abstract

Many research fields routinely analyze multiple clustering results, which calls for an objective way to detect overlaps and divergences between the resulting groupings. The congruence between such results can be quantified by clustering comparison measures such as the Wallace coefficient (W). Because the measured congruence depends on the particular sample drawn from the population, the estimated values vary around the true population value. In the present work we propose a confidence interval (CI) to account for this variability when W is used. The analytical formula for the CI is derived by assuming a Gaussian sampling distribution and by using the algebraic relationship between W and Simpson's index of diversity. This relationship also allows estimation of the expected Wallace value under the assumption that the classifications are independent. We evaluated the performance of the CI using simulated and published microbial typing data sets. The simulations showed that the CI has the desired 95% coverage when W is greater than 0.5. This behaviour is robust to changes in cluster number, cluster size distribution and sample size. The analysis of the published data sets demonstrated the usefulness of the new CI by objectively validating some of the previous interpretations, while showing that other conclusions lacked statistical support.
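
To make the quantities in the abstract concrete, the following is a minimal Python sketch of the Wallace coefficient and of its expected value under independent classifications. It assumes the usual pair-counting definition of W (the fraction of element pairs grouped together by classification A that are also grouped together by classification B) and Simpson's index of diversity computed without replacement. The function names `wallace` and `expected_wallace_independent` are hypothetical, and the sketch does not reproduce the analytical CI formula, which the abstract does not state.

```python
from collections import Counter


def pairs(k):
    """Number of unordered pairs that can be drawn from k elements."""
    return k * (k - 1) // 2


def wallace(labels_a, labels_b):
    """Wallace coefficient W(A -> B), assuming the pair-counting definition.

    Of all pairs placed in the same cluster by A, return the fraction
    that B also places in the same cluster.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both classifications must label the same elements")
    same_in_a = sum(pairs(c) for c in Counter(labels_a).values())
    if same_in_a == 0:
        raise ValueError("A has no co-clustered pairs; W is undefined")
    # Cells of the contingency table: elements sharing both labels.
    joint = Counter(zip(labels_a, labels_b))
    same_in_both = sum(pairs(c) for c in joint.values())
    return same_in_both / same_in_a


def expected_wallace_independent(labels_b):
    """Expected W(A -> B) if A and B are independent.

    Assumption of this sketch: the value equals 1 - SID(B), where SID is
    Simpson's index of diversity without replacement, i.e. the chance
    that two randomly drawn elements share a cluster in B.
    """
    n = len(labels_b)
    if n < 2:
        raise ValueError("need at least two elements")
    return sum(pairs(c) for c in Counter(labels_b).values()) / pairs(n)


# Illustrative labels only; not data from the paper.
method_a = ["a", "a", "a", "b", "b", "c"]
method_b = ["x", "x", "y", "z", "z", "z"]
w_ab = wallace(method_a, method_b)
w_expected = expected_wallace_independent(method_b)
print(w_ab, w_expected)
```

Note that W is asymmetric in general: `wallace(method_a, method_b)` and `wallace(method_b, method_a)` answer different questions, so both directions are usually reported when comparing two typing methods.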

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Coverage and amplitude of 95% confidence intervals for Wallace coefficient obtained from simulated classifications.
Each dot represents a simulation with a particular set of parameters. The colors indicate the dimensions of the simulated contingency tables as indicated in the figure legend, which correspond to the number of clusters in each of the two classifications. All simulated tables in this plot had n = 300 elements and the distribution of row cluster sizes followed a Zipfian distribution with exponent a = 1.
Figure 2. Coverage and amplitude of 95% confidence intervals for Wallace coefficient obtained from simulated classifications.
Each dot represents a simulation with a particular set of parameters. The colors indicate the number of elements n of the simulated contingency tables as indicated in the figure legend. All simulated tables in this plot had 10×10 dimensions and the distribution of row cluster sizes followed a Zipfian distribution with exponent a = 1.
Figure 3. Coverage and amplitude of 95% confidence intervals for Wallace coefficient obtained from simulated classifications.
Each dot represents a simulation with a particular set of parameters. The colors indicate exponent a of the Zipfian distribution determining the distribution of row cluster sizes of the simulated contingency tables as indicated in the figure legend. All simulated tables in this plot had n = 300 elements and 10×10 dimensions.
