Stat Med. 2018 Oct 15;37(23):3357-3372.

doi: 10.1002/sim.7825. Epub 2018 Jun 19.

Multisample adjusted U-statistics that account for confounding covariates

Glen A Satten¹, Maiying Kong², Somnath Datta³

Affiliations

¹ Division of Reproductive Health, Centers for Disease Control and Prevention, Atlanta, Georgia, USA.
² Department of Bioinformatics and Biostatistics, SPHIS, University of Louisville, Louisville, Kentucky, USA.
³ Department of Biostatistics, University of Florida, Gainesville, Florida, USA.

PMID: 29923344
PMCID: PMC6322553
DOI: 10.1002/sim.7825

Glen A Satten et al. Stat Med. 2018.


Abstract

Multisample U-statistics encompass a wide class of test statistics that allow the comparison of 2 or more distributions. U-statistics are especially powerful because they can be applied to both numeric and nonnumeric data, eg, ordinal and categorical data where a pairwise similarity or distance-like measure between categories is available. However, when comparing the distribution of a variable across 2 or more groups, observed differences may be due to confounding covariates. For example, in a case-control study, the distribution of exposure in cases may differ from that in controls entirely because of variables that are related to both exposure and case status and are distributed differently among case and control participants. We propose to use individually reweighted data (ie, using the stratification score for retrospective data or the propensity score for prospective data) to construct adjusted U-statistics that can test the equality of distributions across 2 (or more) groups in the presence of confounding covariates. Asymptotic normality of our adjusted U-statistics is established and a closed form expression of their asymptotic variance is presented. The utility of our approach is demonstrated through simulation studies, as well as in an analysis of data from a case-control study conducted among African-Americans, comparing whether the similarity in haplotypes (ie, sets of adjacent genetic loci inherited from the same parent) occurring in a case and a control participant differs from the similarity in haplotypes occurring in 2 control participants.

Keywords: adjusted U-statistics; multiple group comparison; propensity score.
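
The reweighting idea in the abstract can be illustrated with a minimal sketch. It is not the authors' estimator or their variance formula: it assumes a single discrete confounder, a two-group Mann-Whitney kernel, propensities estimated as within-stratum group proportions, and standardization to the whole study population. All function and variable names here (`weighted_u_statistic`, `stratum_propensity_weights`, and so on) are hypothetical.

```python
import numpy as np


def mann_whitney_kernel(x, y):
    """h(x, y) = 1 if x < y, 0.5 if tied, 0 otherwise (broadcasts over arrays)."""
    return (x < y) + 0.5 * (x == y)


def weighted_u_statistic(x, y, wx, wy, kernel=mann_whitney_kernel):
    """Normalized weighted mean of the kernel over all cross-group pairs."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    wx, wy = np.asarray(wx, float), np.asarray(wy, float)
    h = kernel(x[:, None], y[None, :])  # pairwise kernel matrix
    return float(wx @ h @ wy / (wx.sum() * wy.sum()))


def stratum_propensity_weights(group, z):
    """Weights 1 / P(G = g_i | Z = z_i), with the probability estimated by
    the proportion of each group inside each stratum of z (an assumption
    that only makes sense for a discrete confounder)."""
    group, z = np.asarray(group), np.asarray(z)
    w = np.empty(len(group), dtype=float)
    for level in np.unique(z):
        in_stratum = z == level
        for g in np.unique(group):
            cell = in_stratum & (group == g)
            if cell.any():
                w[cell] = in_stratum.sum() / cell.sum()
    return w


# Toy data: the outcome depends only on the confounder z, and z is
# distributed differently in the two groups.
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])
z = np.array([0, 0, 0, 1, 0, 1, 1, 1])
outcome = z.astype(float)

w = stratum_propensity_weights(group, z)
x, y = outcome[group == 0], outcome[group == 1]
wx, wy = w[group == 0], w[group == 1]

naive = weighted_u_statistic(x, y, np.ones_like(x), np.ones_like(y))
adjusted = weighted_u_statistic(x, y, wx, wy)
print(round(naive, 6), round(adjusted, 6))  # 0.75 0.5
```

In this toy example the unweighted statistic is pulled away from its null value of 0.5 purely by the imbalance in `z`, while the reweighted statistic returns to 0.5. A real analysis would estimate the propensity or stratification score from a fitted model and would need the asymptotic variance described in the paper to form a test.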

Figures

**FIGURE 1**
Comparison of empirical p-values and theoretical (uniform) p-values for the Kruskal-Wallis type test (Panel A) and Jonckheere-Terpstra type test (Panel B). Brown (long-dashed curve) corresponds to standardization to the study population, blue (dotted curve) to standardization to group 1, red (solid curve) to the parametric model, and black (dash-dotted curve) to the naive U-statistic that does not account for confounding.

**FIGURE 2**
Power of the adjusted U-statistic for the Kruskal-Wallis type test (Panel A) and Jonckheere-Terpstra type test (Panel B) when the parametric model is correctly specified, and for the Kruskal-Wallis type test (Panel C) and Jonckheere-Terpstra type test (Panel D) when the parametric model is mis-specified. The solid curve is the Wald test for the parametric model. The long-dashed and dotted curves are adjusted U-statistics that standardize to the study population and to group 1, respectively. The parameter α determines the strength of the association. When α = 0 (no association), the power corresponds to the size of the test. All tests have 3 groups and 2 degrees of freedom.

**FIGURE 3**
Expected vs. empirical p-values under the null hypothesis using kernel (12) and simulated data based on the COMPT study.

Cited by

Chronic Pain Severity and Sociodemographics: An Evaluation of the Neurobiological Interface.
Tanner JJ, Cardoso J, Terry EL, Booker SQ, Glover TL, Garvan C, Deshpande H, Deutsch G, Lai S, Staud R, Addison A, Redden D, Goodin BR, Price CC, Fillingim RB, Sibille KT. J Pain. 2022 Feb;23(2):248-262. doi: 10.1016/j.jpain.2021.07.010. Epub 2021 Aug 21. PMID: 34425249
Statistical methods for assessing treatment effects on ordinal outcomes using observational data.
Hu H, Zheng Q, Kong M. Commun Stat Simul Comput. 2025 Apr 14:10.1080/03610918.2025.2488945. doi: 10.1080/03610918.2025.2488945. Online ahead of print. PMID: 40857455
Testing hypotheses about the microbiome using the linear decomposition model (LDM).
Hu YJ, Satten GA. Bioinformatics. 2020 Aug 15;36(14):4106-4115. doi: 10.1093/bioinformatics/btaa260. PMID: 32315393
The dynamics in food selection stemming from price awareness and perceived income adequacy: a cross-sectional study using 1-year loyalty card data.
Fogelholm M, Vepsäläinen H, Meinilä J, McRae C, Saarijärvi H, Erkkola M, Nevalainen J. Am J Clin Nutr. 2024 May;119(5):1346-1353. doi: 10.1016/j.ajcnut.2024.03.003. Epub 2024 Mar 7. PMID: 38458401
ULV: A robust statistical method for clustered data, with applications to multi-subject, single-cell omics data.
Du M, Johnston K, Berrocal V, Li W, Xu X, Yu Z. ArXiv [Preprint]. 2024 Jun 10:arXiv:2406.06767v1. PMID: 38947924
