Stat Med. 2018 Oct 15;37(23):3357-3372.
doi: 10.1002/sim.7825. Epub 2018 Jun 19.

Multisample adjusted U-statistics that account for confounding covariates

Glen A Satten et al. Stat Med.

Abstract

Multisample U-statistics encompass a wide class of test statistics that allow the comparison of 2 or more distributions. U-statistics are especially powerful because they can be applied to both numeric and nonnumeric data, eg, ordinal and categorical data where a pairwise similarity or distance-like measure between categories is available. However, when comparing the distribution of a variable across 2 or more groups, observed differences may be due to confounding covariates. For example, in a case-control study, the distribution of exposure in cases may differ from that in controls entirely because of variables that are related to both exposure and case status and are distributed differently among case and control participants. We propose to use individually reweighted data (ie, using the stratification score for retrospective data or the propensity score for prospective data) to construct adjusted U-statistics that can test the equality of distributions across 2 (or more) groups in the presence of confounding covariates. Asymptotic normality of our adjusted U-statistics is established and a closed form expression of their asymptotic variance is presented. The utility of our approach is demonstrated through simulation studies, as well as in an analysis of data from a case-control study conducted among African-Americans, comparing whether the similarity in haplotypes (ie, sets of adjacent genetic loci inherited from the same parent) occurring in a case and a control participant differs from the similarity in haplotypes occurring in 2 control participants.

Keywords: adjusted U-statistics; multiple group comparison; propensity score.
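
The reweighting idea in the abstract can be illustrated with a minimal sketch. It is not the authors' estimator: it assumes a two-group Mann-Whitney-type kernel, a single binary confounder, and a propensity score that is known rather than estimated. The names `weighted_u`, `p1`, `naive`, and `adjusted` are hypothetical, and the asymptotic variance derived in the paper is not reproduced here. Each observation is weighted by the inverse probability of its own group given the confounder, which corresponds to standardizing to the whole study population.

```python
import numpy as np


def weighted_u(x, y, wx, wy):
    """Weighted Mann-Whitney-type U: estimates P(X < Y) + 0.5 * P(X = Y).

    x, y   : observations from group 0 and group 1
    wx, wy : per-observation weights (all ones gives the naive U-statistic)
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    wx, wy = np.asarray(wx, dtype=float), np.asarray(wy, dtype=float)
    # Pairwise kernel h(x_i, y_j), with ties counted as one half.
    kernel = (x[:, None] < y[None, :]) + 0.5 * (x[:, None] == y[None, :])
    # Each pair is weighted by the product of the two individual weights.
    pair_w = wx[:, None] * wy[None, :]
    return float((pair_w * kernel).sum() / pair_w.sum())


# Simulated data (an assumption for illustration only): the outcome depends
# on the confounder z but not on group membership, so the null holds.
rng = np.random.default_rng(0)
n = 4000
z = rng.integers(0, 2, size=n)             # binary confounder
p1 = np.where(z == 1, 0.8, 0.2)            # P(group 1 | z), assumed known
in_group1 = rng.random(n) < p1
outcome = z + rng.normal(size=n)

x, y = outcome[~in_group1], outcome[in_group1]

# Naive statistic: unit weights, ignores the confounder.
naive = weighted_u(x, y, np.ones(x.size), np.ones(y.size))

# Adjusted statistic: inverse probability of the observed group given z.
adjusted = weighted_u(x, y, 1.0 / (1.0 - p1[~in_group1]), 1.0 / p1[in_group1])

print(f"naive U    = {naive:.3f}")
print(f"adjusted U = {adjusted:.3f}")
```

In this simulation the naive statistic is pulled away from its null value of 0.5 purely by the imbalance in `z`, while the weighted statistic should land close to 0.5. In practice the propensity (or stratification) score would have to be estimated from covariates, and the variance of the statistic would need to account for that estimation.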

Figures

FIGURE 1
Comparison of empirical p-values and theoretical (uniform) p-values for the Kruskal-Wallis type test (Panel A) and Jonckheere-Terpstra type test (Panel B). Brown (long-dashed curve) corresponds to standardization to the study population, blue (dotted curve) to standardization to group 1, red (solid curve) to the parametric model, and black (dash-dotted curve) to the naive U-statistic that does not account for confounding.
FIGURE 2
Power of the adjusted U-statistic for the Kruskal-Wallis type test (Panel A) and Jonckheere-Terpstra type test (Panel B) when the parametric model is correctly specified, and for the Kruskal-Wallis type test (Panel C) and Jonckheere-Terpstra type test (Panel D) when the parametric model is misspecified. The solid curve is the Wald test for the parametric model. The long-dashed and dotted curves are adjusted U-statistics that standardize to the study population and to group 1, respectively. The parameter α determines the strength of the association. When α = 0 (no association), the power corresponds to the size of the test. All tests have 3 groups and 2 degrees of freedom.
FIGURE 3
Expected vs. empirical p-values under the null hypothesis using Kernel (12) and simulated data based on the COMPT study.
