J Agric Biol Environ Stat. 2015 Mar;20(1):100-120.
doi: 10.1007/s13253-014-0180-3. Epub 2014 Dec 24.

Characterization of Weighted Quantile Sum Regression for Highly Correlated Data in a Risk Analysis Setting

Caroline Carrico et al. J Agric Biol Environ Stat. 2015 Mar.

Abstract

In risk evaluation, the effect of mixtures of environmental chemicals on a common adverse outcome is of interest. However, due to the high dimensionality and inherent correlations among chemicals that occur together, traditional methods (e.g., ordinary or logistic regression) suffer from collinearity and variance inflation, and shrinkage methods have limitations in selecting among correlated components. We propose a weighted quantile sum (WQS) approach to estimating a body burden index, which identifies "bad actors" in a set of highly correlated environmental chemicals. Through extensive simulation studies, we evaluate and characterize the accuracy of WQS regression in variable selection in terms of sensitivity and specificity (i.e., the ability of the WQS method to select the bad actors correctly and to avoid selecting inactive components). We demonstrate the improvement in accuracy this method provides over traditional ordinary regression and shrinkage methods (lasso, adaptive lasso, and elastic net). Results from simulations demonstrate that WQS regression is accurate under some environmentally relevant conditions, but its accuracy decreases for a fixed correlation pattern as the association with a response variable diminishes. Nonzero weights (i.e., weights exceeding a selection threshold parameter) may be used to identify bad actors; however, components within a cluster of highly correlated active components tend to have lower weights, with the sum of their weights representative of the set.

Keywords: Correlation; Nonlinear model; Subset selection; Variable selection; WQS.
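The quantities named in the abstract (a weighted quantile sum index, a selection threshold on the weights, and sensitivity and specificity of the resulting selection) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the use of quartiles, the constraint that weights are nonnegative and sum to one, the placeholder weights, and the threshold value are all assumptions made for illustration. In particular, the weights below are hand-picked and are not estimated from data.

```python
import numpy as np

def quantile_scores(X, n_quantiles=4):
    """Score each column 0..n_quantiles-1 by its own empirical quantiles."""
    X = np.asarray(X, dtype=float)
    scores = np.empty(X.shape, dtype=int)
    # Interior cut probabilities, e.g. 0.25, 0.5, 0.75 for quartiles.
    probs = np.linspace(0.0, 1.0, n_quantiles + 1)[1:-1]
    for j in range(X.shape[1]):
        cuts = np.quantile(X[:, j], probs)
        scores[:, j] = np.searchsorted(cuts, X[:, j], side="right")
    return scores

def wqs_index(scores, weights):
    """Weighted sum of quantile scores (weights assumed >= 0, summing to 1)."""
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be nonnegative and sum to 1")
    return np.asarray(scores) @ weights

def select_components(weights, threshold):
    """Indices of components whose weight exceeds the selection threshold."""
    return np.flatnonzero(np.asarray(weights, dtype=float) > threshold)

def sensitivity_specificity(selected, active, n_components):
    """Sensitivity and specificity of a selection against known active components."""
    selected, active = set(selected), set(active)
    inactive = set(range(n_components)) - active
    sensitivity = len(selected & active) / len(active)
    specificity = len(inactive - selected) / len(inactive)
    return sensitivity, specificity

# Toy exposure matrix: 4 subjects, 2 chemicals (hypothetical values).
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
index = wqs_index(quantile_scores(X), weights=[0.7, 0.3])

# Placeholder weights for 4 chemicals; components 0 and 1 are taken as active.
weights = [0.5, 0.3, 0.15, 0.05]
chosen = select_components(weights, threshold=0.1)
sens, spec = sensitivity_specificity(chosen, active=[0, 1], n_components=4)
```

In a simulation study the active set is known by construction, so sensitivity and specificity can be computed for each candidate threshold and compared across settings.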

Figures

Figure 1.
Heat map of Spearman correlation estimates between the urinary phthalate monoesters (logscale transformed; N = 1439) as measured in the National Health and Nutrition Examination Survey (NHANES, 2005–2008).
Figure 2.
Median number of correctly (solid) and incorrectly (dashed) selected variables over values of the selection threshold parameter for each of the four cases (Table 1) in the simulation study.
Figure 3.
Weights across 100 simulation studies (Case A) similar to Case 1, with observed correlation pattern among the chemicals where the correlation between the outcome and each active component is 0.1; and (Case B) similar to Case 4 with the observed correlation pattern diminished by half and where the correlation between the outcome and the active component is 0.2. Histograms for active chemicals are green and for inactive chemicals are red.
