Am J Hum Genet. 2014 Dec 4;95(6):675-88.
doi: 10.1016/j.ajhg.2014.11.005.

Genetic association analysis under complex survey sampling: the Hispanic Community Health Study/Study of Latinos

Dan-Yu Lin et al. Am J Hum Genet.

Abstract

The cohort design allows investigators to explore the genetic basis of a variety of diseases and traits in a single study while avoiding major weaknesses of the case-control design. Most cohort studies employ multistage cluster sampling with unequal probabilities to conveniently select participants with desired characteristics, and participants from different clusters might be genetically related. Analysis that ignores the complex sampling design can yield biased estimation of the genetic association and inflation of the type I error. Herein, we develop weighted estimators that reflect unequal selection probabilities and differential nonresponse rates, and we derive variance estimators that properly account for the sampling design and the potential relatedness of participants in different sampling units. We compare, both analytically and numerically, the performance of the proposed weighted estimators with unweighted estimators that disregard the sampling design. We demonstrate the usefulness of the proposed methods through analysis of MetaboChip data in the Hispanic Community Health Study/Study of Latinos, which is the largest health study of the Hispanic/Latino population in the United States aimed at identifying risk factors for various diseases and determining the role of genes and environment in the occurrence of diseases. We provide guidelines on the use of weighted and unweighted estimators, as well as the relevant software.
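To make the idea of a weighted estimator concrete, the following is a minimal sketch and not the authors' software. It assumes a linear trait model, takes each participant's weight to be the inverse of their selection probability, and uses a cluster-level sandwich variance. The function name `weighted_linear_fit` and its arguments are hypothetical. Unlike the variance estimators described in the abstract, this sketch treats clusters as independent and does not account for relatedness of participants in different sampling units.

```python
import numpy as np


def weighted_linear_fit(y, X, weights, clusters):
    """Weighted least squares with a cluster-robust (sandwich) variance.

    Illustrative assumptions (not taken from the paper):
      - weights are inverse selection probabilities,
      - clusters (e.g., primary sampling units) are mutually independent.
    Returns the coefficient estimates and their standard errors.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    w = np.asarray(weights, dtype=float)
    clusters = np.asarray(clusters)

    # Solve the weighted estimating equation sum_i w_i x_i (y_i - x_i' beta) = 0.
    XtW = X.T * w                      # shape (p, n)
    A = XtW @ X                        # "bread": X' W X
    beta = np.linalg.solve(A, XtW @ y)

    # Per-participant weighted score contributions w_i x_i e_i.
    resid = y - X @ beta
    scores = X * (w * resid)[:, None]  # shape (n, p)

    # "Meat": sum score contributions within each cluster, then take outer products.
    B = np.zeros_like(A)
    for c in np.unique(clusters):
        s = scores[clusters == c].sum(axis=0)
        B += np.outer(s, s)

    A_inv = np.linalg.inv(A)
    cov = A_inv @ B @ A_inv
    return beta, np.sqrt(np.diag(cov))


if __name__ == "__main__":
    # Simulated toy data; all values are made up for illustration.
    rng = np.random.default_rng(1)
    n = 200
    genotype = rng.integers(0, 3, size=n).astype(float)
    X = np.column_stack([np.ones(n), genotype])
    trait = 0.5 + 0.1 * genotype + rng.normal(size=n)
    sel_prob = rng.uniform(0.2, 0.8, size=n)   # hypothetical selection probabilities
    cluster_id = rng.integers(0, 20, size=n)   # hypothetical sampling clusters
    beta, se = weighted_linear_fit(trait, X, 1.0 / sel_prob, cluster_id)
    print("estimates:", beta, "standard errors:", se)
```

Setting all weights to 1 in this sketch gives the ordinary unweighted least-squares estimate, which is one way to see what the unweighted comparison ignores about the sampling design.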

Figures

Figure 1. Simulation Results under the Null Hypothesis. Bias, standard error, mean standard error estimate, and type I error (divided by the nominal significance level 0.001) for weighted and unweighted methods as a function of the correlation between the sampling variable and the genotype when the correlation between the sampling variable and the trait of interest is 0.2 (left side) and as a function of the correlation between the sampling variable and the trait of interest when the correlation between the sampling variable and the genotype is 0.2 (right side). The estimates of the bias and type I error are indistinguishable between W-HT and W-PS.
Figure 2. Simulation Results under the Alternative Hypothesis. Bias, standard error, mean standard error estimate, and power (at the nominal significance level of 0.001) for weighted and unweighted methods as a function of the correlation between the sampling variable and the genotype when the correlation between the sampling variable and the trait of interest is 0.2 (left side) and as a function of the correlation between the sampling variable and the trait of interest when the correlation between the sampling variable and the genotype is 0.2 (right side). The estimates of the bias are indistinguishable between W-HT and W-PS.
Figure 3. Simulation Results under Misspecified Models. Bias, standard error, and mean square error for weighted and unweighted methods as a function of the interaction between the genotype and age when the correlation between the sampling variable and the trait of interest is 0.2 (left side) and as a function of the correlation between the sampling variable and the trait of interest when the interaction between the genotype and age is 0.005 (right side). The estimates of the bias are indistinguishable among W-PS, UW-M, UW-C, and UW-P, and the estimates of the mean square error are indistinguishable among UW-M, UW-C, and UW-P.
Figure 4. Manhattan Plots from the Genome-wide Association Analysis of BMI in the HCHS/SOL. Plots of −log10(p values) for weighted and unweighted methods with robust versus model-based variance estimators are shown. The log-transformation was applied to BMI. SNPs with MAF < 0.01% were excluded. The Bonferroni threshold for genome-wide significance is indicated by the dashed line.
Figure 5. Quantile-Quantile Plots from the Genome-wide Association Analysis of BMI, Fasting Glucose, and Total Cholesterol in the HCHS/SOL. Quantile-quantile plots of −log10(p values) for weighted and unweighted methods with model-based variance estimators are shown. The log-transformation was applied to BMI and total cholesterol, and the inverse normal transformation was applied to fasting glucose. SNPs with MAF < 1% were excluded. Most of the p values are indistinguishable between UW-M and UW-P.
Figure 6. Forest Plots for Four Known BMI Loci in the HCHS/SOL. The effect estimates and 95% confidence intervals for weighted and unweighted methods with robust variance estimators are shown for the younger age group (young), older age group (old), and all individuals (all). The log-transformation was applied to BMI.
