Genetics. 2012 Oct;192(2):693-704.
doi: 10.1534/genetics.112.141143. Epub 2012 Jul 18.

Inferences from genomic models in stratified populations

Luc Janss et al. Genetics. 2012 Oct.

Abstract

Unaccounted-for population stratification can lead to spurious associations in genome-wide association studies (GWAS), and several methods have been proposed to deal with this problem in that context. An alternative line of research uses whole-genome random regression (WGRR) models that fit all markers simultaneously. Important objectives in WGRR studies are to estimate the proportion of variance accounted for by the markers, to estimate the effects of individual markers, to predict genetic values for complex traits, and to predict genetic risk of diseases. Proposals to account for stratification in this context are unsatisfactory. Here we address this problem and describe a reparameterization of a WGRR model, based on an eigenvalue decomposition, for simultaneous inference of parameters and unobserved population structure. This allows estimation of genomic parameters with and without inclusion of marker-derived eigenvectors that account for stratification. The method is illustrated with grain yield in wheat typed for 1279 genetic markers, and with height, HDL cholesterol, and systolic blood pressure from the British 1958 cohort study typed for 1 million SNP genotypes. Both sets of data show signs of population structure, but with different consequences for inferences. The method is compared to an advocated approach that consists of including eigenvectors as fixed-effect covariates in a WGRR model. We show that this approach, used in the context of WGRR models, is ill-posed, and we illustrate the advantages of the proposed model. In summary, our method permits a unified approach to the study of population structure and inference of parameters, is computationally efficient, and is easy to implement.
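
As a rough illustration of the eigenvalue decomposition the abstract refers to, the sketch below (Python with NumPy; not the authors' code) builds a marker-derived matrix G = WW' from centred genotypes, takes its eigen-decomposition, rotates the phenotypes onto the eigenvectors, and reports the share of the eigenvalue sum carried by the d largest eigenvectors. The function names, the column centring, the simulated data, and the use of the eigenvalue share as a summary are all assumptions made for illustration; the paper's own expressions are not reproduced on this page.

```python
import numpy as np


def eigen_reparameterize(W, y):
    """Rotate phenotypes onto the eigenvectors of G = W W' (hypothetical helper).

    W : (n, p) matrix of marker genotypes, one row per individual.
    y : (n,) vector of phenotypes.
    Returns eigenvalues (descending), eigenvectors (columns), rotated phenotypes.
    """
    W = np.asarray(W, dtype=float)
    y = np.asarray(y, dtype=float)
    Wc = W - W.mean(axis=0)            # centre each marker column (an assumption)
    G = Wc @ Wc.T                      # n x n marker-derived matrix
    eigvals, U = np.linalg.eigh(G)     # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]  # reorder so the largest comes first
    eigvals = np.clip(eigvals[order], 0.0, None)  # remove tiny negative round-off
    U = U[:, order]
    y_rot = U.T @ (y - y.mean())       # phenotypes expressed on the eigenvector axes
    return eigvals, U, y_rot


def variance_share(eigvals, d):
    """Share of the eigenvalue sum carried by the d largest eigenvectors."""
    return float(eigvals[:d].sum() / eigvals.sum())


# Simulated example: two subpopulations with shifted allele frequencies.
rng = np.random.default_rng(1)
freq_a = rng.uniform(0.1, 0.9, size=200)
freq_b = np.clip(freq_a + rng.normal(0.0, 0.2, size=200), 0.05, 0.95)
W_sim = np.vstack([rng.binomial(2, freq_a, size=(30, 200)),
                   rng.binomial(2, freq_b, size=(30, 200))])
y_sim = rng.normal(size=60)

vals, vecs, y_rot = eigen_reparameterize(W_sim, y_sim)
for d in (1, 2, 5, 20):
    print(d, round(variance_share(vals, d), 3))
```

With stratified genotypes such as the simulated ones here, the leading eigenvectors would be expected to carry a disproportionate share of the total, which is the kind of pattern the first axes of variation are meant to capture.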

Figures

Figure 1
Human data. Lag-x, x = 1, 2, … 120, linkage disequilibrium (average squared correlation between SNP genotypes). For adjacent loci, x = 1, and x = 120 indicates that loci are 120 genotypes apart.
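
The lag-x quantity in this caption can be sketched as follows: for each lag x, take every pair of markers that sit x positions apart in map order, compute the squared correlation between their genotype codes, and average over pairs. This is a minimal sketch under stated assumptions (genotypes coded 0/1/2, columns already in map order, no monomorphic markers); the function name `lag_ld` is hypothetical and the authors' exact computation is not given on this page.

```python
import numpy as np


def lag_ld(genotypes, max_lag):
    """Average squared correlation between markers x columns apart, x = 1..max_lag.

    genotypes : (n, p) array, rows are individuals, columns are markers in map order.
    Returns a dict mapping each lag x to the mean r^2 over all pairs (j, j + x).
    """
    X = np.asarray(genotypes, dtype=float)
    Xc = X - X.mean(axis=0)
    sd = Xc.std(axis=0)                      # population standard deviation (ddof=0)
    if np.any(sd == 0):
        raise ValueError("monomorphic markers have no defined correlation")
    Z = Xc / sd                              # standardised genotype columns
    n, p = Z.shape
    result = {}
    for x in range(1, min(max_lag, p - 1) + 1):
        # Correlation of column j with column j + x, for every valid j at once.
        r = (Z[:, :-x] * Z[:, x:]).sum(axis=0) / n
        result[x] = float(np.mean(r ** 2))
    return result


# Simulated genotypes (independent markers, so average r^2 should be small).
rng = np.random.default_rng(2)
sim = rng.binomial(2, 0.4, size=(500, 50))
ld_by_lag = lag_ld(sim, max_lag=5)
```

Plotting the returned values against x would give a decay curve of the same kind as the one the caption describes.
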
Figure 2
Posterior means of marker effects (y-axis) obtained using expression (30) vs. marker loci, labeled from 1 to total number (x-axis) for the three traits. Black regions correspond to effects estimated with the original data, and shaded regions correspond to effects estimated from data in which the rows of matrix W were reshuffled and therefore randomized with respect to the phenotypes and their conditional means $\mu_i$.
Figure 3
Left: The first vs. the second largest axes of variation in wheat. Middle: The first vs. the second largest axes of variation for the human marker data. Right: The third vs. the second largest axes of variation for the human marker data.
Figure 4
Proportion of variance explained by the eigenvectors, given in (16) (y-axis), for an increasing number of eigenvectors d (x-axis). Left: Wheat data. Right: Human data.
Figure 5
Human data. Red: Posterior means of within-population genomic heritability (y-axis) [expression (20)] computed using the WGRR model (14) after accounting for the proportion of variance due to the d eigenvectors with the largest eigenvalues, vs. d (x-axis). Blue: Genomic heritability (8) computed using model (2) with the addition of the d eigenvectors with the largest eigenvalues treated as fixed effects. The horizontal dotted lines emphasize the range of values of the posterior means of $h^2_{gw}$ between d = 0 and d = 20, obtained with the WGRR model.
Figure 6
Wheat data. Red: Posterior means of within-population genomic heritability (y-axis) [expression (20)] computed using the WGRR model (14) after accounting for the proportion of variance due to the d eigenvectors with the largest eigenvalues, vs. d (x-axis). Blue: Genomic heritability (8) computed using model (2) with the addition of the d eigenvectors with the largest eigenvalues treated as fixed effects. The horizontal dotted lines emphasize the range of values of the posterior means of $h^2_{gw}$ between d = 0 and d = 20, obtained with the WGRR model.
Figure 7
Human data. Posterior means of SNP effects corrected for population substructure (y-axis, given by (21), with d = 20) vs. posterior means of SNP effects uncorrected for population substructure (x-axis, given by (30)), for the three traits.
Figure 8
Monte Carlo estimates of posterior probabilities Pr(Hj1|y) defined in (23) (y-axis) against eigenvectors j, j = 1, 2, …, n, labeled in decreasing order of their eigenvalues (x-axis). Left and middle: Human data. Right: Wheat data.
