PLoS Genet. 2008 Oct;4(10):e1000231.
doi: 10.1371/journal.pgen.1000231. Epub 2008 Oct 24.

Predicting unobserved phenotypes for complex traits from whole-genome SNP data

Sang Hong Lee et al. PLoS Genet. 2008 Oct.

Abstract

Genome-wide association studies (GWAS) for quantitative traits and disease in humans and other species have shown that there are many loci that contribute to the observed resemblance between relatives. GWAS to date have mostly focussed on discovery of genes or regulatory regions harbouring causative polymorphisms, using single SNP analyses and setting stringent type-I error rates. Genome-wide marker data can also be used to predict genetic values and therefore predict phenotypes. Here, we propose a Bayesian method that utilises all marker data simultaneously to predict phenotypes. We apply the method to three traits: coat colour, %CD8 cells, and mean cell haemoglobin, measured in a heterogeneous stock mouse population. We find that a model that contains both additive and dominance effects, estimated from genome-wide marker data, is successful in predicting unobserved phenotypes and is significantly better than a prediction based upon the phenotypes of close relatives. Correlations between predicted and actual phenotypes were in the range of 0.4 to 0.9 when half of the families were used to estimate effects and the other half for prediction. Posterior probabilities of SNPs being associated with coat colour were high for regions that are known to contain loci for this trait. The prediction of phenotypes using large samples, high-density SNP data, and appropriate statistical methodology is feasible and can be applied in human medicine, forensics, or artificial selection programs.
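
The inter-family validation scheme described in the abstract (estimate marker effects in half of the families, predict phenotypes in the other half, report the correlation) can be illustrated with a small simulation. This is only a sketch under stated assumptions: ridge regression stands in for the authors' Bayesian method, which the abstract does not specify in detail; the genotypes, effect sizes, and family structure are simulated rather than taken from the mouse data; and every function name and parameter (`encode`, `fit_ridge`, `lam`, and so on) is hypothetical.

```python
import random

def encode(g):
    """Map a genotype (0, 1 or 2 allele copies) to [additive, dominance] codes."""
    return [float(g - 1), 1.0 if g == 1 else 0.0]

def design(genos):
    """One row per individual: additive and dominance columns for every SNP."""
    return [[v for g in row for v in encode(g)] for row in genos]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_ridge(X, y, lam):
    """Ridge regression (an assumed stand-in, not the paper's Bayesian model)."""
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    x_mean = [sum(r[j] for r in X) / n for j in range(p)]
    Xc = [[r[j] - x_mean[j] for j in range(p)] for r in X]
    A = [[sum(r[i] * r[j] for r in Xc) + (lam if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(r[j] * (yi - y_mean) for r, yi in zip(Xc, y)) for j in range(p)]
    return y_mean, x_mean, solve(A, b)

def predict(X, model):
    y_mean, x_mean, beta = model
    return [y_mean + sum((x - m) * b for x, m, b in zip(r, x_mean, beta))
            for r in X]

def pearson(u, v):
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)

def simulate(n_fam=40, fam_size=5, n_snp=20, seed=1):
    """Simulated data only: independent genotypes, 5 causal SNPs, unit noise."""
    rng = random.Random(seed)
    freqs = [rng.uniform(0.3, 0.7) for _ in range(n_snp)]
    fam, genos, y = [], [], []
    for f in range(n_fam):
        for _ in range(fam_size):
            g = [(rng.random() < q) + (rng.random() < q) for q in freqs]
            signal = sum(1.0 * (gi - 1) + 0.5 * (gi == 1) for gi in g[:5])
            fam.append(f)
            genos.append(g)
            y.append(signal + rng.gauss(0, 1))
    return fam, genos, y

def inter_family_correlation(seed=1, lam=5.0):
    """Train on half of the families, predict the other half, return r."""
    fam, genos, y = simulate(seed=seed)
    families = sorted(set(fam))
    random.Random(seed).shuffle(families)
    train_fams = set(families[:len(families) // 2])
    train = [i for i, f in enumerate(fam) if f in train_fams]
    test = [i for i, f in enumerate(fam) if f not in train_fams]
    X = design(genos)
    model = fit_ridge([X[i] for i in train], [y[i] for i in train], lam)
    pred = predict([X[i] for i in test], model)
    r = pearson(pred, [y[i] for i in test])
    return r, train_fams, {fam[i] for i in test}

if __name__ == "__main__":
    r, _, _ = inter_family_correlation()
    print(f"correlation between predicted and actual phenotype: {r:.2f}")
```

Splitting by family rather than by individual keeps close relatives out of the training set, which is what makes the reported correlation a measure of prediction from markers rather than from shared pedigree. The simulated individuals here are unrelated, so the family labels only demonstrate the mechanics of the split.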

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Correlation between predicted and actual phenotype for coat colour.
Results from the additive and dominance genetic model and inter-family prediction when using each chromosome at a time (vertical bars), and when using whole genome information (horizontal line, 0.88).

Figure 2. Correlation between predicted and actual phenotype for %CD8.
Results from the additive and dominance genetic model and inter-family prediction when using each chromosome at a time (vertical bars), and when using whole genome information (horizontal line, 0.63).

Figure 3. Correlation between predicted and actual phenotype for MCH.
Results from the additive and dominance genetic model and inter-family prediction when using each chromosome at a time (vertical bars), and when using whole genome information (horizontal line, 0.4).

Figure 4. Posterior density of association of SNPs for coat colour using the whole-genome approach (A, C, E) or Likelihood Ratio of single SNP regression (B, D, F).
For comparison, the positions of known genes for coat colour are shown (diamonds).

Figure 5. Posterior density of association of SNPs for %CD8 using the whole-genome approach (A, C, E) or Likelihood Ratio of single SNP regression (B, D, F).
For comparison, the positions of known genes for %CD8 are shown (diamonds).

Figure 6. Posterior density of association of SNPs for MCH using the whole-genome approach (A, C, E) or Likelihood Ratio of single SNP regression (B, D, F).
For comparison, the positions of known genes for MCH are shown (diamonds).

Figure 7. Convergence diagnostics for the values of the accuracy of predicting unobserved phenotypes.

