Hum Genet. 2014 May;133(5):639-50.
doi: 10.1007/s00439-013-1401-5. Epub 2013 Dec 13.

Improving genetic risk prediction by leveraging pleiotropy

Cong Li et al. Hum Genet. 2014 May.

Abstract

An important task of human genetics studies is to accurately predict disease risk in individuals from genetic markers, which makes it possible to identify individuals at high risk and facilitates their treatment and prevention. Although hundreds of genome-wide association studies (GWAS) have been conducted on many complex human traits in recent years, there has been only limited success in translating these GWAS data into clinically useful risk prediction models. Because most associated variants carry only small to modest effects, the predictive capability of GWAS data is largely bottlenecked by the available training sample size. Recent studies have shown that different human traits may share a common genetic basis. An attractive strategy to increase the effective training sample size, and hence improve prediction accuracy, is therefore to integrate data from genetically correlated phenotypes. Yet the utility of genetic correlation in risk prediction has not been explored in the literature. In this paper, we analyzed GWAS data for bipolar and related disorders and schizophrenia with a bivariate ridge regression method, and found that jointly predicting the two phenotypes could substantially increase prediction accuracy as measured by the area under the receiver operating characteristic curve. We found similar improvements in prediction accuracy when we jointly analyzed GWAS data for Crohn's disease and ulcerative colitis. These empirical observations were substantiated by comprehensive simulation studies, which suggest that a gain in prediction accuracy can be obtained by combining phenotypes with relatively high genetic correlations. Through both real data and simulation studies, we demonstrated that pleiotropy can be leveraged as a valuable asset that opens up a new opportunity to improve genetic risk prediction in the future.
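The abstract evaluates predictions by the area under the receiver operating characteristic curve (AUC) and builds on ridge regression. The sketch below illustrates only these two generic ingredients: a closed-form univariate ridge fit and a rank-based AUC. It is not the authors' bivariate method, whose details are not given in this record, and the function names `ridge_fit` and `auc` and the penalty parameter `lam` are assumptions introduced for illustration.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge estimate: solve (X'X + lam*I) beta = X'y.

    X: (n_samples, n_snps) genotype matrix; y: (n_samples,) phenotype;
    lam: hypothetical penalty strength chosen by the user.
    """
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def auc(scores, labels):
    """AUC as the probability that a random case outscores a random
    control, with ties counted as one half."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    cases, controls = scores[labels], scores[~labels]
    # Compare every case score against every control score.
    diff = cases[:, None] - controls[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)
```

In use, one would fit `ridge_fit` on training genotypes, score held-out individuals with `X_test @ beta`, and pass those scores and the case/control labels to `auc`.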

Conflict of interest statement

The authors declare that they have no conflict of interest.

Figures

Fig. 1
Prediction accuracy of different methods on the BARD-SZ data. “BVR”: bivariate ridge regression; “UVR”: univariate ridge regression. The numbers in the brackets are the mean AUCs achieved by each method in the 50 repeats.
Fig. 2
Prediction accuracy of the bivariate ridge regression after shuffling the SNP identities and of ridge regression. Red plots represent the results of bivariate ridge regression and blue plots represent those of ridge regression. The percentage below each red plot represents the fraction of the SNPs that were shuffled.
Fig. 3
The prediction accuracy of bivariate and univariate ridge regression for BARD and SZ with subsamples.
Fig. 4
Prediction accuracy of different methods on the CD-UC data. “BVR” and “UVR” are defined as in Figure 1. The numbers in the brackets are the mean AUCs achieved by each method in the 50 repeats.
Fig. 5
Prediction accuracy of the bivariate ridge regression after shuffling the SNP identities and of ridge regression. Red plots represent the results of bivariate ridge regression and blue plots represent those of ridge regression. The percentage below each red plot represents the fraction of the SNPs that were shuffled.
Fig. 6
The prediction accuracy of bivariate and univariate ridge regression for CD and UC with subsamples.
Fig. 7
Simulation results for the case when the two diseases have equal sample sizes and h² levels and m = 1000. “BVR” and “UVR” are defined as in Figure 1. Two h² levels (0.3 and 0.6) and two sample sizes (1000 and 2000) were simulated. The proportion of shared causal SNPs, γ, was varied from 0 to 1 in increments of 0.25. The numbers below the UVR box plots are the sample sizes. Each UVR box plot is followed by the box plots for BVR at the same sample size and at the different γ values (shown below the BVR box plots).
Fig. 8
Simulation results for the case when the two diseases have unequal sample sizes and equal h² levels and m = 1000. “BVR” and “UVR” are defined as in Figure 1. One of the diseases has 2000 samples and the other has 1000 samples. Two h² levels (0.6 and 0.3) were simulated. The proportion of shared causal SNPs, γ, was varied from 0 to 1 in increments of 0.25. The numbers below the UVR box plots are the h² levels. Each UVR box plot is followed by the box plots for BVR at the same h² level and at the different γ values (shown below the BVR box plots).
Fig. 9
Simulation results for the case when the two diseases have equal sample sizes and unequal h² levels and m = 1000. “BVR” and “UVR” are defined as in Figure 1. One of the diseases has h² = 0.6 and the other has h² = 0.3. Two sample sizes (2000 and 1000) were simulated. The proportion of shared causal SNPs, γ, was varied from 0 to 1 in increments of 0.25. The numbers below the UVR box plots are the sample sizes. Each UVR box plot is followed by the box plots for BVR at the same sample size and at the different γ values (shown below the BVR box plots).
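The simulation designs in Figs. 7 to 9 vary the proportion γ of causal SNPs shared between two diseases and the heritability h². The following is a rough sketch of how such a design could be set up, under assumptions not stated in the captions: causal SNPs are drawn uniformly at random, effects are standard normal, and the genetic component is rescaled so that its variance equals h². The helper names `simulate_causal_sets` and `simulate_liability` are hypothetical, and the authors' actual simulation procedure may differ.

```python
import numpy as np

def simulate_causal_sets(n_snps, n_causal, gamma, rng):
    """Pick two causal SNP index sets of size n_causal that share
    round(gamma * n_causal) SNPs (assumed design, not the paper's)."""
    n_shared = int(round(gamma * n_causal))
    n_unique = n_causal - n_shared
    if n_snps < n_shared + 2 * n_unique:
        raise ValueError("not enough SNPs for the requested causal sets")
    perm = rng.permutation(n_snps)
    shared = perm[:n_shared]
    only1 = perm[n_shared:n_shared + n_unique]
    only2 = perm[n_shared + n_unique:n_shared + 2 * n_unique]
    return np.concatenate([shared, only1]), np.concatenate([shared, only2])

def simulate_liability(G, causal_idx, h2, rng):
    """Return (liability, genetic_value) with Var(genetic_value) = h2
    and independent normal noise of variance 1 - h2."""
    beta = np.zeros(G.shape[1])
    beta[causal_idx] = rng.normal(size=causal_idx.size)
    g = G @ beta
    # Standardize the genetic value, then scale its variance to h2.
    g = (g - g.mean()) / g.std() * np.sqrt(h2)
    e = rng.normal(scale=np.sqrt(1.0 - h2), size=G.shape[0])
    return g + e, g
```

Whether shared causal SNPs also carry the same effect size in both diseases is left open here; the sketch draws effects independently for each call to `simulate_liability`.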
