Nat Genet. 2016 Apr;48(4):466-72.
doi: 10.1038/ng.3513. Epub 2016 Feb 22.

A multiple-phenotype imputation method for genetic studies

Andrew Dahl et al. Nat Genet. 2016 Apr.

Abstract

Genetic association studies have yielded a wealth of biological discoveries. However, these studies have mostly analyzed one trait and one SNP at a time, thus failing to capture the underlying complexity of the data sets. Joint genotype-phenotype analyses of complex, high-dimensional data sets represent an important way to move beyond simple genome-wide association studies (GWAS) with great potential. The move to high-dimensional phenotypes will raise many new statistical problems. Here we address the central issue of missing phenotypes in studies with any level of relatedness between samples. We propose a multiple-phenotype mixed model and use a computationally efficient variational Bayesian algorithm to fit the model. On a variety of simulated and real data sets from a range of organisms and trait types, we show that our method outperforms existing state-of-the-art methods from the statistics and machine learning literature and can boost signals of association.
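
The abstract gives no equations. As a hedged illustration only, a multiple-phenotype mixed model for N individuals and P traits is often written in matrix-normal form as below. All symbols here (Y, U, the kinship matrix K, and the trait covariance matrices B and E) are notation assumed for this sketch, not taken from the abstract.

```latex
% Assumed notation: Y is the N x P phenotype matrix, K the N x N kinship matrix,
% B and E the P x P genetic and residual trait covariance matrices.
\begin{aligned}
Y &= U + \varepsilon, \\
U &\sim \mathcal{MN}(0,\; K,\; B), \\
\varepsilon &\sim \mathcal{MN}(0,\; I_N,\; E), \\
\operatorname{vec}(Y) &\sim \mathcal{N}\!\left(0,\; B \otimes K + E \otimes I_N\right).
\end{aligned}
```

Under a model of this shape, a missing entry of Y can borrow information both from related individuals (through K) and from correlated traits (through B and E), which is the intuition behind imputing across samples with any level of relatedness.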

Figures

Figure 1. Simulation results
Model 1: scenario simulated using an empirical kinship matrix derived from the human NSPHS study. Model 2: scenario simulated using 75 families of four siblings. Datasets were simulated at various levels of trait heritability (x-axis), each with 300 individuals and 15 traits. Before imputation, 5% of phenotype values were set to missing. Seven methods (legend) were applied to impute the missing values. The y-axis shows the correlation between the imputed and true values for each method. The lines for TRCMA, MVN and SOFTIMPUTE lie almost exactly on top of each other.
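
The evaluation described in this caption (hide 5% of the values, impute them, then correlate imputed with true values) can be sketched as follows. This is a minimal illustration under stated assumptions: the toy data generator, the function names, and the column-mean imputer are all hypothetical stand-ins, and the imputer is a naive baseline, not any of the seven methods compared in the figure.

```python
import math
import random


def mask_at_random(Y, frac, rng):
    """Copy Y and set a fraction `frac` of its cells to None; return copy and hidden cells."""
    cells = [(i, j) for i in range(len(Y)) for j in range(len(Y[0]))]
    hidden = rng.sample(cells, round(frac * len(cells)))
    Y_miss = [row[:] for row in Y]
    for i, j in hidden:
        Y_miss[i][j] = None
    return Y_miss, hidden


def impute_column_mean(Y_miss):
    """Naive baseline (an assumption of this sketch): fill each gap with its trait mean."""
    means = []
    for j in range(len(Y_miss[0])):
        observed = [row[j] for row in Y_miss if row[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in Y_miss]


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def imputation_correlation(Y_true, Y_imputed, hidden):
    """Correlate imputed and true values over the hidden cells only."""
    truth = [Y_true[i][j] for i, j in hidden]
    guess = [Y_imputed[i][j] for i, j in hidden]
    return pearson(truth, guess)


# Toy data (hypothetical): 300 individuals, 15 traits, trait j has mean j and unit noise.
rng = random.Random(0)
Y = [[j + rng.gauss(0, 1) for j in range(15)] for _ in range(300)]
Y_miss, hidden = mask_at_random(Y, 0.05, rng)
r = imputation_correlation(Y, impute_column_mean(Y_miss), hidden)
print(f"hidden cells: {len(hidden)}, imputation correlation: {r:.2f}")
```

Swapping `impute_column_mean` for another imputer while keeping the same masked cells gives a like-for-like comparison of methods, which is the design the figure relies on.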
Figure 2. Imputation performance in real datasets
There is one plot for each of the six real datasets. The vertical dotted black line shows the true level of missingness in the dataset. Extra missingness was added to each dataset, and the x-axis shows the amount of missing data in these reduced datasets. The y-axis shows the correlation between the imputed values and the held-out data. The legend denotes the different methods that were applied to the datasets. Not all methods were run on all datasets: TRCMA and MPMM were run only on the human NSPHS and wheat datasets for computational reasons.
Figure 3. Missing phenotype imputation in 140 rat GWAS
The x-axis and y-axis show the −log10(p) for the GWAS on the un-imputed and imputed phenotypes, respectively. Each point corresponds to a region in both scans. The dashed black lines denote a conservative threshold of −log10(p) > 10 that was applied to highlight associated regions (large points). Points in grey have imputation r² < 0.36. Associations with platelet phenotypes on chromosome 9 and T cell phenotypes on chromosome 10 are highlighted with red and blue points, respectively.
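
The two cutoffs in this caption can be applied to a table of regions roughly as follows. This is a sketch only: the region records and field names are hypothetical, and treating a region as highlighted when either scan exceeds the threshold is an assumption, since the caption does not say how the two axes are combined.

```python
import math

THRESHOLD = 10.0  # -log10(p) cutoff stated in the caption
MIN_R2 = 0.36     # regions with imputation r2 below this are drawn in grey


def neg_log10(p):
    """Convert a p-value to the -log10 scale used on both axes."""
    return -math.log10(p)


def classify(region):
    """Return plotting attributes for one region scanned with and without imputation."""
    x = neg_log10(region["p_unimputed"])
    y = neg_log10(region["p_imputed"])
    return {
        "name": region["name"],
        "x": x,
        "y": y,
        # Assumption: a region is highlighted if either scan passes the threshold.
        "highlighted": x > THRESHOLD or y > THRESHOLD,
        "grey": region["r2"] < MIN_R2,
    }


# Hypothetical example records, not values from the study.
regions = [
    {"name": "region_A", "p_unimputed": 1e-6, "p_imputed": 1e-12, "r2": 0.80},
    {"name": "region_B", "p_unimputed": 1e-3, "p_imputed": 1e-4, "r2": 0.20},
]

for result in map(classify, regions):
    print(result)
```

In this toy input, `region_A` crosses the threshold only after imputation, which corresponds to a point above the horizontal dashed line but left of the vertical one.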
Figure 4. Platelet phenotype associations
GWAS results for un-imputed (blue points) and imputed phenotypes (red points) for three platelet phenotypes (MPC, MPV, PDW) measured in rats, on rat chromosome 9 (50–80 Mb). Genes are shown below the plots, with some (named) genes with relevant annotation to platelet function, adhesion and aggregation highlighted in a separate track. Histograms on the right show the distribution of observed (cyan) and imputed (purple) phenotypes, together with missingness and r² metrics.
Figure 5. T cell phenotype associations
GWAS results for un-imputed (blue points) and imputed phenotypes (red points) for three T cell phenotypes (CD25highCD4, Abs_CD25CD8, pctDP) measured in rats, on rat chromosome 10 (83–89 Mb). Genes are shown below the plots, with some (named) genes with relevant annotation to T cell phenotypes highlighted in a separate track. Histograms on the right show the distribution of observed (cyan) and imputed (purple) phenotypes, together with missingness and r² metrics.
