A multiple-phenotype imputation method for genetic studies

Andrew Dahl¹, Valentina Iotchkova²,³, Amelie Baud³, Åsa Johansson⁴, Ulf Gyllensten⁴, Nicole Soranzo², Richard Mott¹, Andreas Kranis⁵,⁶, Jonathan Marchini¹,⁷

Affiliations

¹ Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK.
² Human Genetics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK.
³ European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, UK.
⁴ Department of Immunology, Genetics and Pathology, Science for Life Laboratory Uppsala, Uppsala University, Uppsala, Sweden.
⁵ Aviagen, Ltd., Newbridge, UK.
⁶ Roslin Institute, University of Edinburgh, Midlothian, UK.
⁷ Department of Statistics, University of Oxford, Oxford, UK.

PMID: 26901065
PMCID: PMC4817234
DOI: 10.1038/ng.3513

Andrew Dahl et al. Nat Genet. 2016 Apr;48(4):466-72. doi: 10.1038/ng.3513. Epub 2016 Feb 22.

Abstract

Genetic association studies have yielded a wealth of biological discoveries. However, these studies have mostly analyzed one trait and one SNP at a time, thus failing to capture the underlying complexity of the data sets. Joint genotype-phenotype analyses of complex, high-dimensional data sets represent an important and promising way to move beyond simple genome-wide association studies (GWAS). The move to high-dimensional phenotypes will raise many new statistical problems. Here we address the central issue of missing phenotypes in studies with any level of relatedness between samples. We propose a multiple-phenotype mixed model and use a computationally efficient variational Bayesian algorithm to fit the model. On a variety of simulated and real data sets from a range of organisms and trait types, we show that our method outperforms existing state-of-the-art methods from the statistics and machine learning literature and can boost signals of association.
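As a rough illustration of how a multiple-phenotype mixed model can impute missing values, the sketch below assumes a simple two-component matrix-normal model, Y = U + E with vec(U) ~ N(0, C ⊗ K) and vec(E) ~ N(0, D ⊗ I), where K is the N × N kinship matrix and C and D are P × P genetic and residual trait covariances. This is an assumed textbook form, not necessarily the paper's exact model. Unlike the paper's variational Bayesian fit, which estimates the model from data, here C and D are taken as known, and each missing entry is imputed by its Gaussian conditional mean given all observed entries. The function name `conditional_impute` is hypothetical.

```python
import numpy as np


def conditional_impute(Y, missing, K, C, D):
    """Impute missing entries of an N x P phenotype matrix Y.

    Assumes zero-mean phenotypes with vec(Y) ~ N(0, kron(C, K) + kron(D, I)),
    where vec stacks columns (one block of N values per trait).
    `missing` is a boolean N x P mask; values of Y at missing entries are ignored.
    """
    N, P = Y.shape
    # Column-stacked covariance of vec(Y): genetic part plus residual part.
    sigma = np.kron(C, K) + np.kron(D, np.eye(N))
    y = Y.flatten(order="F")
    m = missing.flatten(order="F")
    o = ~m
    y_imp = y.copy()
    if m.any():
        # Conditional mean E[y_m | y_o] = Sigma_mo Sigma_oo^{-1} y_o.
        weights = np.linalg.solve(sigma[np.ix_(o, o)], y[o])
        y_imp[m] = sigma[np.ix_(m, o)] @ weights
    return y_imp.reshape((N, P), order="F")
```

For example, with two relatives (K = [[1, 0.5], [0.5, 1]]), one trait, C = D = [[1]] and the first individual observed at 4.0, the second individual's value is imputed as 0.5 / 2 × 4 = 1.0. This dense formulation scales as O((NP)³) and is only practical for small examples; the abstract describes the paper's variational Bayesian algorithm as computationally efficient.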

Figures

**Figure 1. Simulation results**
**Model 1** – scenario simulated using an empirical kinship matrix derived from the human NSPHS study. **Model 2** – scenario simulated using 75 families of 4 sibs. Datasets were simulated at various levels of trait heritability (x-axis). Each dataset contained 300 individuals measured on 15 traits. 5% of phenotype values were set to missing before imputation, and 7 different methods (legend) were applied to impute them. The y-axis shows, for each method, the correlation between the imputed values and the true values. The lines for TRCMA, MVN and SOFTIMPUTE lie almost exactly on top of each other.
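The Model 2 design (75 families of 4 sibs, giving 300 individuals, with 15 traits and 5% of values masked) can be mimicked roughly as below. The sketch assumes a relationship matrix with 1 on the diagonal and 0.5 between full sibs, genetic and residual covariances h²R and (1 − h²)R sharing one exchangeable trait-correlation matrix R with off-diagonal `rho`, and a correlation score pooled over all masked entries. These are illustrative assumptions, not the paper's simulation settings, and all function names are hypothetical.

```python
import numpy as np


def sib_kinship(n_families=75, family_size=4):
    """Block-diagonal relationship matrix: 1 on the diagonal, 0.5 between full sibs."""
    block = np.full((family_size, family_size), 0.5)
    np.fill_diagonal(block, 1.0)
    return np.kron(np.eye(n_families), block)


def simulate_traits(K, n_traits=15, h2=0.5, rho=0.5, rng=None):
    """Draw Y = sqrt(h2) U + sqrt(1 - h2) E with U ~ MN(0, K, R), E ~ MN(0, I, R).

    R is an exchangeable trait correlation matrix with off-diagonal rho
    (an illustrative assumption).
    """
    rng = np.random.default_rng(rng)
    n = K.shape[0]
    R = np.full((n_traits, n_traits), rho)
    np.fill_diagonal(R, 1.0)
    L_K = np.linalg.cholesky(K)
    L_R = np.linalg.cholesky(R)
    # L_K Z L_R^T has column-stacked covariance kron(R, K).
    U = L_K @ rng.standard_normal((n, n_traits)) @ L_R.T
    # Z L_R^T has column-stacked covariance kron(R, I): independent individuals.
    E = rng.standard_normal((n, n_traits)) @ L_R.T
    return np.sqrt(h2) * U + np.sqrt(1.0 - h2) * E


def mask_entries(shape, fraction=0.05, rng=None):
    """Boolean mask hiding the given fraction of entries uniformly at random."""
    rng = np.random.default_rng(rng)
    n_total = shape[0] * shape[1]
    n_hidden = int(round(fraction * n_total))
    hidden = rng.choice(n_total, size=n_hidden, replace=False)
    mask = np.zeros(n_total, dtype=bool)
    mask[hidden] = True
    return mask.reshape(shape)


def imputation_correlation(Y_true, Y_imputed, mask):
    """Pearson correlation between true and imputed values, pooled over masked entries."""
    return np.corrcoef(Y_true[mask], Y_imputed[mask])[0, 1]
```

For example, `Y = simulate_traits(sib_kinship(), h2=0.5, rng=1)` gives a 300 × 15 matrix, and `mask_entries(Y.shape, 0.05, rng=2)` hides 225 of its 4,500 entries; any imputation method can then be scored with `imputation_correlation`.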

**Figure 2. Imputation performance in real datasets**
There is one plot for each of the six real datasets. The vertical dotted black line shows the true level of missingness in the dataset. Extra missingness was added to each dataset, and the x-axis shows the total amount of missing data in these reduced datasets. The y-axis shows the correlation between the imputed values and the held-out data. The legend denotes the different methods that were applied to the datasets. Not all methods were run on all datasets: TRCMA and MPMM were run only on the human NSPHS and wheat datasets for computational reasons.
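The real-data evaluation hides additional observed entries on top of each dataset's own missingness and scores imputation on those held-out values. A minimal sketch of the masking step follows, assuming extra entries are chosen uniformly at random among the observed ones until a target overall missing fraction is reached; the function `add_missingness` is hypothetical, and the paper's exact masking scheme may differ.

```python
import numpy as np


def add_missingness(Y, target_fraction, rng=None):
    """Hide extra observed entries of Y (NaN = missing) until the overall
    missing fraction reaches target_fraction.

    Returns (Y_reduced, held_out), where held_out marks the newly hidden
    entries whose true values are later compared with the imputations.
    """
    rng = np.random.default_rng(rng)
    observed = ~np.isnan(Y)
    n_total = Y.size
    n_missing_now = n_total - int(observed.sum())
    n_extra = int(round(target_fraction * n_total)) - n_missing_now
    if n_extra < 0:
        raise ValueError("target_fraction is below the existing missingness")
    # Choose the extra entries only among values that are actually observed.
    candidates = np.flatnonzero(observed)
    chosen = rng.choice(candidates, size=n_extra, replace=False)
    held_out = np.zeros(n_total, dtype=bool)
    held_out[chosen] = True
    held_out = held_out.reshape(Y.shape)
    Y_reduced = Y.copy()
    Y_reduced[held_out] = np.nan
    return Y_reduced, held_out
```

After imputing `Y_reduced` with any method, the score on the y-axis corresponds to a correlation between the imputed values at `held_out` and the original values `Y[held_out]`.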

**Figure 3. Missing phenotype imputation in 140 rat GWAS**
The x-axis and y-axis show −log10(p) for the GWAS on the un-imputed and imputed phenotypes, respectively. Each point corresponds to a region present in both scans. The dashed black lines denote a conservative threshold of −log10(p) > 10, which was applied to highlight associated regions (large points). Points in grey have imputation r² < 0.36. Associations with platelet phenotypes on chr 9 and T cell phenotypes on chr 10 are highlighted with red and blue points, respectively.
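The two cut-offs translate directly: −log10(p) > 10 means p < 10⁻¹⁰, and r² < 0.36 means the imputed-versus-true correlation satisfies |r| < 0.6. The hypothetical helper below flags regions that pass the association threshold in either scan and phenotypes below the r² cut-off; applying the threshold to either scan (rather than one specific scan) is an assumption here.

```python
import numpy as np


def classify_regions(p_unimputed, p_imputed, r2, logp_threshold=10.0, r2_cutoff=0.36):
    """Return (associated, low_quality) boolean arrays, one entry per region.

    associated: -log10(p) exceeds logp_threshold in either scan (assumed rule).
    low_quality: imputation r^2 of the phenotype is below r2_cutoff (grey points).
    """
    logp_u = -np.log10(np.asarray(p_unimputed, dtype=float))
    logp_i = -np.log10(np.asarray(p_imputed, dtype=float))
    associated = (logp_u > logp_threshold) | (logp_i > logp_threshold)
    low_quality = np.asarray(r2, dtype=float) < r2_cutoff
    return associated, low_quality
```

For instance, a region with p = 10⁻¹² before imputation and 10⁻⁴ after would be flagged as associated under this rule, since one of its two scans clears the threshold.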

**Figure 4. Platelet phenotype associations**
GWAS results for un-imputed (blue points) and imputed phenotypes (red points) for three platelet phenotypes (MPC, MPV, PDW) measured in rats, on rat chromosome 9 (50-80 Mb). Genes are shown below the plots, and named genes with annotations relevant to platelet function, adhesion and aggregation are highlighted in a separate track. Histograms on the right show the distribution of observed (cyan) and imputed (purple) phenotypes, together with missingness and r² metrics.

**Figure 5. T cell phenotype associations**
GWAS results for un-imputed (blue points) and imputed phenotypes (red points) for three T cell phenotypes (CD25highCD4, Abs_CD25CD8, pctDP) measured in rats, on rat chromosome 10 (83-89 Mb). Genes are shown below the plots, and named genes with annotations relevant to T cell phenotypes are highlighted in a separate track. Histograms on the right show the distribution of observed (cyan) and imputed (purple) phenotypes, together with missingness and r² metrics.
