Eur J Hum Genet. 2017 Aug;25(8):988-994.

doi: 10.1038/ejhg.2017.90. Epub 2017 May 24.

A rare-variant test for high-dimensional data

Marika Kaakinen¹, Reedik Mägi², Krista Fischer², Jani Heikkinen¹,³, Marjo-Riitta Järvelin⁴,⁵,⁶,⁷, Andrew P Morris⁸, Inga Prokopenko¹

Affiliations

¹ Department of Genomics of Common Disease, Imperial College London, London, UK.
² Estonian Genome Center, University of Tartu, Tartu, Estonia.
³ Neuroepidemiology and Ageing (NEA) Research Unit, Imperial College London, London, UK.
⁴ Department of Epidemiology and Biostatistics, MRC-PHE Centre for Environment and Health, School of Public Health, Imperial College London, London, UK.
⁵ Center for Life Course Health Research, University of Oulu, Oulu, Finland.
⁶ Unit of Primary Care, Oulu University Hospital, Oulu, Finland.
⁷ Biocenter Oulu, University of Oulu, Oulu, Finland.
⁸ Department of Biostatistics, University of Liverpool, Liverpool, UK.

PMID: 28537275
PMCID: PMC5513099
DOI: 10.1038/ejhg.2017.90

Marika Kaakinen et al. Eur J Hum Genet. 2017 Aug.

Abstract

Genome-wide association studies have facilitated the discovery of thousands of loci for hundreds of phenotypes. However, the issue of missing heritability remains unsolved for most complex traits. Locus discovery could be enhanced both by improving power through multi-phenotype analysis (MPA) and by using a wider allele frequency range that includes rare variants (RVs). MPA methods for single-variant association have been proposed, but given their low power for RVs, more efficient approaches are required. We propose multi-phenotype analysis of rare variants (MARV), a burden test-based method for RVs extended to the joint analysis of multiple phenotypes through a powerful reverse regression technique. Specifically, MARV models the proportion of RVs at which minor alleles are carried by individuals within a genomic region as a linear combination of multiple phenotypes, which can be both binary and continuous, and the method directly accommodates both genotyped and imputed data. For discovery, the full model including all phenotypes is tested for association; a more thorough dissection of the phenotype combinations for any set of RVs is also enabled. We show, via simulations, that the type I error rate is well controlled under various correlations between two continuous phenotypes, and that the method outperforms a univariate burden test in all considered scenarios. Application of MARV to 4876 individuals from the Northern Finland Birth Cohort 1966 for triglycerides and high- and low-density lipoprotein cholesterol highlights known loci with stronger signals of association than those observed in univariate RV analyses and suggests novel RV effects for these lipid traits.
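The reverse regression idea described in the abstract can be illustrated with a minimal sketch. This is not the MARV software: the function names (`burden_proportion`, `full_model_f`), the treatment of missing genotypes as `None`, and the rule that a variant counts as carried when its minor-allele count is above zero are all assumptions made for illustration. The sketch regresses each individual's rare-variant burden on the phenotypes by ordinary least squares and computes an F statistic for the full model against an intercept-only model.

```python
def burden_proportion(genotypes):
    """Proportion of called rare variants at which a minor allele is carried.

    `genotypes` holds one individual's minor-allele counts in a region;
    None marks a missing call (an assumption of this sketch).
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called variants in region")
    return sum(1 for g in called if g > 0) / len(called)


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols_rss(design, response):
    """Least-squares coefficients and residual sum of squares."""
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, response)) for i in range(p)]
    beta = solve(xtx, xty)
    rss = sum((y - sum(b * x for b, x in zip(beta, r))) ** 2
              for r, y in zip(design, response))
    return beta, rss


def full_model_f(burden, phenotypes):
    """Reverse regression: burden ~ intercept + all phenotypes.

    Returns the coefficients (intercept first) and the F statistic comparing
    the full model with an intercept-only model, on (k, n - k - 1) degrees
    of freedom.
    """
    n, k = len(burden), len(phenotypes[0])
    design = [[1.0] + list(row) for row in phenotypes]
    beta, rss_full = ols_rss(design, burden)
    mean = sum(burden) / n
    rss_null = sum((b - mean) ** 2 for b in burden)
    f_stat = ((rss_null - rss_full) / k) / (rss_full / (n - k - 1))
    return beta, f_stat
```

Here `burden` would hold one `burden_proportion` value per individual and `phenotypes` one row of trait values per individual. A P-value could then be taken from the upper tail of the F distribution, for example with `scipy.stats.f.sf(f_stat, k, n - k - 1)`.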

Conflict of interest statement

The authors declare no conflict of interest.

Figures

**Figure 1**
Comparison of MARV with previously proposed RV multi-phenotype association analysis methods. Upper blocks: Established rare-variant single-phenotype methods and common-variant multi-phenotype methods based on the individual level data. Lower block: Previously proposed RV multiple-phenotype methods versus our proposed MARV method.

**Figure 2**
Estimated type I error rate with 95% confidence interval (CI) of the MARV method with N=5000 and varying correlation between two continuous phenotypes. The following correlations were evaluated: −0.9, −0.5, −0.3, −0.1, 0, 0.1, 0.3, 0.5, 0.9.

**Figure 3**
Statistical power of the MARV method with N=5000 and varying correlation between two continuous phenotypes. (a–d) All genetic effects are trait-increasing. (e–h) Half of the genetic effects are trait-increasing, half trait-decreasing. (a,e) Effects on both phenotypes, same direction, same magnitude. (b,f) Effects on both phenotypes, opposite direction, same magnitude. (c,g) Effects on both phenotypes, same direction, different magnitude (effect on phenotype 2 is half of that on phenotype 1). (d,h) Effects on one phenotype only. Solid, black line: MARV; dotted, magenta line: GAMuT; dashed, grey line: univariate analysis (GRANVIL). The following correlations were evaluated: −0.9, −0.5, −0.3, −0.1, 0, 0.1, 0.3, 0.5, 0.9.

**Figure 4**
Genome-wide association analysis results from MARV for triglycerides, high-density lipoprotein and low-density lipoprotein cholesterols in the NFBC1966. (a) Manhattan plot for the full model statistical significance. Genes reaching statistical significance (P<1.67 × 10⁻⁶) are annotated. (b) QQ-plot of the full model association P-values against the expected P-values. Note that at some of the loci, different gene transcripts resulted in exactly the same association result. Such results show as a horizontal line of dots in the figure. (c) Effect sizes with their 95% confidence intervals of triglycerides, high-density lipoprotein and low-density lipoprotein cholesterols plotted against their statistical significance for the loci reaching genome-wide significance. In each figure, the panel on the left shows the results from the full model, the middle panel shows them from the best model based on Bayesian Information Criterion and the right panel illustrates results from univariate models. For *APOA5*, statistically significant associations were detected for three different transcripts.
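
A type I error rate with a 95% CI, as plotted in Figure 2, can be estimated from P-values simulated under the null hypothesis. The sketch below is one plausible way to do this and is not taken from the paper: the function name `type1_error_ci`, the significance level `alpha=0.05`, and the normal-approximation (Wald) interval are assumptions.

```python
import math


def type1_error_ci(p_values, alpha=0.05, z=1.96):
    """Empirical type I error rate with a normal-approximation 95% CI.

    `p_values` are P-values from replicates simulated with no genetic effect;
    `alpha` and the Wald interval are assumptions of this sketch.
    """
    n = len(p_values)
    rate = sum(1 for p in p_values if p < alpha) / n
    half_width = z * math.sqrt(rate * (1 - rate) / n)
    # Clip the interval to the valid range for a proportion.
    return rate, max(0.0, rate - half_width), min(1.0, rate + half_width)
```

The error rate is considered well controlled when the interval covers the nominal `alpha`; more replicates give a narrower interval.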

Cited by

The Role of Next-Generation Sequencing in Pharmacogenetics and Pharmacogenomics.
Schwarz UI, Gulilat M, Kim RB. Cold Spring Harb Perspect Med. 2019 Feb 1;9(2):a033027. doi: 10.1101/cshperspect.a033027. PMID: 29844222. Review.
Cardioinformatics: the nexus of bioinformatics and precision cardiology.
Khomtchouk BB, Tran DT, Vand KA, Might M, Gozani O, Assimes TL. Brief Bioinform. 2020 Dec 1;21(6):2031-2051. doi: 10.1093/bib/bbz119. PMID: 31802103.
Gene Association Analysis of Quantitative Trait Based on Functional Linear Regression Model with Local Sparse Estimator.
Wang J, Zhou F, Li C, Yin N, Liu H, Zhuang B, Huang Q, Wen Y. Genes (Basel). 2023 Mar 30;14(4):834. doi: 10.3390/genes14040834. PMID: 37107592.
MARV: a tool for genome-wide multi-phenotype analysis of rare variants.
Kaakinen M, Mägi R, Fischer K, Heikkinen J, Järvelin MR, Morris AP, Prokopenko I. BMC Bioinformatics. 2017 Feb 16;18(1):110. doi: 10.1186/s12859-017-1530-2. PMID: 28209135.
Recent advances and challenges of rare variant association analysis in the biobank sequencing era.
Chen W, Coombes BJ, Larson NB. Front Genet. 2022 Oct 6;13:1014947. doi: 10.3389/fgene.2022.1014947. eCollection 2022. PMID: 36276986. Review.
