Eur J Hum Genet. 2017 Aug;25(8):988-994.
doi: 10.1038/ejhg.2017.90. Epub 2017 May 24.

A rare-variant test for high-dimensional data

Marika Kaakinen et al. Eur J Hum Genet. 2017 Aug.

Abstract

Genome-wide association studies have facilitated the discovery of thousands of loci for hundreds of phenotypes. However, the problem of missing heritability remains unsolved for most complex traits. Locus discovery could be enhanced both by the greater power of multi-phenotype analysis (MPA) and by analysing a wider allele frequency range, including rare variants (RVs). MPA methods for single-variant association have been proposed, but given their low power for RVs, more efficient approaches are required. We propose multi-phenotype analysis of rare variants (MARV), a burden-test-based method for RVs that is extended to the joint analysis of multiple phenotypes through a powerful reverse regression technique. Specifically, for each individual, MARV models the proportion of RVs within a genomic region at which a minor allele is carried as a linear combination of multiple phenotypes, which can be binary or continuous; the method directly accommodates both genotyped and imputed data. For discovery, the full model including all phenotypes is tested for association, and the method also enables a more thorough dissection of the phenotype combinations for any set of RVs. Simulations show that the type I error rate is well controlled across a range of correlations between two continuous phenotypes and that the method outperforms a univariate burden test in all scenarios considered. Applying MARV to triglycerides and high- and low-density lipoprotein cholesterol in 4876 individuals from the Northern Finland Birth Cohort 1966 highlights known loci with stronger association signals than those observed in univariate RV analyses and suggests novel RV effects on these lipid traits.
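The reverse regression described above can be illustrated with a minimal Python sketch. It assumes hard-called genotypes coded as minor-allele counts (0, 1 or 2), continuous phenotypes and no covariates. It fits each individual's burden proportion on the phenotypes by ordinary least squares, tests the full model against an intercept-only model with a likelihood-ratio statistic referred to a chi-square distribution, and ranks phenotype combinations by the Bayesian Information Criterion. All function names, the simulated region and these modelling choices are hypothetical illustrations, not the authors' MARV implementation, which also handles binary phenotypes and imputed dosages.

```python
import itertools
import math
import random
from typing import List, Sequence, Tuple


def burden_proportion(genotypes: Sequence[Sequence[int]]) -> List[float]:
    """Per individual: fraction of the region's RVs at which >= 1 minor allele is carried.

    genotypes[i][j] is the minor-allele count (0, 1 or 2) of individual i at variant j.
    """
    return [sum(1 for g in row if g > 0) / len(row) for row in genotypes]


def ols_rss(y: Sequence[float], predictors: Sequence[Sequence[float]]) -> float:
    """Residual sum of squares of y regressed on an intercept plus the predictor columns."""
    n = len(y)
    cols = [[1.0] * n] + [list(c) for c in predictors]
    p = len(cols)
    # Augmented normal equations [X'X | X'y], solved by Gaussian elimination with pivoting.
    m = [[sum(cols[a][i] * cols[b][i] for i in range(n)) for b in range(p)]
         + [sum(cols[a][i] * y[i] for i in range(n))] for a in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, p):
            f = m[r][c] / m[c][c]
            m[r] = [m[r][k] - f * m[c][k] for k in range(p + 1)]
    beta = [0.0] * p
    for r in range(p - 1, -1, -1):
        beta[r] = (m[r][p] - sum(m[r][k] * beta[k] for k in range(r + 1, p))) / m[r][r]
    return sum((y[i] - sum(beta[a] * cols[a][i] for a in range(p))) ** 2 for i in range(n))


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with a positive integer df."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Q(k, y) = exp(-y) * sum_{i<k} y^i / i!, with k = df / 2
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Q(n + 1/2, y) = erfc(sqrt(y)) + exp(-y) * sum_{k<n} y^(k + 1/2) / Gamma(k + 3/2)
    total, term = math.erfc(math.sqrt(y)), math.sqrt(y) / math.gamma(1.5)
    for k in range((df - 1) // 2):
        total += math.exp(-y) * term
        term *= y / (k + 1.5)
    return total


def full_model_test(genotypes, phenotypes) -> Tuple[float, float]:
    """LRT of burden ~ all phenotypes versus burden ~ intercept (df = number of phenotypes)."""
    y = burden_proportion(genotypes)
    stat = len(y) * math.log(ols_rss(y, []) / ols_rss(y, phenotypes))
    return stat, chi2_sf(stat, len(phenotypes))


def best_subset_by_bic(genotypes, phenotypes, names) -> Tuple[str, ...]:
    """Non-empty phenotype combination whose reverse-regression model has the lowest BIC."""
    y = burden_proportion(genotypes)
    n = len(y)
    best_bic, best = math.inf, ()
    for size in range(1, len(phenotypes) + 1):
        for idx in itertools.combinations(range(len(phenotypes)), size):
            rss = ols_rss(y, [phenotypes[j] for j in idx])
            bic = n * math.log(rss / n) + (size + 1) * math.log(n)  # slopes + intercept
            if bic < best_bic:
                best_bic, best = bic, tuple(names[j] for j in idx)
    return best


if __name__ == "__main__":
    rng = random.Random(1)
    y1 = [rng.gauss(0, 1) for _ in range(300)]
    y2 = [rng.gauss(0, 1) for _ in range(300)]
    # Hypothetical region of 30 RVs whose carrier rate is higher when y1 > 0.
    geno = [[int(rng.random() < (0.4 if a > 0 else 0.1)) for _ in range(30)] for a in y1]
    print(full_model_test(geno, [y1, y2]))
    print(best_subset_by_bic(geno, [y1, y2], ["y1", "y2"]))
```

Because the burden proportion is the response and the phenotypes are the predictors, adding a trait only adds a column to the design matrix, which is what lets one model combine several phenotypes.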

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Comparison of MARV with previously proposed RV multi-phenotype association analysis methods. Upper blocks: established rare-variant single-phenotype methods and common-variant multi-phenotype methods based on individual-level data. Lower block: previously proposed RV multi-phenotype methods versus our proposed MARV method.
Figure 2
Estimated type I error rate with 95% confidence interval (CI) of the MARV method with N=5000 and varying correlation between two continuous phenotypes. The following correlations were evaluated: −0.9, −0.5, −0.3, −0.1, 0, 0.1, 0.3, 0.5, 0.9.
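An empirical type I error rate with a 95% CI can be computed from simulation replicates as sketched below. This assumes the rate is the share of null-simulation p-values below α = 0.05 and that the interval is a Wald (normal-approximation) binomial interval; the caption does not say which interval was used, and Wilson or Clopper-Pearson intervals are common alternatives. The function name and the stand-in p-values are hypothetical.

```python
import math
from typing import Sequence, Tuple


def rejection_rate_ci(p_values: Sequence[float], alpha: float = 0.05,
                      z: float = 1.959963984540054) -> Tuple[float, float, float]:
    """Share of replicate p-values below alpha, with a Wald 95% confidence interval.

    On null simulations this estimates the type I error rate; on simulations
    with genetic effects the same quantity estimates power.
    """
    n = len(p_values)
    rate = sum(1 for p in p_values if p < alpha) / n
    half_width = z * math.sqrt(rate * (1.0 - rate) / n)
    return rate, max(0.0, rate - half_width), min(1.0, rate + half_width)


# Usage with 1,000 evenly spaced stand-in p-values (hypothetical, not from the paper).
rate, lo, hi = rejection_rate_ci([i / 1000 for i in range(1000)])
print(f"{rate:.3f} [{lo:.3f}, {hi:.3f}]")
```

A well-controlled test is one whose interval covers the nominal α; the Wald interval is simple but can be inaccurate when the observed rate is very close to 0.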
Figure 3
Statistical power of the MARV method with N=5000 and varying correlation between two continuous phenotypes. (a–d) All genetic effects are trait-increasing. (e–h) Half of the genetic effects are trait-increasing, half trait-decreasing. (a,e) Effects on both phenotypes, same direction, same magnitude. (b,f) Effects on both phenotypes, opposite direction, same magnitude. (c,g) Effects on both phenotypes, same direction, different magnitude (effect on phenotype 2 is half of that on phenotype 1). (d,h) Effects on one phenotype only. Solid, black line: MARV; dotted, magenta line: GAMuT; dashed, grey line: univariate analysis (GRANVIL). The following correlations were evaluated: −0.9, −0.5, −0.3, −0.1, 0, 0.1, 0.3, 0.5, 0.9.
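The simulations summarised in Figures 2 and 3 vary the correlation between two continuous phenotypes. One standard way to generate such pairs, assumed here for illustration because the simulation model is not described in this section, is to mix two independent standard normal draws. In the power scenarios a genetic effect would then be added to one or both phenotypes. The function names and seeds are hypothetical.

```python
import math
import random
from typing import List, Tuple


def correlated_phenotypes(n: int, rho: float, seed: int = 0) -> Tuple[List[float], List[float]]:
    """Draw n pairs of standard-normal phenotypes with population correlation rho (|rho| < 1).

    y2 = rho * z1 + sqrt(1 - rho^2) * z2 keeps unit variance and gives corr(y1, y2) = rho.
    """
    rng = random.Random(seed)
    scale = math.sqrt(1.0 - rho * rho)
    y1, y2 = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        y1.append(z1)
        y2.append(rho * z1 + scale * z2)
    return y1, y2


def pearson(x: List[float], y: List[float]) -> float:
    """Sample Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# The correlations listed in the caption, with N = 5000 individuals.
for rho in (-0.9, -0.5, -0.3, -0.1, 0.0, 0.1, 0.3, 0.5, 0.9):
    y1, y2 = correlated_phenotypes(5000, rho, seed=42)
    print(rho, round(pearson(y1, y2), 3))
```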
Figure 4
Genome-wide association analysis results from MARV for triglycerides and high-density and low-density lipoprotein cholesterol in the NFBC1966. (a) Manhattan plot of full-model statistical significance; genes reaching statistical significance (P < 1.67 × 10⁻⁶) are annotated. (b) QQ-plot of the observed full-model association P-values against the expected P-values. At some loci, different gene transcripts gave exactly the same association result; such results appear as a horizontal line of dots. (c) Effect sizes, with 95% confidence intervals, for triglycerides, high-density lipoprotein and low-density lipoprotein cholesterol plotted against their statistical significance for the loci reaching genome-wide significance. In each figure, the left panel shows the results from the full model, the middle panel those from the best model based on the Bayesian Information Criterion and the right panel those from univariate models. For APOA5, statistically significant associations were detected for three different transcripts.

References

    1. Manolio TA, Collins FS, Cox NJ et al: Finding the missing heritability of complex diseases. Nature 2009; 461: 747–753.
    2. Amos CI, Laing A: A comparison of univariate and multivariate tests for genetic linkage. Genet Epidemiol 1993; 10: 671–676.
    3. Allison DB, Thiel B, St Jean P, Elston RC, Infante MC, Schork NJ: Multiple phenotype modeling in gene-mapping studies of quantitative traits: power advantages. Am J Hum Genet 1998; 63: 1190–1201.
    4. Banerjee S, Yandell BS, Yi NJ: Bayesian quantitative trait loci mapping for multiple traits. Genetics 2008; 179: 2275–2289.
    5. Kim S, Xing EP: Statistical estimation of correlated genome associations to a quantitative trait network. PLoS Genet 2009; 5: e1000587.
