Biostatistics. 2022 Apr 13;23(2):574-590.
doi: 10.1093/biostatistics/kxaa043.

Integrative functional linear model for genome-wide association studies with multiple traits

Yang Li et al. Biostatistics.

Abstract

In recent biomedical research, genome-wide association studies (GWAS) have been highly successful in investigating the genetic architecture of human diseases. For many complex diseases, multiple correlated traits have been collected. However, most existing GWAS analyze each trait separately, ignoring the correlations among traits, and therefore may lack sufficient information. Moreover, the high dimensionality of single nucleotide polymorphism (SNP) data still poses substantial theoretical and practical challenges to statistical methods. In this article, we propose an integrative functional linear model for GWAS with multiple traits. This study is the first to approximate SNPs as functional objects in a joint model of multiple traits with penalization techniques. The model accommodates the high dimensionality of SNPs and exploits the correlations among traits to borrow information across them. Extensive simulation studies demonstrate the satisfactory performance of the proposed method, compared with four alternative methods, in identifying disease-associated genetic variants and estimating their effects. The analysis of type 2 diabetes data yields biologically meaningful findings with good prediction accuracy and selection stability.

Keywords: Functional data analysis; Genome-wide association studies; Joint analysis of multiple traits; Penalization.
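The abstract does not give the model's equations. As a rough illustration of the general idea only (not the authors' estimator), the sketch below treats each subject's ordered genotypes as a function X_i(t) over scaled positions t in [0, 1] and models trait k as y_ik = ∫ X_i(t) β_k(t) dt + error. Each coefficient function β_k(t) is expanded in a linear B-spline ("hat") basis, and all traits are fitted jointly with a row-wise group-lasso penalty, so a basis function is kept or dropped for every trait at once. The hat basis, the Riemann-sum approximation of the integral, the group-lasso penalty (swapped in because the abstract does not state the paper's penalty), the toy data, and all function names (hat_basis, functional_design, lambda_max, fit_multitrait) are hypothetical choices made for this sketch.

```python
import numpy as np

def hat_basis(t, n_basis):
    """Linear B-spline ("hat") basis with equally spaced knots on [0, 1].

    Returns a len(t) x n_basis matrix; column m is phi_m evaluated at t.
    """
    centers = np.linspace(0.0, 1.0, n_basis)
    h = centers[1] - centers[0]
    return np.clip(1.0 - np.abs(t[:, None] - centers[None, :]) / h, 0.0, None)

def functional_design(X, n_basis):
    """Turn genotypes X (n x p) into basis scores Z (n x n_basis).

    SNP j is placed at t_j = j / (p - 1). With beta(t) = sum_m b_m phi_m(t),
    the integral of X_i(t) beta(t) dt is approximated by the Riemann sum
    (1 / p) * sum_j X_ij beta(t_j) = (Z @ b)_i.
    """
    p = X.shape[1]
    t = np.linspace(0.0, 1.0, p)
    Phi = hat_basis(t, n_basis)                  # p x n_basis
    return X @ Phi / p, Phi

def lambda_max(Z, Y):
    """Smallest penalty at which every row of B is zero."""
    return np.max(np.linalg.norm(Z.T @ Y, axis=1)) / Z.shape[0]

def fit_multitrait(Z, Y, lam, n_iter=2000):
    """Proximal gradient for
    (1 / (2n)) * ||Y - Z B||_F^2 + lam * sum_m ||B[m, :]||_2.

    Row m of B holds basis coefficient m for every trait, so the group
    penalty keeps or drops that basis function for all traits at once.
    """
    n = Z.shape[0]
    step = n / np.linalg.norm(Z, 2) ** 2         # 1 / Lipschitz constant
    B = np.zeros((Z.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        G = B - step * (Z.T @ (Z @ B - Y)) / n   # gradient step
        norms = np.linalg.norm(G, axis=1, keepdims=True)
        # Row-wise (group) soft-thresholding: the proximal map of the penalty.
        B = np.clip(1.0 - step * lam / np.maximum(norms, 1e-12), 0.0, None) * G
    return B

# Toy data: two correlated traits driven by the same region t in (0.2, 0.4).
rng = np.random.default_rng(0)
n, p = 300, 200
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)
X -= X.mean(axis=0)                              # center each SNP
t = np.linspace(0.0, 1.0, p)
beta_true = np.where((t > 0.2) & (t < 0.4), 100.0, 0.0)
signal = X @ beta_true / p
Y = np.column_stack([signal, 0.8 * signal]) + rng.normal(size=(n, 2))
Y -= Y.mean(axis=0)                              # center each trait

Z, Phi = functional_design(X, n_basis=10)
B = fit_multitrait(Z, Y, lam=0.1 * lambda_max(Z, Y))
beta_hat = Phi @ B                               # p x 2: estimated beta_k(t_j)
```

Choosing the penalty as a fraction of lambda_max keeps the example independent of the scale of Z; in practice the tuning parameter would be selected by cross-validation or an information criterion.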

Figures

Fig. 1.
A simple example: true coefficient signals (grey solid line), estimated signals using the regression model with the smoothly clipped absolute deviation (SCAD) penalty (Fan and Li, 2001) (blue points), and estimated signals using the functional data analysis method fSCAD developed in Lin and others (2017) (orange dashed line). Upper right small plot: zoomed-in version with y-axis from -0.01 to 0.01.
Fig. 2.
Simulation: true coefficient functions under scenario of Case I (20% signal region). Blue solid line: formula image; Pink dashed line: formula image. Left: formula image and formula image; Middle: formula image and formula image; Right: formula image and formula image.
Fig. 3.
Simulation: true and estimated coefficient functions under scenario of Case I with formula image, correlation 0.9 and formula image. Blue solid line (thin): formula image; Blue solid line (thick): formula image; Orange dashed line (thin): formula image; Orange dashed line (thick): formula image.
Fig. 4.
Data analysis: estimated coefficient functions. Blue solid line: formula image; Orange dashed line: formula image.
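The Fig. 1 caption compares SCAD-penalized regression with fSCAD. For reference, the SCAD penalty of Fan and Li (2001) and its closed-form thresholding rule for a single coefficient (the minimizer of 0.5(z - θ)² + p_λ(|θ|)) can be sketched as follows, with a = 3.7, the value Fan and Li suggested. It shows why SCAD sets small coefficients exactly to zero while leaving large ones unshrunk. This sketch does not reproduce fSCAD (Lin and others, 2017) or the method of this article, and the function names are hypothetical.

```python
def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty p_lam(|beta|) of Fan and Li (2001); requires a > 2."""
    b = abs(beta)
    if b <= lam:
        return lam * b                                   # lasso-like near zero
    if b <= a * lam:
        # Quadratic piece joining the linear and constant parts smoothly.
        return (2 * a * lam * b - b * b - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2                       # constant: no extra shrinkage

def scad_threshold(z, lam, a=3.7):
    """Minimizer over theta of 0.5 * (z - theta)**2 + scad_penalty(theta, lam, a)."""
    sign = 1.0 if z >= 0 else -1.0
    az = abs(z)
    if az <= 2 * lam:
        return sign * max(az - lam, 0.0)                 # soft thresholding
    if az <= a * lam:
        return ((a - 1) * z - sign * a * lam) / (a - 2)  # partial shrinkage
    return z                                             # large signals left unbiased

print([round(scad_threshold(z, 1.0), 3) for z in (0.5, 1.5, 3.0, 5.0)])
# prints [0.0, 0.5, 2.588, 5.0]
```

Small inputs are zeroed, moderate ones are shrunk, and inputs beyond aλ pass through unchanged, which is the behavior a pointwise SCAD fit shows in the blue points of Fig. 1.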

References

    1. Chai, H., Shi, X., Zhang, Q., Zhao, Q., Huang, Y. and Ma, S. (2017). Analysis of cancer gene expression data with an assisted robust marker identification approach. Genetic Epidemiology 41, 779–789.
    2. Chiu, C., Zhang, B., Wang, S., Shao, J., Lakhal-Chaieb, M.L., Cook, R.J., Wilson, A.F., Bailey-Wilson, J.E., Xiong, M. and Fan, R. (2019). Gene-based association analysis of survival traits via functional regression-based mixed effect Cox models for related samples. Genetic Epidemiology 43, 952–965.
    3. Cornelis, M., Agrawal, A., Cole, J., Hansel, N., Barnes, K.C., Beaty, T.H., Bennett, S.N., Bierut, L.J., Boerwinkle, E., Doheny, K.F. and others (2010). The Gene, Environment Association Studies Consortium (GENEVA): maximizing the knowledge obtained from GWAS by collaboration across studies of multiple conditions. Genetic Epidemiology 34, 364–372.
    4. Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association 96, 1348–1360.
    5. Fan, R., Wang, Y., Mills, J.L., Wilson, A.F., Bailey-Wilson, J.E. and Xiong, M. (2013). Functional linear models for association analysis of quantitative traits. Genetic Epidemiology 37, 726–742.

Publication types