BMC Bioinformatics. 2021 Feb 24;22(1):86.
doi: 10.1186/s12859-021-03968-1.

Penalized partial least squares for pleiotropy



Camilo Broc et al. BMC Bioinformatics.

Abstract

Background: The increasing number of genome-wide association studies (GWAS) has revealed several loci that are associated with multiple distinct phenotypes, suggesting the existence of pleiotropic effects. Highlighting these cross-phenotype genetic associations could help to identify and understand common biological mechanisms underlying some diseases. Common approaches test the association between genetic variants and multiple traits at the SNP level. In this paper, we propose a novel gene- and pathway-level approach for the case where several independent GWAS on independent traits are available. The method is based on a generalization of sparse group Partial Least Squares (sgPLS) that takes into account groups of variables, combined with a Lasso penalization that links all the independent data sets. This method, called joint-sgPLS, is able to detect signal convincingly at both the variable level and the group level.
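To make the sparse-group idea concrete, here is a minimal Python sketch of a sparse-group thresholding step of the kind sgPLS-type methods apply to a loading vector. It assumes the standard sparse group lasso proximal form (element-wise soft-thresholding at `lam * alpha`, then group-wise shrinkage at `lam * (1 - alpha) * sqrt(p_k)`); the function names and this exact scaling are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List, Sequence


def soft_threshold(x: float, t: float) -> float:
    """Element-wise soft-thresholding: sign(x) * max(|x| - t, 0)."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def sparse_group_threshold(
    z: Sequence[float], groups: Sequence[Sequence[int]], lam: float, alpha: float
) -> List[float]:
    """Proximal step for lam * (alpha * ||u||_1 + (1 - alpha) * sum_k sqrt(p_k) * ||u_k||_2).

    z:      unpenalized loading direction (one entry per variable)
    groups: one list of variable indices per group (e.g. SNPs of a gene)
    lam:    overall penalty level, lam >= 0
    alpha:  mix between Lasso (alpha = 1) and group Lasso (alpha = 0)
    """
    u = [0.0] * len(z)
    for idx in groups:
        # Lasso part: shrink each coordinate of the group toward zero.
        v = [soft_threshold(z[i], lam * alpha) for i in idx]
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            continue  # every coordinate was zeroed, so the group is dropped
        # Group part: shrink the group's norm; a scale of 0 removes the whole group.
        scale = max(0.0, 1.0 - lam * (1.0 - alpha) * math.sqrt(len(idx)) / norm)
        for i, x in zip(idx, v):
            u[i] = scale * x
    return u
```

With `alpha` close to 1 the step behaves like a Lasso on individual variables; with `alpha` close to 0 whole groups are kept or dropped together, which is what allows selection at both the variable and the group level. For example, `sparse_group_threshold([3.0, -0.5, 0.2, 0.1], [[0, 1], [2, 3]], lam=1.0, alpha=0.5)` keeps only the first variable of the first group and removes the second group entirely.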

Results: Our method has the advantage of providing a global, readable model while accounting for the structure of the data. It can outperform traditional methods and provides broader insight by incorporating a priori information. We compared the performance of the proposed method with other benchmark methods on simulated data and applied it to real data to highlight common susceptibility variants for breast and thyroid cancers.

Conclusion: The joint-sgPLS shows interesting properties for detecting a signal. As an extension of PLS, the method is suited to data with a large number of variables. The Lasso penalization accommodates structures defined by groups of variables and by sets of observations. Furthermore, although the method has been applied here to a genetic study, its formulation applies, in other fields as well, to any data with a large number of variables and a known a priori structure.
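One way to picture a single penalty linking independent data sets is to pool, for each group of variables, the coefficients estimated in every observation set and shrink that pooled block with one group-Lasso threshold, so that a group is either kept in all sets or removed from all of them. The sketch below implements only that hypothetical joint thresholding step; the pooling rule, the function name, and the `sqrt(block size)` weight are our assumptions and may differ from the actual joint-sgPLS penalty.

```python
import math
from typing import List, Sequence


def joint_group_threshold(
    z_by_set: Sequence[Sequence[float]], groups: Sequence[Sequence[int]], lam: float
) -> List[List[float]]:
    """Hypothetical joint group-Lasso shrinkage across M observation sets.

    z_by_set: M lists; z_by_set[m][j] is the loading of variable j in set m
    groups:   one list of variable indices per group of variables
    lam:      penalty level; a group whose pooled norm is at or below the
              threshold is removed from every set at once
    """
    n_sets = len(z_by_set)
    out = [[0.0] * len(z) for z in z_by_set]
    for idx in groups:
        # Pool this group's coefficients from all observation sets into one block.
        block = [z_by_set[m][j] for m in range(n_sets) for j in idx]
        norm = math.sqrt(sum(x * x for x in block))
        threshold = lam * math.sqrt(len(block))  # assumed size weighting
        if norm <= threshold:
            continue  # group dropped simultaneously in all sets
        scale = 1.0 - threshold / norm
        for m in range(n_sets):
            for j in idx:
                out[m][j] = scale * z_by_set[m][j]
    return out
```

Because every set shares the same scale for a given group, selection decisions are common to all data sets, which is the kind of cross-set link a pleiotropy analysis looks for.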

Keywords: Genetic epidemiology; High dimensional data; Lasso Penalization; Meta-analysis; Oncology; Partial Least Squares; Pathway analysis; Pleiotropy; Sparse methods; Variable selection.


Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Illustration of data structured by groups of variables and sets of observations. Variables and observations are assumed to be ordered by groups of variables and by observation sets, respectively. $p$ denotes the number of variables of matrix $X$, $q$ the number of variables of matrix $Y$, and $n$ the number of observations. $n_1, \dots, n_M$ are the numbers of observations in each observation set, and $p_1, \dots, p_K$ are the numbers of variables in each group of variables.
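The block layout of Fig. 1 can be expressed with index ranges. This Python sketch (helper names are hypothetical) converts the sizes $p_1, \dots, p_K$ or $n_1, \dots, n_M$ into half-open ranges and cuts a matrix, stored as a list of rows, into its observation-set by variable-group blocks, assuming rows and columns are already ordered as in the figure.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def block_ranges(sizes: Sequence[int]) -> List[Tuple[int, int]]:
    """Turn consecutive block sizes into half-open [start, end) index ranges."""
    ranges, start = [], 0
    for s in sizes:
        ranges.append((start, start + s))
        start += s
    return ranges


def split_blocks(X: Matrix, row_sizes: Sequence[int], col_sizes: Sequence[int]) -> List[List[Matrix]]:
    """Cut X into an M x K grid: blocks[m][k] holds observation set m, variable group k."""
    if len(X) != sum(row_sizes):
        raise ValueError("observation-set sizes must add up to n")
    if any(len(row) != sum(col_sizes) for row in X):
        raise ValueError("group sizes must add up to p")
    rows, cols = block_ranges(row_sizes), block_ranges(col_sizes)
    return [[[row[c0:c1] for row in X[r0:r1]] for (c0, c1) in cols] for (r0, r1) in rows]
```

For instance, `block_ranges([2, 3])` gives `[(0, 2), (2, 5)]`: the first group covers columns 0 and 1, the second columns 2 to 4.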
Fig. 2
Mean and variance of the cross-validation prediction error of sgPLS for one simulation of case 1. Cross-validation is performed for $\alpha \in \{0.1, 0.5, 0.9\}$ and for levels of group selection corresponding to $\{1, \dots, 25\}$.
Fig. 3
Mean and variance of the cross-validation prediction error of joint-sgPLS for one simulation of case 1. Cross-validation is performed for $\alpha \in \{0.1, 0.5, 0.9\}$ and for levels of group selection corresponding to $\{1, \dots, 25\}$.
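Figs. 2 and 3 summarize cross-validation errors over a grid of $\alpha$ values and group-selection levels. A minimal sketch of that bookkeeping, assuming per-fold errors have already been computed for each `(alpha, n_groups)` pair, reports the mean and variance per setting and picks the setting with the lowest mean error. Choosing the minimum mean is one common rule and an assumption here; the paper's tuning rule may differ.

```python
import statistics
from typing import Dict, List, Tuple

Setting = Tuple[float, int]  # (alpha, group-selection level)


def summarize_cv(errors: Dict[Setting, List[float]]) -> Dict[Setting, Tuple[float, float]]:
    """Mean and sample variance of per-fold prediction errors for each setting.

    statistics.variance requires at least two folds per setting.
    """
    return {s: (statistics.mean(e), statistics.variance(e)) for s, e in errors.items()}


def best_setting(errors: Dict[Setting, List[float]]) -> Setting:
    """Setting with the smallest mean cross-validation error (assumed selection rule)."""
    summary = summarize_cv(errors)
    return min(summary, key=lambda s: summary[s][0])


# Grid matching the figure captions: alpha in {0.1, 0.5, 0.9}, levels 1..25.
GRID = [(a, g) for a in (0.1, 0.5, 0.9) for g in range(1, 26)]
```

Reporting the variance alongside the mean, as the figures do, helps judge whether neighbouring settings are distinguishable given fold-to-fold noise.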
Fig. 4
Association score of each SNP with the outcome under a univariate model. The score is computed as $-\log_{10}(p)$, where $p$ is the p-value. The red line marks the threshold 0.01. Alternating shades of blue distinguish the chromosomes.
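The score in Fig. 4 is a simple transform of the p-values. This sketch computes $-\log_{10}(p)$ and flags SNPs above the line, assuming the 0.01 threshold refers to a p-value (a cut-off of 2 on the score scale); the function names are hypothetical.

```python
import math
from typing import List, Sequence


def association_scores(p_values: Sequence[float]) -> List[float]:
    """Score each SNP as -log10(p); p must lie in (0, 1], smaller p gives a larger score."""
    return [-math.log10(p) for p in p_values]


def snps_above_line(p_values: Sequence[float], p_threshold: float = 0.01) -> List[int]:
    """Indices of SNPs whose score exceeds -log10(p_threshold), i.e. 2.0 for 0.01."""
    cutoff = -math.log10(p_threshold)
    return [i for i, s in enumerate(association_scores(p_values)) if s > cutoff]
```

For example, `snps_above_line([0.5, 0.001, 0.02, 1e-8])` returns the indices of the two SNPs with p-values below 0.01.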
Fig. 5
Selection percentage of genes for sgPLS and joint-sgPLS over 100 bootstrap samples. (a) sgPLS on thyroid data. (b) sgPLS on breast data. (c) sgPLS on both data sets. (d) joint-sgPLS. Genes selected on the original data (preselected genes) are shown in blue; the other genes (not preselected) are shown in red.
Fig. 6
Selection percentage of pathways for sgPLS and joint-sgPLS over 100 bootstrap samples. (a) sgPLS on thyroid data. (b) sgPLS on breast data. (c) sgPLS on both data sets. (d) joint-sgPLS. Pathways selected on the original data (preselected pathways) are shown in blue; the other pathways (not preselected) are shown in red. The pathways are numbered: (1) Cell cycle, (2) Circadian rhythm, (3) Folate metabolism, (4) Other glycan degradation, (5) Obesity and obesity-related phenotypes, (6) DNA repair, (7) Metabolism of xenobiotics, (9) Precocious or delayed puberty, (10) Inflammatory response.
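The selection percentages in Figs. 5 and 6 are frequencies over bootstrap replicates. A minimal sketch, assuming each bootstrap fit yields a set of selected gene or pathway names (the names in the usage note are placeholders), counts how often each item appears and separates items preselected on the original data from the rest.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def selection_percentages(selected_per_bootstrap: List[Iterable[str]]) -> Dict[str, float]:
    """Percent of bootstrap replicates in which each item was selected."""
    n = len(selected_per_bootstrap)
    # set() guards against an item being listed twice within one replicate.
    counts = Counter(name for sel in selected_per_bootstrap for name in set(sel))
    return {name: 100.0 * c / n for name, c in counts.items()}


def split_by_preselection(
    percentages: Dict[str, float], preselected: Set[str]
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Split into items selected on the original data (blue) and the others (red)."""
    pre = {k: v for k, v in percentages.items() if k in preselected}
    other = {k: v for k, v in percentages.items() if k not in preselected}
    return pre, other
```

With four replicates `[{"A", "B"}, {"A"}, {"A", "C"}, {"B"}]`, item `"A"` is selected in three of them, so its selection percentage is 75.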


