Evaluation of a LASSO regression approach on the unrelated samples of Genetic Analysis Workshop 17
- PMID: 22373385
- PMCID: PMC3287844
- DOI: 10.1186/1753-6561-5-S9-S12
Abstract
The Genetic Analysis Workshop 17 data we used comprise 697 unrelated individuals genotyped at 24,487 single-nucleotide polymorphisms (SNPs) from a mini-exome scan, using real sequence data for 3,205 genes annotated by the 1000 Genomes Project and simulated phenotypes. We studied 200 sets of simulated phenotypes of trait Q2. An important feature of this data set is that most SNPs are rare: 87% have a minor allele frequency less than 0.05. To detect rare SNPs, we performed least absolute shrinkage and selection operator (LASSO) regression with F tests at the gene level, calculating the generalized degrees of freedom to avoid selection bias. For comparison, we also carried out linear regression and the collapsing method, which sums the rare SNPs, modified for a quantitative trait and applied with two different allele frequency thresholds. The aim of this paper is to evaluate these four approaches on this mini-exome data and compare their power and false-positive rates. In most situations the LASSO approach is more powerful than the linear regression and collapsing methods. We also note the difficulty of determining the optimal threshold for the collapsing method and the significant role that linkage disequilibrium plays in detecting rare causal SNPs: if a rare causal SNP is in strong linkage disequilibrium with a common marker in the same gene, power is much improved.
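The collapsing method mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy genotype matrix, and the 0.1 threshold are hypothetical, and genotypes are assumed to be coded as minor-allele counts (0, 1, or 2). For one gene, the sketch sums each individual's minor alleles over the SNPs whose frequency falls below the threshold, then regresses a quantitative trait on that burden score.

```python
from typing import List, Sequence


def allele_freq(counts: Sequence[int]) -> float:
    """Frequency of the counted allele for one SNP (two alleles per person)."""
    return sum(counts) / (2 * len(counts))


def collapse_rare(gene: List[List[int]], maf_threshold: float) -> List[int]:
    """Per-individual burden: sum of minor-allele counts over rare SNPs.

    gene[i][j] is the minor-allele count of individual i at SNP j
    (assumed coding; the abstract does not specify one).
    """
    n_snps = len(gene[0])
    rare = [
        j for j in range(n_snps)
        if allele_freq([row[j] for row in gene]) < maf_threshold
    ]
    return [sum(row[j] for j in rare) for row in gene]


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Slope of the simple linear regression of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


# Toy data: 6 individuals, 3 SNPs in one hypothetical gene.
gene = [
    [0, 0, 1],
    [1, 0, 0],
    [0, 0, 1],
    [0, 1, 2],
    [0, 0, 1],
    [0, 0, 0],
]
trait = [1.0, 3.0, 1.0, 3.0, 1.0, 1.0]  # made-up quantitative trait values

burden = collapse_rare(gene, maf_threshold=0.1)  # arbitrary threshold
effect = ols_slope(burden, trait)
```

Changing `maf_threshold` changes which SNPs enter the burden score, which is one way to see why the choice of threshold affects the results of the collapsing method.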