Statistical significance for genomewide studies
- PMID: 12883005
- PMCID: PMC170937
- DOI: 10.1073/pnas.1530509100
Abstract
With the increase in genomewide experiments and the sequencing of multiple genomes, the analysis of large data sets has become commonplace in biology. It is often the case that thousands of features in a genomewide data set are tested against some null hypothesis, where a number of features are expected to be significant. Here we propose an approach to measuring statistical significance in these genomewide studies based on the concept of the false discovery rate. This approach offers a sensible balance between the number of true and false positives that is automatically calibrated and easily interpreted. In doing so, a measure of statistical significance called the q value is associated with each tested feature. The q value is similar to the well-known p value, except that it is a measure of significance in terms of the false discovery rate rather than the false positive rate. Our approach avoids a flood of false positive results, while offering a more liberal criterion than what has been used in genome scans for linkage.
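To make the idea of a q value concrete, the following is a minimal sketch of one common way to turn a list of p values into q values. The abstract does not spell out the estimation procedure, so the details here are assumptions: the proportion of true null features (`pi0`) is estimated from the p values above a single fixed threshold `lam`, and each q value is the smallest estimated false discovery rate at which that feature would be called significant. The function name `q_values` and the parameter `lam` are hypothetical and chosen only for illustration.

```python
def q_values(p_values, lam=0.5):
    """Sketch of q-value estimation from p values (illustrative, not the paper's code).

    Assumption: pi0, the proportion of truly null features, is estimated
    from the fraction of p values above a fixed threshold `lam`.
    """
    m = len(p_values)
    if m == 0:
        return []

    # Under the null, p values are uniform, so roughly pi0 * m * (1 - lam)
    # of them should exceed lam. Cap the estimate at 1.
    # Note: if no p value exceeds lam, pi0 is 0 and every q value is 0.
    pi0 = min(1.0, sum(p > lam for p in p_values) / (m * (1.0 - lam)))

    # Indices of the p values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])

    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value to the smallest. The estimated FDR at
    # threshold p_(rank) is pi0 * m * p_(rank) / rank; taking the running
    # minimum keeps q values monotone in the p values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * p_values[i] / rank)
        q[i] = running_min
    return q


if __name__ == "__main__":
    example_p = [0.001, 0.01, 0.03, 0.2, 0.6, 0.9]  # made-up p values
    for p, q in zip(example_p, q_values(example_p)):
        print(f"p = {p:<6} q = {q:.4f}")
```

Calling every feature with a q value at or below 0.05 significant would then mean that, under these assumptions, about 5% of the called features are expected to be false positives. A p value cutoff of 0.05 instead bounds the rate of false positives among the truly null features.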