Genome Biol. 2008;9(12):R170.
doi: 10.1186/gb-2008-9-12-r170. Epub 2008 Dec 5.

FitSNPs: highly differentially expressed genes are more likely to have variants associated with disease

Rong Chen et al. Genome Biol. 2008.

Abstract

Background: Candidate single nucleotide polymorphisms (SNPs) from genome-wide association studies (GWASs) were often selected for validation based on their functional annotation, which was inadequate and biased. We propose to use the more than 200,000 microarray studies in the Gene Expression Omnibus to systematically prioritize candidate SNPs from GWASs.

Results: We analyzed all human microarray studies from the Gene Expression Omnibus, and calculated the observed frequency of differential expression, which we called differential expression ratio, for every human gene. Analysis conducted in a comprehensive list of curated disease genes revealed a positive association between differential expression ratio values and the likelihood of harboring disease-associated variants. By considering highly differentially expressed genes, we were able to rediscover disease genes with 79% specificity and 37% sensitivity. We successfully distinguished true disease genes from false positives in multiple GWASs for multiple diseases. We then derived a list of functionally interpolating SNPs (fitSNPs) to analyze the top seven loci of Wellcome Trust Case Control Consortium type 1 diabetes mellitus GWASs, rediscovered all type 1 diabetes mellitus genes, and predicted a novel gene (KIAA1109) for an unexplained locus 4q27. We suggest that fitSNPs would work equally well for both Mendelian and complex diseases (being more effective for cancer) and proposed candidate genes to sequence for their association with 597 syndromes with unknown molecular basis.

Conclusions: Our study demonstrates that highly differentially expressed genes are more likely to harbor disease-associated DNA variants. FitSNPs can serve as an effective tool to systematically prioritize candidate SNPs from GWASs.

Figures

Figure 1
Use of differentially and constantly expressed genes to rediscover disease genes. The DER was calculated as the count of GEO datasets in which a gene was differentially expressed divided by the count of GEO datasets in which it was measured. For any cutoff x, differentially expressed genes were defined as genes with DER > x, whereas constantly expressed genes were defined as genes with DER < x. The precision/recall graphs show that the likelihood that a gene harbors disease mutations increases with its DER value. For the control, we shuffled disease labels 10,000 times among all genes and obtained a predicted precision of 16%. DER, differential expression ratio; GEO, Gene Expression Omnibus.
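As a minimal sketch of the DER definition in this caption, the Python below computes the ratio from dataset counts and then the precision and recall obtained at a cutoff x. The gene names, counts, and disease labels are hypothetical, and the function names are illustrative rather than taken from the paper.

```python
from typing import Dict, Set, Tuple


def differential_expression_ratio(n_differential: int, n_measured: int) -> float:
    """DER = datasets where the gene was differentially expressed / datasets where it was measured."""
    if n_measured <= 0:
        raise ValueError("gene must be measured in at least one dataset")
    return n_differential / n_measured


def precision_recall_at_cutoff(
    der: Dict[str, float], disease_genes: Set[str], cutoff: float
) -> Tuple[float, float]:
    """Treat genes with DER > cutoff as predicted disease genes and score them."""
    predicted = {gene for gene, value in der.items() if value > cutoff}
    true_positives = len(predicted & disease_genes)
    # Only disease genes that have a DER value can be recovered.
    known = disease_genes & der.keys()
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(known) if known else 0.0
    return precision, recall


# Hypothetical counts: (datasets differentially expressed, datasets measured).
counts = {"GENE_A": (80, 100), "GENE_B": (60, 100), "GENE_C": (30, 100), "GENE_D": (10, 100)}
der_values = {gene: differential_expression_ratio(d, m) for gene, (d, m) in counts.items()}
hypothetical_disease_genes = {"GENE_A", "GENE_C"}

for x in (0.2, 0.5, 0.7):
    p, r = precision_recall_at_cutoff(der_values, hypothetical_disease_genes, x)
    print(f"cutoff={x:.1f} precision={p:.2f} recall={r:.2f}")
```

Sweeping the cutoff in this way traces out a precision/recall curve of the kind the figure describes.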
Figure 2
Performance of rediscovering disease genes by DER. Genes with DER ≥ 0.55 were predicted to be disease genes, and compared with genes with disease-associated DNA variants listed in GAD and HGMD. P values were calculated using Fisher's exact test. DER, differential expression ratio; GAD, Genetic Association Database; GEO, Gene Expression Omnibus; HGMD, Human Gene Mutation Database.
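The comparison in this caption can be sketched as a 2x2 contingency table (predicted by DER ≥ 0.55 versus listed as a disease gene) scored with Fisher's exact test. The code below is an illustrative standard-library version that computes a one-sided P value; the caption does not say whether the authors used a one-sided or two-sided test, and the gene names and values are hypothetical.

```python
from math import comb
from typing import Dict, Set, Tuple


def contingency_table(
    der: Dict[str, float], disease_genes: Set[str], threshold: float = 0.55
) -> Tuple[int, int, int, int]:
    """Return (a, b, c, d): predicted&disease, predicted&not, unpredicted&disease, unpredicted&not."""
    a = b = c = d = 0
    for gene, value in der.items():
        predicted = value >= threshold
        is_disease = gene in disease_genes
        if predicted and is_disease:
            a += 1
        elif predicted:
            b += 1
        elif is_disease:
            c += 1
        else:
            d += 1
    return a, b, c, d


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test: P(top-left cell >= a) with all margins fixed."""
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, row1)
    upper = min(row1, col1)
    # Sum hypergeometric probabilities for tables at least as extreme as the observed one.
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k) for k in range(a, upper + 1))
    return tail / total


# Hypothetical DER values and disease labels.
der_values = {"G1": 0.81, "G2": 0.70, "G3": 0.58, "G4": 0.40, "G5": 0.33, "G6": 0.21}
disease = {"G1", "G2", "G4"}
table = contingency_table(der_values, disease)
print(table, fisher_exact_greater(*table))
```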
Figure 3
Distinguishing T1DM genes from false positives in the top seven loci from GWASs using DER. Genes in the top seven loci from the WTCCC T1DM GWASs are reported with validation results. False-positive genes were shown as positive in the initial scan but found to be unassociated with T1DM in the follow-up validation studies. T1DM genes had significantly higher DER values than did false positive genes (P = 0.003). The mean DER values for T1DM and false-positive genes were 0.59 and 0.50, respectively. DER, differential expression ratio; GWAS, genome-wide association study; T1DM, type 1 diabetes mellitus; WTCCC, Wellcome Trust Case Control Consortium.
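The caption does not state which test produced P = 0.003. The sketch below uses an exact permutation test on the difference in mean DER between two groups of genes; this is an assumption chosen for illustration, not the authors' method, and the DER values are made up.

```python
from itertools import combinations
from statistics import mean
from typing import Sequence


def permutation_p_value(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """One-sided exact permutation P value for mean(group_a) - mean(group_b)."""
    if not group_a or not group_b:
        raise ValueError("both groups must be non-empty")
    pooled = list(group_a) + list(group_b)
    observed = mean(group_a) - mean(group_b)
    hits = 0
    total = 0
    # Enumerate every way of relabeling len(group_a) pooled values as group A.
    for idx in combinations(range(len(pooled)), len(group_a)):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point ties.
        if mean(a) - mean(b) >= observed - 1e-12:
            hits += 1
    return hits / total


# Hypothetical DER values for validated genes and false-positive genes.
validated = [0.62, 0.58, 0.66, 0.55]
false_positive = [0.48, 0.52, 0.45, 0.51]
print(permutation_p_value(validated, false_positive))
```

Exhaustive enumeration is only practical for small groups such as the handful of genes in a few loci; larger comparisons would need random sampling of relabelings instead.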
Figure 4
Interpreting T1DM GWAS findings at 4q27 using fitSNPs. The region 4q27 has been identified as a risk factor area for T1DM, celiac disease, and rheumatoid arthritis. IL2, IL21, and TENR were selected based on prior knowledge for sequencing in the follow-up studies, but no association was found. KIAA1109 has a much higher fitSNPs DER value than all other genes in the region, and is flanked by two significant T1DM GWAS SNPs (-log10 P > 5). We predicted that this gene may explain the T1DM association in this region. The GWAS -log10 P curve for KIAA1109 is missing because it was not listed in the Affymetrix 500K SNP array used for the GWAS. DER, differential expression ratio; fitSNPs, functionally interpolating single nucleotide polymorphisms; GWAS, genome-wide association study; SNP, single nucleotide polymorphism; T1DM, type 1 diabetes mellitus.
Figure 5
Prediction that OBSL1 is associated with systemic lupus erythematosus with nephritis through 2q34-q35. Systemic lupus erythematosus with nephritis (SLEN2; OMIM %607966) was found to be associated with 2q34-q35, but no specific genes were identified. OBSL1 has a much higher DER value (0.71) than all other genes from 2q34-q35. It was also found to be differentially expressed in juvenile idiopathic arthritis, kidney cancer, and kidney transplant rejection. Therefore, we suggest that it should be sequenced for its potential association with SLEN2.
