Comparative Study
Bioinformatics. 2010 Sep 15;26(18):i524-30.
doi: 10.1093/bioinformatics/btq378.

is-rSNP: a novel technique for in silico regulatory SNP detection

Geoff Macintyre et al. Bioinformatics. 2010.

Abstract

Motivation: Determining the functional impact of non-coding disease-associated single nucleotide polymorphisms (SNPs) identified by genome-wide association studies (GWAS) is challenging. Many of these SNPs are likely to be regulatory SNPs (rSNPs): variations which affect the ability of a transcription factor (TF) to bind to DNA. However, experimental procedures for identifying rSNPs are expensive and labour intensive. Therefore, in silico methods are required for rSNP prediction. By scoring two alleles with a TF position weight matrix (PWM), it can be determined which SNPs are likely rSNPs. However, predictions in this manner are noisy and no method exists that determines the statistical significance of a nucleotide variation on a PWM score.
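The allele-scoring idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the count matrix `COUNTS`, the uniform `BACKGROUND`, the pseudocount, and the two 4-base alleles are hypothetical toy values, not data from the paper, and the log-odds scoring shown is one common PWM convention rather than necessarily the one used by is-rSNP.

```python
import math

# Hypothetical toy count matrix: one dict of base counts per motif position.
COUNTS = [
    {"A": 8, "C": 1, "G": 1, "T": 0},
    {"A": 0, "C": 0, "G": 1, "T": 9},
    {"A": 1, "C": 1, "G": 7, "T": 1},
    {"A": 0, "C": 9, "G": 0, "T": 1},
]
# Assumed uniform background base frequencies.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def log_odds_pwm(counts, background, pseudocount=1.0):
    """Convert base counts into a log2-odds PWM, smoothing with a pseudocount."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + pseudocount * len(column)
        pwm.append({
            base: math.log2(((column[base] + pseudocount) / total) / background[base])
            for base in column
        })
    return pwm


def score(pwm, site):
    """Sum the per-position log-odds for a site of the same length as the PWM."""
    if len(site) != len(pwm):
        raise ValueError("site length must match PWM length")
    return sum(column[base] for column, base in zip(pwm, site))


pwm = log_odds_pwm(COUNTS, BACKGROUND)
allele_1 = "ATGC"  # hypothetical allele carrying C at the SNP position
allele_2 = "ATGT"  # hypothetical allele carrying T at the SNP position
s1, s2 = score(pwm, allele_1), score(pwm, allele_2)
delta = s1 - s2    # positive: allele 1 matches the motif better
print(f"allele 1: {s1:.3f}  allele 2: {s2:.3f}  difference: {delta:.3f}")
```

A raw score difference like `delta` is exactly the kind of noisy signal the abstract warns about: on its own it says nothing about how unusual the change is for this particular PWM.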

Results: We have designed an algorithm for in silico rSNP detection called is-rSNP. We employ novel convolution methods to determine the complete distributions of PWM scores and ratios between allele scores, facilitating assignment of statistical significance to rSNP effects. We have tested our method on 41 experimentally verified rSNPs, correctly predicting the disrupted TF in 28 cases. We also analysed 146 disease-associated SNPs with no known functional impact in an attempt to identify candidate rSNPs. Of the 11 significantly predicted disrupted TFs, 9 had previous evidence of being associated with the disease in the literature. These results demonstrate that is-rSNP is suitable for high-throughput screening of SNPs for potential regulatory function. This is a useful and important tool in the interpretation of GWAS.
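To illustrate how a complete PWM score distribution can be obtained by convolution, the sketch below builds the exact distribution of scores one motif position at a time and reads off a P-value for an observed score. It is not the is-rSNP implementation: the integer-valued matrix `INT_PWM`, the uniform background, and the function names are assumptions made for this example, and it covers only the single-allele score distribution, not the distribution of ratios between allele scores that the paper also derives.

```python
from collections import defaultdict

# Hypothetical PWM whose scores have already been scaled and rounded to integers,
# so that equal scores collapse into the same bin during convolution.
INT_PWM = [
    {"A": 2, "C": -1, "G": -1, "T": -2},
    {"A": -2, "C": -2, "G": -1, "T": 2},
    {"A": -1, "C": -1, "G": 2, "T": -1},
    {"A": -2, "C": 2, "G": -2, "T": -1},
]
# Assumed independent, uniform background base frequencies.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def score_distribution(int_pwm, background):
    """Return {score: probability} over all sites, under the background model."""
    dist = {0: 1.0}  # before any position is scored, the total is 0 with certainty
    for column in int_pwm:
        nxt = defaultdict(float)
        # Convolve the running distribution with this column's score distribution.
        for partial, prob in dist.items():
            for base, s in column.items():
                nxt[partial + s] += prob * background[base]
        dist = dict(nxt)
    return dist


def p_value(dist, observed):
    """Probability that a background site scores at least `observed`."""
    return sum(prob for s, prob in dist.items() if s >= observed)


def score(int_pwm, site):
    """Integer PWM score of a site with the same length as the PWM."""
    return sum(column[base] for column, base in zip(int_pwm, site))


dist = score_distribution(INT_PWM, BACKGROUND)
for allele in ("ATGC", "ATGT"):  # hypothetical alleles differing at the last base
    s = score(INT_PWM, allele)
    print(allele, s, p_value(dist, s))
```

Because the distribution is built column by column, the cost grows with the number of distinct partial scores rather than with the 4^n possible sites, which is what makes an exact distribution practical for longer motifs.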

Availability: is-rSNP software is available for use at: www.genomics.csse.unimelb.edu.au/is-rSNP.

Figures

Fig. 1.
This figure provides an example of the data used at each step of the is-rSNP algorithm applied to a fetal growth disorder rSNP. (a) Logo for OCT1. (b) Two binding sites for OCT1 containing a SNP: Allele 1, position 8 = T (top); Allele 2, position 8 = C (bottom). (c) Distribution of scores generated by OCT1 PWM and observed scores for Allele 1 (A1) and Allele 2 (A2). (d) Distribution of scores generated by OCT1 PWM and observed scores for Allele 1 and Allele 2, y-axis log scale.
Fig. 2.
This figure compares output of is-rSNP and sTRAP after analysis of 28 known rSNPs filtered for significant TF binding, i.e. having a PWM score with P ≤ 0.001 for one of the alleles for the matching TF. In addition, the output of a modified version of is-rSNP with no filtering of significant PWM binding sites and P-value thresholding is compared with sTRAP output on 41 known rSNPs. In this graph, smaller values are better. The black line represents the mean value of the top-ranked correct predictions, the box edges represent the first and third quartile and the whiskers extend to the most extreme observation. In addition, each data point is plotted beneath its respective box plot, with jitter. Note that the x-axis is log scale.
