Chin J Cancer. 2013 Apr;32(4):170-85.
doi: 10.5732/cjc.012.10113. Epub 2013 Jan 18.

Identification of SNP-containing regulatory motifs in the myelodysplastic syndromes model using SNP arrays and gene expression arrays

Jing Fan et al. Chin J Cancer. 2013 Apr.

Abstract

Myelodysplastic syndromes have increased in frequency and incidence in the American population, but patient prognosis has not significantly improved over the last decade. Such improvements could be realized if biomarkers for accurate diagnosis and prognostic stratification were successfully identified. In this study, we propose a method that associates two state-of-the-art array technologies, the single nucleotide polymorphism (SNP) array and the gene expression array, with gene motifs considered to be transcription factor-binding sites (TFBS). We are particularly interested in SNP-containing motifs that genetic variation and mutation introduce as TFBS. Such motifs can exert a regulatory effect only when certain mutations occur. These motifs can be identified from a group of co-expressed genes with copy number variation. We used a sliding window to identify motif candidates near SNPs on gene sequences. The candidates were filtered by coarse thresholding followed by fine statistical testing. Using the regression-based LARS-EN algorithm and a level-wise sequence combination procedure, we identified 28 SNP-containing motifs as candidate TFBS. We confirmed 21 of the 28 motifs with ChIP-chip fragments in the TRANSFAC database. Another six motifs were validated by TRANSFAC by searching for binding fragments on co-regulated genes. The identified motifs and the genes in which they are located can be considered potential biomarkers for myelodysplastic syndromes. Thus, our proposed method, a novel strategy for associating two data categories, is capable of integrating information from different sources to identify reliable candidate regulatory SNP-containing motifs introduced by genetic variation and mutation.

Figures

Figure 1. Illustration of twin expanded gene regions (EGRs).
Here, single nucleotide polymorphisms (SNPs) introduce potential novel motifs. For one gene, the potential motif (red box) has exactly the same DNA sequence except at the heterozygous SNP position (G and A) in gene A. The motif will be missed if the heterozygous SNP position is G.
Figure 2. Example of how a sliding window selects candidate motifs of length L = 9 from a sequence fragment.
The SNP is marked in red. A total of 8 candidate motifs are generated as the sliding window moves from left to right.
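The sliding-window step described in the abstract and in Figure 2 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `snp_windows`, the example fragment, and the SNP index are hypothetical, and the sketch assumes that a candidate is any window of length L that lies entirely inside the fragment and covers the SNP position. Under that assumption a fragment yields at most L candidates, and fewer when a flank is shorter than L - 1 bases.

```python
def snp_windows(fragment: str, snp_index: int, length: int) -> list[tuple[int, str]]:
    """Return (start, window) pairs for every window of `length` bases
    that lies inside `fragment` and covers the base at `snp_index`.

    Illustrative helper only; names and conventions are assumptions.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    if not 0 <= snp_index < len(fragment):
        raise ValueError("snp_index must point inside the fragment")

    # Leftmost start that still covers the SNP, clipped to the fragment start.
    first = max(0, snp_index - length + 1)
    # Rightmost start that covers the SNP and keeps the window inside the fragment.
    last = min(snp_index, len(fragment) - length)

    return [(start, fragment[start:start + length])
            for start in range(first, last + 1)]


if __name__ == "__main__":
    # Hypothetical fragment; the SNP is the base at index 8.
    fragment = "ACGTACGTGACGTACGT"
    for start, window in snp_windows(fragment, snp_index=8, length=9):
        print(start, window)
```

For a heterozygous SNP such as the G/A position in Figure 1, the same function could be applied once per allele by substituting each allele at `snp_index` before sliding the window.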
Figure 3. Flowchart of the SNP-containing motif frequency matrix calculation.
Details are provided in the Method section.
Figure 4. Visualization of the expanded gene region (EGR) of a gene and its related SNP.
The box plots describe copy numbers of each SNP across 7 disease samples.
Figure 5. Box plot of distributions in the upstream, gene, and downstream regions of motif GTGCCAC across all samples.
The indices of MDS samples are marked in red. S stands for sample.
Figure 6. Flowchart of verification of motifs that could not be directly confirmed by TRANSFAC.
