Proc Natl Acad Sci U S A. 2010 Mar 30;107(13):5732-7.
doi: 10.1073/pnas.0913635107. Epub 2010 Mar 15.

MiDReG: a method of mining developmentally regulated genes using Boolean implications

Debashis Sahoo et al. Proc Natl Acad Sci U S A. 2010.

Abstract

We present a method termed mining developmentally regulated genes (MiDReG) to predict genes whose expression is either activated or repressed as precursor cells differentiate. MiDReG does not require gene expression data from intermediate stages of development. MiDReG is based on the gene expression patterns between the initial and terminal stages of the differentiation pathway, coupled with "if-then" rules (Boolean implications) mined from large-scale microarray databases. MiDReG uses two gene expression-based seed conditions that mark the initial and the terminal stages of a given differentiation pathway and combines the statistically inferred Boolean implications from these seed conditions to identify the relevant genes. The method was validated by applying it to B-cell development. The algorithm predicted 62 genes that are expressed after the KIT+ progenitor cell stage and remain expressed through CD19+ and AICDA+ germinal center B cells. qRT-PCR of 14 of these genes on sorted B-cell progenitors confirmed that the expression of 10 genes is indeed stably established during B-cell differentiation. Review of the published literature on knockout mice revealed that, of the predicted genes that have been knocked out in mice, 63.4% show defects in B-cell differentiation and function, 22% have a role in the B cell according to other experiments, and the remaining 14.6% are not characterized. Therefore, our method identified novel gene candidates for future examination of their role in B-cell development. These data demonstrate the power of MiDReG in predicting functionally important intermediate genes in a given developmental pathway that is defined by a mutually exclusive gene expression pattern.
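The selection step can be illustrated with a small sketch. This is not the authors' implementation: it assumes the Boolean implications have already been mined and are stored as hypothetical tuples `(gene, level, gene, level)`, and it applies the rule given in the Fig. 2 caption (seed A high ⇒ X low and seed B high ⇒ X high). The function name and the `GENE_X1`/`GENE_X2` entries are placeholders, not genes from the study.

```python
# Toy sketch of MiDReG-style candidate selection (not the published code).
# An implication (a, la, b, lb) reads: "a is la  =>  b is lb".

def midreg_candidates(implications, seed_a, seed_b, genes):
    """Return genes X with seed_a high => X low and seed_b high => X high."""
    # The two seeds are expected to mark mutually exclusive stages.
    if (seed_a, "high", seed_b, "low") not in implications:
        raise ValueError("seed genes are not mutually exclusive")
    selected = []
    for x in genes:
        if x in (seed_a, seed_b):
            continue
        off_early = (seed_a, "high", x, "low") in implications
        on_late = (seed_b, "high", x, "high") in implications
        if off_early and on_late:
            selected.append(x)
    return selected

# Hypothetical, hand-written implications for illustration only.
implications = {
    ("KIT", "high", "CD19", "low"),
    ("KIT", "high", "GENE_X1", "low"),
    ("CD19", "high", "GENE_X1", "high"),
    ("KIT", "high", "GENE_X2", "low"),  # GENE_X2 lacks the CD19 rule
}
genes = ["KIT", "CD19", "GENE_X1", "GENE_X2"]
candidates = midreg_candidates(implications, "KIT", "CD19", genes)
print(candidates)
```

Only `GENE_X1` satisfies both rules in this toy input. In practice the set of implications would come from a statistical test over thousands of arrays, which this sketch does not attempt.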

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
Computational prediction of developmental genes using Boolean implications. (A) The BooleanNet algorithm applied to 4,787 Affymetrix U133 Plus 2.0 human microarrays and 2,167 Affymetrix 430 2.0 mouse arrays that were downloaded from NCBI’s Gene Expression Omnibus. (B) The scatter plots show six different types of Boolean implications between X and Y in human datasets. (C) The pie charts show the percentage of probesets with the indicated number of Boolean implications (0, <1,000, <10,000, and ≥10,000) in human and mouse datasets. More than 60% of the probesets have more than 1,000 Boolean implications. (D) The Venn diagram shows the number of Boolean implications that are conserved across humans and mice. The mouse homologs were identified by using the euGene database: 15,199 human probesets and 10,695 mouse probesets have corresponding homologs. There are 4 M conserved Boolean implications out of 22 M in the human dataset. A conserved Boolean implication, KIT high⇒CD19 low, is shown on the right. (E) The MiDReG algorithm. It uses two seed genes: A, which is expressed early in development, and B, which is expressed later in development. Using Boolean implications, it identifies genes X that are hypothesized to be expressed earlier than gene B and whose expression is maintained throughout further development.
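The six implication types in panel B can be read as statements about which quadrants of the X versus Y scatter plot are sparse: four asymmetric rules (one sparse quadrant each) and two symmetric ones, equivalent and opposite (two sparse quadrants each). The sketch below illustrates that reading on already-thresholded data. It is a simplification under stated assumptions: the `max_fraction` tolerance is a hypothetical stand-in for the statistical test used by BooleanNet, and the sample values are invented.

```python
from collections import Counter

# Asymmetric implications keyed by the quadrant (x_is_high, y_is_high)
# that must be sparse for the implication to hold.
ASYMMETRIC = {
    (True, False): "X high => Y high",
    (True, True): "X high => Y low",
    (False, False): "X low => Y high",
    (False, True): "X low => Y low",
}

def classify_implications(x_high, y_high, max_fraction=0.0):
    """List the implications supported by paired high/low calls.

    max_fraction is a hypothetical tolerance: a quadrant counts as sparse
    when it holds at most this fraction of all samples.
    """
    counts = Counter(zip(x_high, y_high))
    n = len(x_high)
    sparse = {q for q in ASYMMETRIC if counts[q] <= max_fraction * n}
    if sparse == {(True, False), (False, True)}:
        return ["equivalent"]
    if sparse == {(True, True), (False, False)}:
        return ["opposite"]
    return sorted(ASYMMETRIC[q] for q in sparse)

# Invented calls for seven samples: no sample has both genes high.
kit_high = [True, True, True, False, False, False, False]
cd19_high = [False, False, False, True, True, False, False]
result = classify_implications(kit_high, cd19_high)
print(result)
```

With these invented calls the only empty quadrant is "both high", which corresponds to the KIT high ⇒ CD19 low pattern mentioned in panel D.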
Fig. 2.
Validation of B-cell precursor genes based on KIT and CD19. (A) B-cell precursor genes were predicted by using KIT and CD19 as seed genes. KIT is expressed early in development, and CD19 is expressed in the mature B cell. The Boolean implication KIT high⇒CD19 low indeed reflects this situation. The identified genes turning on between KIT and CD19 are genes X such that KIT high⇒X low and CD19 high⇒X high. The list of genes is filtered by intersecting results from both human and mouse datasets. (B) The MiDReG algorithm identified 19 B-cell precursor genes by using KIT and CD19. Quantitative RT-PCR (qRT-PCR) was performed on 13 purified hematopoietic populations at different stages of B-cell differentiation: HSC, MPPFL- (multipotent progenitors Flk2-), MPPFL+ (multipotent progenitors Flk2+), CLP (common lymphoid progenitors), Frac A (Pre-Pro-B), Frac B (Pro-B), Frac C (large pre-B), Frac D (small pre-B), Frac E (immature B), T1 (Transitional 1), T2 (Transitional 2), mature B, and GC (germinal center B cells). The bar plot shows relative gene expression from the qPCR results for 16 genes, including the seed genes KIT and CD19. Expression levels are displayed as a percentage of the maximum expression level. The expression level of KIT is high, and none of the CD19 transcripts are detected from HSC to MPPFL+ stages. The expression level of CD19 is high, and none of the KIT transcripts are detected from Frac D to GC stages. Therefore, for each of the 14 experimental genes the median expression level from HSC to MPPFL- stages is compared against the median expression level from Frac D to GC stages. The results show that 10 out of 14 genes (indicated with *) have higher median expression levels from Frac D to GC stages compared to the HSC and MPPFL- stages (FDR = 14.7%). These genes have low expression or turn off at HSC to MPPFL-; then they turn on between MPPFL+ and Frac C and are highly expressed in Frac D to GC stages. The bottom four genes (indicated with †) did not pass the above test.
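The median comparison described in panel B can be sketched as follows. The stage groupings follow the caption (HSC to MPPFL- versus Frac D to GC); the expression values and the function name are invented for illustration, and the sketch does not reproduce the false discovery rate estimate reported above.

```python
from statistics import median

# Stage groups taken from the caption; values below are hypothetical.
EARLY_STAGES = ["HSC", "MPPFL-"]
LATE_STAGES = ["Frac D", "Frac E", "T1", "T2", "mature B", "GC"]

def higher_in_late_stages(expression):
    """True when the late-stage median exceeds the early-stage median.

    expression maps stage name -> relative expression (% of maximum).
    """
    early = median(expression[s] for s in EARLY_STAGES)
    late = median(expression[s] for s in LATE_STAGES)
    return late > early

# Invented profile for a gene that switches on during differentiation.
gene_on = {"HSC": 2, "MPPFL-": 0, "Frac D": 80, "Frac E": 100,
           "T1": 90, "T2": 85, "mature B": 70, "GC": 60}
# Invented profile for a gene with flat expression.
gene_flat = {stage: 50 for stage in EARLY_STAGES + LATE_STAGES}

print(higher_in_late_stages(gene_on), higher_in_late_stages(gene_flat))
```

Intermediate stages (MPPFL+ to Frac C) are deliberately left out of the comparison, matching the caption's description of where the genes turn on.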
Fig. 3.
Validation of B-cell precursor genes based on KIT, AICDA, and CD19. (A) B-cell precursor genes were predicted by using KIT as the first seed gene and a combination of CD19 and AICDA as the second seed gene. The list of genes is filtered by using conservation across both human and mouse datasets. The combination of CD19 and AICDA expression levels is specific to a narrow region in the later stages of B-cell development, so the MiDReG algorithm is expected to return more genes than the earlier results using CD19 only. The MiDReG algorithm predicted 52 B-cell precursor genes by using KIT, CD19, and AICDA. These genes are hypothesized to be expressed after the c-kit+ progenitor cell stage and remain expressed through CD19+AICDA+ GC B cells. (B) qRT-PCR results for Pax5, Syk, Il21r, Spi-B, and Fcrlm1 are shown. The results show that all five genes indicated with * have higher median expression levels from Frac D to GC stages compared to the HSC and MPPFL- stages, which suggests that the expression patterns for these genes are indeed stably maintained through GC B cells.
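One way to see why a narrower second seed condition can return more genes: an implication "condition ⇒ X high" is contradicted only by samples where the condition holds and X is low, and a stricter condition holds in fewer samples. The sketch below assumes the combined seed means CD19 and AICDA are both high in the same sample, which the caption does not spell out; all values are invented.

```python
def violations(condition, x_high):
    """Count samples where the condition holds but X is low."""
    return sum(1 for c, x in zip(condition, x_high) if c and not x)

# Invented high/low calls for six samples.
cd19_high = [True, True, True, True, False, False]
aicda_high = [False, False, True, True, False, False]
x_high = [False, True, True, True, False, False]

# Assumed combined seed condition: both genes high in the same sample.
both_high = [c and a for c, a in zip(cd19_high, aicda_high)]

cd19_only = violations(cd19_high, x_high)
combined = violations(both_high, x_high)
print(cd19_only, combined)
```

Because every sample meeting the combined condition also meets the CD19-only condition, the combined count can never exceed the CD19-only count, so a gene that fails the single-seed rule may still pass the combined one.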
Fig. 4.
Classification of the predicted B-cell genes. (A) Predicted B-cell genes are grouped according to reported B-cell functions in the literature. Out of 62 genes, 35 (56.5%) genes are associated with known B-cell function, 5 (8.1%) genes are indirectly related to the B cell through interacting proteins, 3 (4.8%) genes are unknown, 8 (12.9%) genes have other roles, and 11 (17.7%) genes could have a B-cell function based on their expression in the B cell and reported other hematopoietic functions. (B) Predicted B-cell genes with available mouse knockouts are grouped according to reported B-cell phenotypes in the literature. Out of 62 genes, 41 genes have been knocked out in mice. Out of these 41 mouse knockouts, 26 (63.4%) genes show defects in B-cell function and differentiation, 9 (22.0%) genes are associated with known B-cell function according to other experiments, and 6 (14.6%) genes could have a B-cell function based on their expression in the B cell and reported other hematopoietic functions. (C) Predicted B-cell genes grouped according to gene ontology classification. Out of 62 genes, 26 (41.9%) genes are cell surface receptors, 15 (24.2%) genes are associated with signal transduction, 10 (16.1%) genes are transcription factors, 9 (14.5%) genes are associated with other metabolic processes, 1 (1.6%) gene is unknown, and 1 (1.6%) gene is a cytokine.
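The percentages in panels A to C follow directly from the gene counts, and each panel's counts sum to its stated total (62, 41, and 62). A short check, using the counts from the caption with shortened, informal category labels:

```python
def percentages(counts):
    """Convert category counts to percentages of their total (1 decimal)."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}

# Counts copied from the caption; the labels are abbreviated paraphrases.
function_groups = {"known B-cell function": 35, "indirect": 5,
                   "unknown": 3, "other roles": 8, "possible": 11}
knockout_groups = {"B-cell defects": 26, "known by other experiments": 9,
                   "possible": 6}
ontology_groups = {"cell surface receptor": 26, "signal transduction": 15,
                   "transcription factor": 10, "other metabolic": 9,
                   "unknown": 1, "cytokine": 1}

for groups in (function_groups, knockout_groups, ontology_groups):
    print(sum(groups.values()), percentages(groups))
```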
