PLoS Comput Biol. 2014 May 29;10(5):e1003578.
doi: 10.1371/journal.pcbi.1003578. eCollection 2014 May.

Finding novel molecular connections between developmental processes and disease

Jisoo Park et al. PLoS Comput Biol. 2014.

Abstract

Identifying molecular connections between developmental processes and disease can lead to new hypotheses about health risks at all stages of life. Here we introduce a new approach to identifying significant connections between gene sets and disease genes, and apply it to several gene sets related to human development. To overcome the limits of incomplete and imperfect information linking genes to disease, we pool genes within disease subtrees in the MeSH taxonomy, and we demonstrate that such pooling improves the power and accuracy of our approach. Significance is assessed through permutation. We created a web-based visualization tool to facilitate multi-scale exploration of this large collection of significant connections (http://gda.cs.tufts.edu/development). High-level analysis of the results reveals expected connections between tissue-specific developmental processes and diseases linked to those tissues, and widespread connections to developmental disorders and cancers. Yet interesting new hypotheses may be derived from examining the unexpected connections. We highlight and discuss the implications of three such connections, linking dementia with bone development, polycystic ovary syndrome with cardiovascular development, and retinopathy of prematurity with lung development. Our results provide additional evidence that TGFB plays a key role in the early pathogenesis of polycystic ovary syndrome. Our evidence also suggests that the VEGF pathway and downstream NFKB signaling may explain the complex relationship between bronchopulmonary dysplasia and retinopathy of prematurity, and may form a bridge between two currently competing hypotheses about the molecular origins of bronchopulmonary dysplasia. Further data exploration and similar queries about other gene sets may generate a variety of new information about the molecular relationships between additional diseases.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Pooling genes across related diseases to assess enrichment.
a) Lung development genes linked directly to three related MeSH terms. The genes associated with each term are shown in a different color. b) By pooling the lung development genes from the subtree rooted at the Neural tube defects node, we obtain enough genes to identify significant enrichment at that node. Colors, the same as those in part a, indicate the disease terms with which the genes were associated before pooling.
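The pooling idea in this caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy tree, the gene names (G1, G2, ...), the helper names, and the particular permutation scheme (drawing random gene sets of the same size as the query from a background list) are all assumptions made for illustration. The abstract states only that genes are pooled within MeSH subtrees and that significance is assessed through permutation.

```python
import random

# Hypothetical toy data: a small disease subtree and direct gene-disease links.
children = {
    "Neural Tube Defects": ["Spinal Dysraphism", "Anencephaly"],
    "Spinal Dysraphism": [],
    "Anencephaly": [],
}
direct_genes = {
    "Neural Tube Defects": {"G1"},
    "Spinal Dysraphism": {"G2", "G3"},
    "Anencephaly": {"G4"},
}


def pooled_genes(node, children, direct_genes):
    """Union of genes linked to `node` or to any of its descendants."""
    genes = set(direct_genes.get(node, set()))
    for child in children.get(node, []):
        genes |= pooled_genes(child, children, direct_genes)
    return genes


def permutation_p_value(query, disease_genes, background, trials=2000, seed=0):
    """Share of random same-size gene sets that overlap the disease genes
    at least as much as the query does (assumed permutation scheme)."""
    rng = random.Random(seed)
    observed = len(query & disease_genes)
    pool = sorted(background)  # sorted so results are reproducible
    hits = 0
    for _ in range(trials):
        sample = set(rng.sample(pool, len(query)))
        if len(sample & disease_genes) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (trials + 1)
```

With a query set that shares one gene with the root term but three genes with the pooled subtree, the pooled p-value comes out much smaller than the direct one, which is the effect panel b describes.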
Figure 2. Histogram showing the difference between the pooling and traditional approaches for each query gene set.
The red lines show a difference of zero; values to the left of these lines represent individual random trials in which the traditional method outperformed the pooling method. This occurred only once, in one trial for the skin development gene set.
Figure 3. Triangle view of disease enrichment for the bone development gene set.
Each triangle represents one of the 26 top-level categories in the MeSH disease forest. Each dot represents a disease node with significant enrichment of bone development genes. To clearly indicate the significance of relationships between diseases and the query gene set in these small images, we used two colors: light brown and darker brown dots indicate two different significance thresholds. Mousing over the dots reveals a pop-up of the disease term associated with that node (Alzheimer Disease is shown). Clicking on the category name leads to a detailed view of that tree.
Figure 4. Detailed view of part of the Nervous System Disease subtree, showing enrichment of bone development genes.
Links to dementia and Alzheimer's disease are shown. Significance of each node in the tree is represented by color; a gradient of shades of blue indicates p-values ranging from 0 (darkest blue) to 1.0 (white). Clicking on a node or selecting a set of nodes allows users to see, in the box in the upper right corner, the selected disease terms, p-values, and genes shared between those diseases and the developmental gene set.
Figure 5. Expected results by tissue.
Density of enrichment of developmental gene sets (labels on the right) in major disease subtrees. Values are z-score normalized densities, computed as described in Methods. Darker squares indicate that a larger fraction of the disease terms in the MeSH category have significant enrichment of genes in the indicated gene set. Expected connections appear approximately along the diagonal in the first seven columns, and throughout the rightmost two columns.
Figure 6. The VEGF pathway and its relevance to both BPD hypotheses.
The relationships shown here are derived from the VEGF, PI3K-AKT, mTOR, and HIF-1 signaling pathways and the “Pathways in Cancer” map in the KEGG Pathway database. Dashed lines represent indirect regulation. Genes highlighted in orange are the five lung development genes implicated in ROP.
Figure 7. Example of comparison between pooling approach and traditional approach.
Illustration of the comparison carried out in each random trial. 100 gene-disease associations involving genes in the query gene set are withheld. Using the remaining associations, p-values for enrichment of the disease gene set at each node are computed using both the traditional and pooling approaches. Nodes are assigned to one of two sets based on which approach shows more significant enrichment, and the rate at which each set is supported by withheld links is computed. The idea is that if a disease class is correctly linked to the query gene set, it should be more likely to be supported by withheld gene-disease associations from that same query set.
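The per-trial comparison described in this caption might look roughly like the Python sketch below. Several details are assumptions rather than facts from the source: the function names are hypothetical, ties between the two approaches are simply dropped, and a node counts as "supported" when at least one withheld link names that exact disease term. The paper's Methods may define these steps differently.

```python
def split_nodes(p_trad, p_pool):
    """Assign each node to the approach that gives it the smaller p-value.

    Both arguments map node name -> p-value. Ties are left out (an assumption).
    """
    favors_trad = {n for n in p_trad if p_trad[n] < p_pool[n]}
    favors_pool = {n for n in p_trad if p_pool[n] < p_trad[n]}
    return favors_trad, favors_pool


def support_rate(nodes, withheld_links):
    """Fraction of `nodes` named by at least one withheld (gene, disease) link."""
    if not nodes:
        return 0.0
    supported = {disease for _gene, disease in withheld_links}
    return len(nodes & supported) / len(nodes)


# Hypothetical toy trial: p-values from each approach and three withheld links.
p_trad = {"A": 0.20, "B": 0.01, "C": 0.30, "D": 0.04}
p_pool = {"A": 0.01, "B": 0.05, "C": 0.02, "D": 0.04}
withheld = [("G1", "A"), ("G2", "A"), ("G3", "D")]

favors_trad, favors_pool = split_nodes(p_trad, p_pool)
difference = support_rate(favors_pool, withheld) - support_rate(favors_trad, withheld)
```

Under this reading, a positive difference means the nodes favored by pooling were supported by withheld links more often in that trial, while a negative difference means the traditional approach did better.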
