BMC Genomics. 2011 Dec 22;12 Suppl 4(Suppl 4):S3.
doi: 10.1186/1471-2164-12-S4-S3. Epub 2011 Dec 22.

Preimplantation development regulatory pathway construction through a text-mining approach

Elisa Donnard et al. BMC Genomics. 2011.

Abstract

Background: The integration of sequencing and gene interaction data, and the subsequent generation of pathways and networks stored in databases such as KEGG Pathway, is essential for understanding complex biological processes. We noticed the absence of a chart or pathway describing the well-studied preimplantation development stages; furthermore, not all genes involved in the process have entries in KEGG Orthology, which is important information for transferring knowledge to other organisms.

Results: In this work we sought to develop the regulatory pathway for the preimplantation development stage using text-mining tools such as Medline Ranker and PESCADOR to reveal biointeractions among the genes involved in this process. The genes present in the resulting pathway were also used as seeds for software developed by our group called SeedServer to create clusters of homologous genes. These homologues allowed the determination of the last common ancestor for each gene and revealed that the preimplantation development pathway consists of a conserved ancient core of genes with the addition of modern elements.

Conclusions: The generation of regulatory pathways through text-mining tools allows the integration of data generated by several studies for a more complete visualization of complex biological processes. Using the genes in this pathway as "seeds" for the generation of clusters of homologues, the pathway can be visualized for other organisms. The clustering of homologous genes, together with the determination of their ancestry, leads to a better understanding of the evolution of this process.

Figures

Figure 1
Biointeraction extraction from PESCADOR. Top: Sample abstract tagged by PESCADOR. Recognized gene or protein names (terms) are highlighted in violet and the biointeraction words in yellow. The platform allows users to search for interactions of interest by terms, abstracts or concepts that the user added initially. Bottom: Manual curation of the information presented in the abstract and its graphical representation in the form of a regulatory pathway.
Figure 2
Preimplantation development pathway. The figure shows a pathway representation of the genes involved in the regulation of preimplantation development and the interactions between them. Some functions are also detailed in the grey rectangles. The upper part of the figure contains the genes involved in the early stages of development (until blastocyst formation). Below these, the left part corresponds to the regulation that occurs in the inner cell mass, the portion of cells that remains undifferentiated for a longer period of time. Part of these cells will give rise to the primitive endoderm, and the genes that regulate this process are shown in the bottom left. On the right are the genes involved in the development of the outer cells of the blastocyst, which differentiate to form the trophectoderm. The interactions are described in the text. KEGG Markup Language was used for pathway representation. The developmental stage figures were adapted from Yamanaka et al. 2006 [14]. The pathway genes are represented according to their ancestry, based on the determination of their Last Common Ancestor. Genes considered recent are shown in green, while genes of more ancient origin are shown in lilac. Genes that have an ortholog in D. melanogaster are marked (*). This is further addressed in the text section “Pathway Ancestry”. DPPA1 and hRSCP are shown in grey because the lack of a corresponding SwissProt-annotated gene product to use as a seed prevented their inclusion in this analysis.
Figure 3
Gene origin in human evolution. Distribution of the genes in the preimplantation pathway according to their origin in clades of the human lineage, based on the determination of the Last Common Ancestor for the ortholog clusters generated by SeedServer. The y-axis represents the number of genes and the x-axis represents the taxonomical groups in which the genes originated.
Figure 4
Pathway construction flowchart. The initial step consists of a PubMed search on the subject of interest (e.g. preimplantation development). The list of PubMed identifiers (PMIDs) obtained in the search is then used in the web tool Medline Ranker as the background set, along with a list of PMIDs of manually selected abstracts considered informative, which form the test set. The tool generates a list of abstracts ranked in order of relevance. The 1000 top-ranked abstracts are retrieved and their PMIDs are then entered into the PESCADOR platform. Abstracts are tagged by PESCADOR and provide a source of biointeractions for manual curation and pathway construction. UniProt IDs for the products of the genes present in the final pathway are obtained and used as seeds in SeedServer. The software recruits homologues for each gene and creates the final clusters. Taxonomy IDs from each cluster can be used for Last Common Ancestor (LCA) determination.
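
The final step of the flowchart (LCA determination) and the per-clade tally plotted in Figure 3 can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the lineages are simplified and hard-coded rather than taken from a taxonomy database, the gene names are placeholders, and the functions `last_common_ancestor` and `gene_origins` are hypothetical helpers, not part of SeedServer or any tool named above.

```python
from collections import Counter

def last_common_ancestor(lineages):
    """Return the deepest taxon shared by all root-to-leaf lineages."""
    if not lineages:
        raise ValueError("at least one lineage is required")
    lca = None
    # Walk the lineages rank by rank, starting at the root, and stop at
    # the first position where the organisms disagree.
    for ranks in zip(*lineages):
        if all(rank == ranks[0] for rank in ranks):
            lca = ranks[0]
        else:
            break
    return lca

# Hypothetical, heavily simplified lineages (root first).
LINEAGES = {
    "human": ["Eukaryota", "Metazoa", "Bilateria", "Chordata", "Mammalia", "Primates"],
    "mouse": ["Eukaryota", "Metazoa", "Bilateria", "Chordata", "Mammalia", "Rodentia"],
    "zebrafish": ["Eukaryota", "Metazoa", "Bilateria", "Chordata", "Actinopterygii"],
    "fruit_fly": ["Eukaryota", "Metazoa", "Bilateria", "Arthropoda"],
}

# Placeholder clusters: each seed gene maps to the organisms in which
# a homologue was recruited. These are not results from the paper.
CLUSTERS = {
    "GENE_A": ["human", "mouse"],
    "GENE_B": ["human", "mouse", "zebrafish"],
    "GENE_C": ["human", "mouse", "fruit_fly"],
    "GENE_D": ["human", "mouse"],
}

def gene_origins(clusters, lineages):
    """Map each seed gene to the LCA of the organisms in its cluster."""
    return {
        gene: last_common_ancestor([lineages[org] for org in organisms])
        for gene, organisms in clusters.items()
    }

origins = gene_origins(CLUSTERS, LINEAGES)
# Number of genes originating in each clade, as in a Figure 3 style bar chart.
origin_counts = Counter(origins.values())
```

In this toy data, a cluster that includes a fruit fly homologue is assigned an older origin than one restricted to mammals, which mirrors the distinction between ancient and recent genes drawn in the Figure 2 caption.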
