PLoS Negl Trop Dis. 2012 Jan;6(1):e1458.
doi: 10.1371/journal.pntd.0001458. Epub 2012 Jan 17.

A semantic problem solving environment for integrative parasite research: identification of intervention targets for Trypanosoma cruzi

Priti P Parikh et al. PLoS Negl Trop Dis. 2012 Jan.

Abstract

Background: Research on the biology of parasites requires a sophisticated, integrated computational platform to query and analyze large volumes of data drawn from both unpublished (internal) and public (external) data sources. Effective analysis of an integrated data resource using knowledge discovery tools would significantly aid biologists in their research, for example by identifying intervention targets in parasites and by informing the direction of ongoing and planned projects. A key challenge in achieving this objective is the heterogeneity between internal lab data, usually stored as flat files, Excel spreadsheets or custom-built databases, and external databases. Reconciling these different forms of heterogeneity and effectively integrating data from disparate sources is a nontrivial task for biologists and requires a dedicated informatics infrastructure. We therefore developed an integrated environment using Semantic Web technologies that may provide biologists with the tools to manage and analyze their data without having to acquire in-depth computer science knowledge.

Methodology/principal findings: We developed a semantic problem-solving environment (SPSE) that uses ontologies to integrate internal lab data with external resources in a Parasite Knowledge Base (PKB), which supports unified queries across these resources. The SPSE includes Web Ontology Language (OWL)-based ontologies, experimental data with its provenance information represented using the Resource Description Framework (RDF), and a visual querying tool, Cuebee, that features integrated use of Web services. We demonstrate the use and benefit of the SPSE using example queries for identifying gene knockout targets of Trypanosoma cruzi for vaccine development. Answering these queries involves looking up multiple data sources, linking them together, and presenting the results.
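The integration idea above can be illustrated with a toy example. The sketch below is not the SPSE implementation: it assumes a plain in-memory list of subject-predicate-object tuples in place of an RDF triple store, and every identifier in it (`ex:KO_1`, `ex:targetsGene`, `ex:inPathway`, the gene and pathway names, and the helper functions `match_pattern` and `query`) is hypothetical. It shows how internal lab records and external pathway data, once both are expressed as triples, can be answered by a single conjunctive query whose shared variable (`?gene`) links the two sources, much as a SPARQL basic graph pattern would.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    """Terms beginning with '?' are query variables, as in SPARQL."""
    return term.startswith("?")


def match_pattern(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Unify one triple pattern with one stored triple, extending the binding."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # A variable already bound by an earlier pattern must agree with this triple.
            if p_term in extended and extended[p_term] != t_term:
                return None
            extended[p_term] = t_term
        elif p_term != t_term:
            return None
    return extended


def query(store: List[Triple], patterns: List[Triple]) -> List[Binding]:
    """Evaluate a conjunction of triple patterns with a naive nested-loop join."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings: List[Binding] = []
        for binding in bindings:
            for triple in store:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Hypothetical internal lab data, e.g. knockout provenance entered via Web forms.
internal: List[Triple] = [
    ("ex:KO_1", "rdf:type", "ex:GeneKnockout"),
    ("ex:KO_1", "ex:targetsGene", "ex:geneA"),
    ("ex:KO_2", "rdf:type", "ex:GeneKnockout"),
    ("ex:KO_2", "ex:targetsGene", "ex:geneB"),
]
# Hypothetical external data, e.g. gene-to-pathway links from a public database.
external: List[Triple] = [
    ("ex:geneA", "ex:inPathway", "ex:pathway1"),
    ("ex:geneC", "ex:inPathway", "ex:pathway2"),
]
store = internal + external

# "Which knockout targets participate in a known pathway?" spans both sources.
results = query(store, [
    ("?ko", "rdf:type", "ex:GeneKnockout"),
    ("?ko", "ex:targetsGene", "?gene"),
    ("?gene", "ex:inPathway", "?pathway"),
])
for row in results:
    print(row["?ko"], row["?gene"], row["?pathway"])  # ex:KO_1 ex:geneA ex:pathway1
```

In a real deployment, the same question would be written in SPARQL and evaluated by an RDF store over ontology-described data; the nested-loop join here only makes the linking step visible.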

Conclusion/significance: The SPSE helps parasitologists leverage the growing but disparate parasite data resources by offering an integrative platform based on Semantic Web techniques, while keeping any increase in their workload minimal.


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. SPSE architecture showing how the various components work together.
Figure 2. Submodules of our PEO ontology focusing on gene knockout, strain creation and microarray experiments.
Figure 3. Screenshot of the Cuebee interface for query formulation.
The row contains a triple (subject-predicate-object) that is required to formulate the query. The expressions over the arrows represent relationships (predicates) that link the subject and object. Query formulation begins with selecting the server (PE All Datasets). Users who know which datasets the query will use, such as the microarray or gene knockout dataset, can select them there. Users who are unsure can select PE All Datasets, and Cuebee will try to find answers using all the datasets in the Parasite Knowledge Base (PKB). Users then type in the search field, and Cuebee suggests matching concepts in a drop-down list as the first letters are typed. In this case, “Microarray Analysis” is selected. Users can select a specific instance of Microarray Analysis if known; otherwise, they can select “any_Microarray_analysis”, which lets Cuebee find answers using all the microarray data. Cuebee provides definitions of each concept (under Class Description) and more information about relationships (under Relations), as shown for the concept “gene” in this figure. Relationships marked with an asterisk are directly associated with the concept “gene”, where “gene” acts as the subject of the triple. This information comes from the ontology, PEO in this case. Once the desired query is formulated, users can click Search, and Cuebee will provide results under the Specific results or General results section. Users can also query the results of their first query using the Refine button. A video demo on querying with Cuebee is available at: http://wiki.knoesis.org/index.php/Manuscript_Details.
Figure 4. Old Web forms in the lab that stored the experimental provenance data in a conventional relational database.
Figure 5. New Web forms that store the data in an RDF subject-predicate-object (i.e., triple) format, providing the opportunity to relate the data to ontology concepts.
Storing the data using these Web forms has no impact on the front-end user experience, but it offers extended querying functionality through the use of ontology concepts. Provenance information added through these Web forms is instantly available for querying.
Figure 6. Screenshot of the Cuebee interface after formulation of the query 1, “List the genes that are downregulated in the epimastigote stage and exist in a single metabolic pathway.”
Each row contains triples that are required to formulate the query. Query formulation begins with selecting the server (PE All Datasets). After selecting the dataset, users type in the search field, and Cuebee suggests matching concepts in a drop-down list as the first letters are typed. In this case, “Microarray Analysis” is selected, and the query is limited to microarray analysis data pertaining only to the “epimastigote” lifecycle stage of the parasite using Cuebee's filtering function. The triples are then extended as shown to achieve the desired query. The query uses Cuebee's “Group by” function to group all the epimastigote genes associated with a single metabolic pathway, and the “Refine” function to identify only those genes from the group that are downregulated, i.e., with a log2 ratio less than −1. Specific results show part of the results, which include gene information from the microarray lab data and pathway information from KEGG, where each pathway ID represents a specific pathway in KEGG.
Figure 7. Screenshot of the Cuebee interface after formulation of the query 2, “List the summaries of gene knock-out attempts, including both plasmid construction and strain creation, for all gene knock-out targets that are 2-fold upregulated in amastigotes at the transcript level and that have orthologs in Leishmania but not in Trypanosoma brucei.”
Each row contains triples that are required to formulate the query. Query formulation begins with selecting the server (PE All Datasets). After selecting the dataset, users type in the search field, and Cuebee suggests matching concepts in a drop-down list as the first letters are typed. In this case, “knockout region construct protocol” is selected. The triples are then extended as shown to achieve the desired query. The query uses Cuebee's “Group by” function to group all the genes that are 2-fold upregulated in amastigotes, and the negation function to identify only those that have orthologs in Leishmania but not in T. brucei.
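The gene-selection logic behind queries 1 and 2 can be sketched outside Cuebee as follows. This is a hedged illustration with hypothetical genes, pathways, and ortholog links, not data from the study, and the functions `query1` and `query2` are hypothetical names. It assumes that “exist in a single metabolic pathway” means a gene maps to exactly one pathway, that “downregulated” means a log2 ratio below −1 (as the Figure 6 caption states), and that “2-fold upregulated” means a log2 ratio of at least 1. It covers only the gene-selection step of query 2, not the retrieval of knockout attempt summaries.

```python
from collections import defaultdict

# Hypothetical microarray records: (gene, lifecycle stage, log2 ratio).
microarray = [
    ("geneA", "epimastigote", -1.8),
    ("geneB", "epimastigote", -0.4),
    ("geneC", "epimastigote", -2.1),
    ("geneD", "amastigote", 1.3),
    ("geneE", "amastigote", 0.6),
    ("geneF", "amastigote", 2.0),
]
# Hypothetical gene-to-pathway links, standing in for KEGG pathway data.
pathways = {
    "geneA": {"path1"},
    "geneC": {"path1", "path2"},
}
# Hypothetical ortholog links: gene -> organisms in which an ortholog exists.
orthologs = {
    "geneD": {"Leishmania"},
    "geneF": {"Leishmania", "T. brucei"},
}


def query1(records, gene_pathways):
    """Epimastigote genes with log2 ratio < -1 that map to exactly one pathway, grouped by pathway."""
    grouped = defaultdict(list)
    for gene, stage, log2_ratio in records:
        if stage == "epimastigote" and log2_ratio < -1:
            paths = gene_pathways.get(gene, set())
            if len(paths) == 1:
                grouped[next(iter(paths))].append(gene)
    return dict(grouped)


def query2(records, gene_orthologs):
    """Amastigote genes with log2 ratio >= 1 (at least 2-fold up) having a Leishmania but no T. brucei ortholog."""
    selected = []
    for gene, stage, log2_ratio in records:
        organisms = gene_orthologs.get(gene, set())
        if (stage == "amastigote" and log2_ratio >= 1
                and "Leishmania" in organisms
                and "T. brucei" not in organisms):  # negation: exclude genes with T. brucei orthologs
            selected.append(gene)
    return selected


print(query1(microarray, pathways))   # {'path1': ['geneA']}
print(query2(microarray, orthologs))  # ['geneD']
```

Here geneC is excluded from query 1 because it maps to two pathways, and geneF is excluded from query 2 by the negation on T. brucei orthologs even though it passes the expression threshold.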
