Federated access to heterogeneous information resources in the Neuroscience Information Framework (NIF)

Amarnath Gupta¹, William Bug, Luis Marenco, Xufei Qian, Christopher Condit, Arun Rangarajan, Hans Michael Müller, Perry L Miller, Brian Sanders, Jeffrey S Grethe, Vadim Astakhov, Gordon Shepherd, Paul W Sternberg, Maryann E Martone

PMID: 18958629
PMCID: PMC2689790
DOI: 10.1007/s12021-008-9033-y

Amarnath Gupta et al. Neuroinformatics. 2008 Sep;6(3):205-17. doi: 10.1007/s12021-008-9033-y. Epub 2008 Oct 29.

Affiliation

¹ San Diego Supercomputer Center, University of California San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA. gupta@sdsc.edu

Abstract

The overarching goal of the NIF (Neuroscience Information Framework) project is to be a one-stop shop for neuroscience. This paper provides a technical overview of how the system is designed. The technical goal of the first version of the NIF system was to develop an information system that a neuroscientist can use to locate relevant information from a wide variety of information sources through simple keyword queries. Although the user provides only keywords to retrieve information, the NIF system is designed to treat them as concepts whose meanings are interpreted by the system. Thus, a search for a term should also find records containing synonyms of that term. The system is targeted to find information from web pages, publications, databases, web sites built upon databases, XML documents and any other modality in which such information may be published. We have designed a system to achieve this functionality. A central element in the system is an ontology called NIFSTD (for NIF Standard), constructed by amalgamating a number of known and newly developed ontologies. NIFSTD is used by our ontology management module, called OntoQuest, to perform ontology-based search over data sources. The NIF architecture currently provides three different mechanisms for searching heterogeneous data sources, including relational databases, web sites, XML documents and the full text of publications. Version 1.0 of the NIF system is currently in beta test and may be accessed through http://nif.nih.gov.
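The idea of treating a keyword as a concept can be illustrated with a minimal sketch. It is not the OntoQuest implementation: the `SYNONYMS` table, the `RECORDS` list, and the functions `expand_query` and `search` are hypothetical names invented for this example, and the synonym sets are a toy stand-in for an ontology such as NIFSTD.

```python
# Toy concept table: each entry lists terms treated as synonyms.
# Illustrative only; this is not content taken from NIFSTD.
SYNONYMS = {
    "purkinje cell": {"purkinje cell", "purkinje neuron"},
    "cdna": {"cdna", "complementary dna"},
}

# Hypothetical records standing in for indexed documents.
RECORDS = [
    "Complementary DNA library from mouse cortex",
    "Purkinje neuron morphology in the cerebellum",
    "Retinal ganglion cell survival after injury",
]


def expand_query(keyword):
    """Return the keyword together with any synonyms of its concept."""
    key = keyword.strip().lower()
    for synonyms in SYNONYMS.values():
        if key in synonyms:
            return set(synonyms)
    # Unknown terms fall back to a plain keyword search.
    return {key}


def search(keyword, records):
    """Return records that mention the keyword or one of its synonyms."""
    terms = expand_query(keyword)
    return [r for r in records if any(t in r.lower() for t in terms)]


if __name__ == "__main__":
    for hit in search("Purkinje cell", RECORDS):
        print(hit)
```

In this sketch a query for "Purkinje cell" matches a record that only says "Purkinje neuron", which is the behaviour the abstract describes. A real system would resolve terms against the ontology rather than a hard-coded dictionary.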

Figures

**Figure 1**
Figure 1(a). The result of the query (mouse model cDNA) against Google. The top results are very general, and mostly from papers that are indexed by Google. Figure 1(b). The same query as in Figure 1(a) now executed against NIF. In contrast with Google, the selective web crawling coverage of NIF enables it to return results that are more closely related to Neuroscience.

**Figure 2**
The architecture of the NIF system is organized in layers: the clients are at the top of the diagram, and the data and ontology sources are at the bottom. The middle layers contain the modules for supporting search, data and index structures, and the different query handlers. The word Web has been abbreviated to W. The databases, Textpresso, and the web resources are collectively called “NIF Data Resources”.

**Figure 3**
The advanced search query interface allows ontological expansion and synonym selection for query terms. The results of the NIF Web are ranked by a number of criteria including both content and recency of documents.

**Figure 4**
A “data flow” trace that can occur while a keyword query is processed. To avoid clutter, we did not show the invocation of the index manager in a separate module. The mediator registry is connected to the Source Query Wrapper with a bidirectional connection because the registry is queried by the (database) wrapper and gets an answer back from it. For the same reason there is a bidirectional connection between the NIF Search Coordinator and the Web Result Postprocessor. Other variants of this trace are possible depending on the choices made by the user.

**Figure 5**
The right panel of a NIF Web search shows a meaningful clustering of the total result set. The Bloomington Drosophila Stock Center was not in the NIF Registry but is an example of an important resource that was picked up by our focused crawling strategy.

**Figure 6**
The NIF Registry is human-curated and hence prone to variations in spelling, classification, and general characterization of a resource. Fuzzy search is an effective way to find approximately matching terms, and thus improves result recall despite the variation in the data.
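A small sketch of approximate matching over curated terms follows. The source does not say which fuzzy-matching method NIF uses; this example swaps in Python's standard `difflib.get_close_matches` purely for illustration. `REGISTRY_TERMS`, `fuzzy_lookup`, and the `cutoff` value of 0.8 are assumptions made for the example.

```python
import difflib

# Hypothetical curated terms with the kind of spelling variation
# a human-maintained registry might contain.
REGISTRY_TERMS = ["neuroanatomy", "electrophysiology", "neuroimaging", "behaviour"]


def fuzzy_lookup(query, terms, cutoff=0.8):
    """Return registry terms whose spelling is close to the query.

    cutoff is a similarity threshold between 0 and 1; 0.8 is an
    arbitrary choice for this sketch.
    """
    # Compare in lower case, but report terms in their original form.
    lowered = {t.lower(): t for t in terms}
    hits = difflib.get_close_matches(query.lower(), list(lowered), n=5, cutoff=cutoff)
    return [lowered[h] for h in hits]


if __name__ == "__main__":
    print(fuzzy_lookup("behavior", REGISTRY_TERMS))
    print(fuzzy_lookup("neuroanatmy", REGISTRY_TERMS))
```

An exact-match lookup would miss both queries above, since one uses a different regional spelling and the other contains a typo. Lowering the cutoff raises recall at the cost of more false matches.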

**Figure 7**
Federated search of relational data sources allows the NIF system to take advantage of the schema registration process. Since the schema is registered, it is easier to design the result page to show meaningful tables and columns. It also allows the result designer to choose output data in such a way that the database results can be hyperlinked to the original data records and to any web-accessible tools exposed by the data sources. For SUMSDB, the NIF search on “hippocampus” leads to the display of the brain surfaces in the WebCaret tool.
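The role of schema registration can be sketched as follows, under stated assumptions. The two in-memory SQLite databases, the `REGISTRY` layout, the `example.org` link templates, and the function `federated_search` are all hypothetical; they are not the NIF mediator or the actual schemas of any registered source.

```python
import sqlite3


def make_db(ddl, insert_sql, rows):
    """Create an in-memory database standing in for a remote source."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    return conn


# Hypothetical registry: each source declares which table and column to
# search, its record key, and a URL template back to the original record.
REGISTRY = {
    "source_a": {
        "conn": make_db(
            "CREATE TABLE surfaces (id INTEGER, region TEXT)",
            "INSERT INTO surfaces VALUES (?, ?)",
            [(1, "hippocampus"), (2, "cerebellum")],
        ),
        "table": "surfaces",
        "key": "id",
        "search_column": "region",
        "link": "https://example.org/a/view?id={key}",
    },
    "source_b": {
        "conn": make_db(
            "CREATE TABLE cells (cell_id INTEGER, location TEXT)",
            "INSERT INTO cells VALUES (?, ?)",
            [(10, "Hippocampus CA1"), (11, "neocortex")],
        ),
        "table": "cells",
        "key": "cell_id",
        "search_column": "location",
        "link": "https://example.org/b/cell/{key}",
    },
}


def federated_search(keyword, registry):
    """Run one keyword against every registered source and link the hits."""
    results = []
    for name, src in registry.items():
        # Table and column names come from the trusted registry;
        # only the user's keyword is passed as a bound parameter.
        sql = (
            f"SELECT {src['key']}, {src['search_column']} "
            f"FROM {src['table']} WHERE {src['search_column']} LIKE ?"
        )
        for key, value in src["conn"].execute(sql, (f"%{keyword}%",)):
            results.append(
                {"source": name, "match": value, "url": src["link"].format(key=key)}
            )
    return results


if __name__ == "__main__":
    for hit in federated_search("hippocampus", REGISTRY):
        print(hit)
```

Because each source's schema is registered, the same keyword can be routed to differently named tables and columns, and each hit can be hyperlinked back to its original record, as the caption describes.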
