SeqHound: biological sequence and structure database as a platform for bioinformatics research

Katerina Michalickova¹, Gary D Bader, Michel Dumontier, Hao Lieu, Doron Betel, Ruth Isserlin, Christopher W V Hogue

PMID: 12401134
PMCID: PMC138791
DOI: 10.1186/1471-2105-3-32

Katerina Michalickova et al. BMC Bioinformatics. 2002 Oct 25;3:32. doi: 10.1186/1471-2105-3-32. Epub 2002 Oct 25.

Affiliation

¹ Department of Biochemistry, University of Toronto, Toronto, Ontario, Canada M5S 1A8. katerina@mshri.on.ca

Abstract

Background: SeqHound has been developed as an integrated biological sequence, taxonomy, annotation and 3-D structure database system. It provides a high-performance server platform for bioinformatics research in a locally-hosted environment.

Results: SeqHound is based on the National Center for Biotechnology Information data model and programming tools. It offers daily updated contents of all Entrez sequence databases in addition to 3-D structural data and information about sequence redundancies, sequence neighbours, taxonomy, complete genomes, functional annotation including Gene Ontology terms and literature links to PubMed. SeqHound is accessible via a web server through a Perl, C or C++ remote API or an optimized local API. It provides functionality necessary to retrieve specialized subsets of sequences, structures and structural domains. Sequences may be retrieved in FASTA, GenBank, ASN.1 and XML formats. Structures are available in ASN.1, XML and PDB formats. Emphasis has been placed on complete genomes, taxonomy, domain and functional annotation as well as 3-D structural functionality in the API, while fielded text indexing functionality remains under development. SeqHound also offers a streamlined WWW interface for simple web-user queries.

Conclusions: The system has proven useful in several published bioinformatics projects such as the BIND database and offers a cost-effective infrastructure for research. SeqHound will continue to develop and be provided as a service of the Blueprint Initiative at the Samuel Lunenfeld Research Institute. The source code and examples are available under the terms of the GNU public license at the Sourceforge site http://sourceforge.net/projects/slritools/ in the SLRI Toolkit.

Figures

**Figure 1**
The SeqHound database system in UML. From the bottom up: the system relies on data provided by the NCBI FTP site and the Gene Ontology resource. It uses the NCBI programming toolkit, the database management system (DBMS) and the bzip compression scheme as programming tools. The database is filled and updated using SeqHound parsers, programming tools and NCBI data as input. The database is searched using the SeqHound query interface which is usable in three forms – as CGI-based web pages, as a local API and as a remote API. All applications (top right) are written using the SeqHound API.

**Figure 2**
The database schema in UML. Each box depicts one table within the SeqHound system. The grey areas contain the table names. PK stands for "primary key". For the majority of the tables, the primary key is the GenInfo (GI) identifier. Each subsequent entry in each of the boxes indicates a field of information stored in the tables. Required fields are in bold. ASN.1 schema in these tables can be found at http://ncbi.nlm.nih.gov/IEB (for the Bioseq, Seq-Entry, Cdd and Biostruc) and at http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/slritools/slri/seqhound/asn for the rest of the objects.

**Figure 3**
The application programming interface (API) in UML. The SeqHound API consists of the database administration API, the local and remote query APIs, the formatdb API and the Clustal API. The remote server executes remote API requests using local API and returns results to a client. The WWW server utilizes the local API to present WWW pages to the user. Each box contains a group of programming functions with similar purpose. The individual functions are used to retrieve a set of data from the SeqHound system.
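
The layering described in this caption, where the remote server answers client requests by calling the local API, can be illustrated with a minimal sketch. All class and method names here (`LocalAPI`, `RemoteServer`, `RemoteClient`, `get_fasta`) are hypothetical and are not SeqHound's actual function names; a plain in-process call stands in for the network transport, and a small dictionary stands in for the database.

```python
from typing import Callable, Dict


class LocalAPI:
    """Hypothetical stand-in for a local query API that reads storage directly."""

    def __init__(self, sequences: Dict[int, str]) -> None:
        # Toy in-memory table keyed by GI identifier (assumption for illustration).
        self._sequences = sequences

    def get_fasta(self, gi: int) -> str:
        # Simplified FASTA record: a header line followed by the sequence.
        return f">gi|{gi}\n{self._sequences[gi]}"


class RemoteServer:
    """Executes remote requests by delegating to the local API."""

    def __init__(self, local: LocalAPI) -> None:
        # Map request names to the local functions that serve them.
        self._handlers: Dict[str, Callable[[int], str]] = {
            "get_fasta": local.get_fasta,
        }

    def handle(self, function: str, gi: int) -> str:
        return self._handlers[function](gi)


class RemoteClient:
    """Client-side wrapper; a direct call replaces the network round trip."""

    def __init__(self, server: RemoteServer) -> None:
        self._server = server

    def get_fasta(self, gi: int) -> str:
        return self._server.handle("get_fasta", gi)


local = LocalAPI({101: "MKTAYIAK"})
client = RemoteClient(RemoteServer(local))
print(client.get_fasta(101))
```

Because the remote path only forwards to the local one, both return the same record for the same identifier, which is the property the caption describes.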

**Figure 4**
Clustal formatted tyrosyl tRNA synthetase sequence. The letter "A" denotes an α-helix, "B" a β-strand. The capital letters indicate that the automated secondary structure assignment (as annotated in the MMDB database) and the assignment by authors agreed while the lower case letter indicates that there was a disagreement.
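
The letter convention in this caption can be decoded programmatically. The sketch below is an illustration only, not part of SeqHound: the function names are hypothetical, and it assumes that any character other than "A" or "B" (in either case), such as an alignment gap, carries no secondary structure annotation. The caption does not say which of the two assignments a lower case letter reflects, so the code records only the letter shown and whether the two assignments agreed.

```python
from typing import List, Tuple

# Letter codes taken from the figure caption.
STATE_NAMES = {"A": "alpha-helix", "B": "beta-strand"}


def decode_annotation(line: str) -> List[Tuple[str, bool]]:
    """Return (state, agreed) for each annotated position in the line.

    agreed is True for a capital letter (automated and author assignments
    agree) and False for a lower case letter (they disagree).
    """
    decoded = []
    for ch in line:
        state = STATE_NAMES.get(ch.upper())
        if state is None:
            # Assumption: other characters (gaps, spaces) are unannotated.
            continue
        decoded.append((state, ch.isupper()))
    return decoded


def agreement_fraction(line: str) -> float:
    """Fraction of annotated positions where the two assignments agree."""
    decoded = decode_annotation(line)
    if not decoded:
        return 0.0
    return sum(agreed for _, agreed in decoded) / len(decoded)


print(decode_annotation("Aa-B"))
print(agreement_fraction("AAbb"))
```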
