BMC Bioinformatics. 2002 Oct 25;3:32.
doi: 10.1186/1471-2105-3-32. Epub 2002 Oct 25.

SeqHound: biological sequence and structure database as a platform for bioinformatics research

Katerina Michalickova et al. BMC Bioinformatics. 2002;3:32.

Abstract

Background: SeqHound has been developed as an integrated biological sequence, taxonomy, annotation and 3-D structure database system. It provides a high-performance server platform for bioinformatics research in a locally-hosted environment.

Results: SeqHound is based on the National Center for Biotechnology Information (NCBI) data model and programming tools. It offers the daily-updated contents of all Entrez sequence databases, together with 3-D structural data and information about sequence redundancies, sequence neighbours, taxonomy, complete genomes, functional annotation (including Gene Ontology terms) and literature links to PubMed. SeqHound is accessible through a web server via a Perl, C or C++ remote API, or through an optimized local API. It provides the functionality needed to retrieve specialized subsets of sequences, structures and structural domains. Sequences may be retrieved in FASTA, GenBank, ASN.1 and XML formats; structures are available in ASN.1, XML and PDB formats. The API emphasizes complete genomes, taxonomy, domain and functional annotation, and 3-D structure functionality, while fielded text indexing remains under development. SeqHound also offers a streamlined WWW interface for simple web-user queries.
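Because sequences can be exported in FASTA format, the minimal Python sketch below shows what such output looks like for a single record. It is not SeqHound code: the function name `to_fasta`, the `gi|<number>|` header layout and the default 60-residue line width are assumptions chosen for illustration.

```python
def to_fasta(gi: int, description: str, sequence: str, width: int = 60) -> str:
    """Render one sequence as a FASTA record (hypothetical helper, not the SeqHound API)."""
    # Header: '>' followed by an identifier and a free-text description.
    # The 'gi|<number>|' layout is an assumed NCBI-style convention.
    header = f">gi|{gi}| {description}"
    residues = sequence.upper()
    # Split the residues into consecutive fixed-width lines; the last may be shorter.
    lines = [residues[i:i + width] for i in range(0, len(residues), width)]
    return "\n".join([header, *lines]) + "\n"
```

For a 130-residue sequence with a placeholder GI such as `12345`, the record consists of the header line followed by lines of 60, 60 and 10 residues.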

Conclusions: The system has proven useful in several published bioinformatics projects, such as the BIND database, and offers a cost-effective infrastructure for research. SeqHound will continue to be developed and provided as a service of the Blueprint Initiative at the Samuel Lunenfeld Research Institute. The source code and examples are available in the SLRI Toolkit under the terms of the GNU General Public License at the SourceForge site http://sourceforge.net/projects/slritools/.

Figures

Figure 1
The SeqHound database system in UML. From the bottom up: the system relies on data provided by the NCBI FTP site and the Gene Ontology resource. It uses the NCBI programming toolkit, the database management system (DBMS) and the bzip compression scheme as programming tools. The database is filled and updated using SeqHound parsers, programming tools and NCBI data as input. The database is searched using the SeqHound query interface which is usable in three forms – as CGI-based web pages, as a local API and as a remote API. All applications (top right) are written using the SeqHound API.
Figure 2
The database schema in UML. Each box depicts one table within the SeqHound system. The grey areas contain the table names. PK stands for "primary key"; for the majority of the tables, the primary key is the GenInfo (GI) identifier. Each subsequent entry in a box indicates a field of information stored in that table. Required fields are in bold. The ASN.1 specifications for the objects stored in these tables can be found at http://ncbi.nlm.nih.gov/IEB (for Bioseq, Seq-Entry, Cdd and Biostruc) and at http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/slritools/slri/seqhound/asn for the remaining objects.
Figure 3
The application programming interface (API) in UML. The SeqHound API consists of the database administration API, the local and remote query APIs, the formatdb API and the Clustal API. The remote server executes remote API requests using local API and returns results to a client. The WWW server utilizes the local API to present WWW pages to the user. Each box contains a group of programming functions with similar purpose. The individual functions are used to retrieve a set of data from the SeqHound system.
Figure 4
Clustal-formatted tyrosyl-tRNA synthetase sequence. The letter "A" denotes an α-helix and "B" a β-strand. Capital letters indicate that the automated secondary structure assignment (as annotated in the MMDB database) agreed with the assignment by the authors, while lower-case letters indicate a disagreement.
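The agreement rule in Figure 4 can be expressed as a per-residue merge of two assignment strings. This sketch assumes both strings have the same length, use upper-case `A` (helix) and `B` (strand), and use `-` for residues in neither. The caption does not say which letter is shown on disagreement, so the sketch assumes the automated (MMDB) letter, falling back to the authors' letter when the automated call is coil.

```python
COIL = "-"  # assumed placeholder for residues in neither a helix nor a strand


def merge_assignments(auto: str, author: str) -> str:
    """Merge automated and author secondary-structure strings (hypothetical helper)."""
    if len(auto) != len(author):
        raise ValueError("assignments must cover the same residues")
    out = []
    for a, b in zip(auto, author):
        if a == b:
            # Agreement: keep the shared upper-case letter (coil stays '-').
            out.append(a)
        else:
            # Disagreement: show a lower-case letter, preferring the automated
            # call unless it is coil, in which case use the authors' call.
            letter = a if a != COIL else b
            out.append(letter.lower())
    return "".join(out)
```

For example, merging `AAAB--` (automated) with `AA-BB-` (authors) gives `AAaBb-`.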
