Nucleic Acids Res. 2004 Jan 1;32(Database issue):D203-7.
doi: 10.1093/nar/gkh027.

HOMSTRAD: recent developments of the Homologous Protein Structure Alignment Database


Lucy A Stebbings et al. Nucleic Acids Res. 2004.

Abstract

HOMSTRAD (http://www-cryst.bioc.cam.ac.uk/homstrad/) is a collection of protein families, clustered on the basis of sequence and structural similarity. The database is unique in that the protein family sequence alignments have been specially annotated using the program JOY to highlight a wide range of structural features. Such data are useful for identifying key structurally conserved residues within the families. Superpositions of the structures within each family are also available, and a sensitive structure-aided search engine, FUGUE, can be used to search the database for matches to a query protein sequence. Historically, HOMSTRAD families were generated using several key pieces of software, including COMPARER and MNYFIT, and held in a number of flat files and indexes. A new relational database version of HOMSTRAD, HOMSTRAD BETA (http://www-cryst.bioc.cam.ac.uk/homstradbeta/), is being developed using MySQL. This relational data structure provides more flexibility for future developments, reduces update times and makes data more easily accessible. Consequently, it has been possible to add a number of new web features, including a custom alignment facility. Altogether, this makes HOMSTRAD and its new BETA version an excellent resource both for comparative modelling and for identifying distant sequence/structure similarities between proteins.
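To illustrate why a relational layout makes family data easier to query than flat files, here is a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not describe the HOMSTRAD BETA schema, so the table names (family, chain), the column names, the family name demo_family and the chain identifiers are hypothetical placeholders. Python's built-in sqlite3 module stands in for MySQL so that the example runs without a database server.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical two-table layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE family (
            family_id INTEGER PRIMARY KEY,
            name      TEXT UNIQUE NOT NULL
        );
        CREATE TABLE chain (
            chain_id          INTEGER PRIMARY KEY,
            family_id         INTEGER NOT NULL REFERENCES family(family_id),
            pdb_chain         TEXT NOT NULL,
            is_representative INTEGER NOT NULL  -- 1 = representative member
        );
        """
    )
    # Placeholder rows; these are not real HOMSTRAD entries.
    conn.execute("INSERT INTO family (family_id, name) VALUES (1, 'demo_family')")
    conn.executemany(
        "INSERT INTO chain (family_id, pdb_chain, is_representative) VALUES (?, ?, ?)",
        [(1, "0aaa_A", 1), (1, "0bbb_A", 1), (1, "0ccc_B", 0)],
    )
    conn.commit()
    return conn


def family_members(conn, family_name, representatives_only=False):
    """Return the chains of one family, optionally only representative ones."""
    sql = (
        "SELECT c.pdb_chain FROM chain AS c "
        "JOIN family AS f ON f.family_id = c.family_id "
        "WHERE f.name = ?"
    )
    if representatives_only:
        sql += " AND c.is_representative = 1"
    sql += " ORDER BY c.pdb_chain"
    return [row[0] for row in conn.execute(sql, (family_name,))]


if __name__ == "__main__":
    conn = build_demo_db()
    print(family_members(conn, "demo_family"))
    print(family_members(conn, "demo_family", representatives_only=True))
```

In a layout like this, listing every member of a family, or only the representative ones, is a single join rather than a scan over several flat files and indexes, which is the kind of flexibility the abstract attributes to the relational version.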


Figures

Figure 1
A number of steps are required to incorporate data from the PDB into HOMSTRAD BETA. First, each PDB entry is separated into chains; chains that contain only nucleic acid data or only Cα data, that come from theoretical structures, or that have been discredited in some way are added to a separate data table and processed no further. Chains that survive this filtering are passed through the procedures detailed in the figure (boxed, left), with input from SWISS-PROT, SCOP and Pfam data at the stages indicated (far left). To the right is a schematic tracing the processing of three sample PDB chains. One of these, chain B, encodes a hypothetical protein containing two domains, the first of which consists of two fragments. Families are generated for each of the two domains and for the full-length protein. The sequence of PDB chain A is homologous to domain 1 of chain B and so is incorporated into the same family. Many families in HOMSTRAD are simpler than this: at present just over 1000 contain only one chain with one fragment, as in the hypothetical family that contains chain C. These families will grow as more structures are released into the PDB and matched to them.
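The initial filtering step described in this caption can be sketched as follows. This is an illustrative sketch only, not the HOMSTRAD BETA code: the Chain record, its field names, the order of the checks and the sample identifiers are all hypothetical, and the sketch assumes each chain has already been annotated with the properties the caption mentions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chain:
    """Hypothetical record for one chain split out of a PDB entry."""
    pdb_id: str
    chain_id: str
    atom_names: frozenset          # distinct atom names present in the chain
    nucleic_acid_only: bool = False
    theoretical: bool = False
    discredited: bool = False


def rejection_reason(chain):
    """Return why a chain is filtered out, or None if it should be processed."""
    if chain.nucleic_acid_only:
        return "nucleic acid only"
    if chain.atom_names == {"CA"}:
        return "C-alpha only"
    if chain.theoretical:
        return "theoretical structure"
    if chain.discredited:
        return "discredited"
    return None


def partition_chains(chains):
    """Split chains into those processed further and those set aside.

    Rejected chains are kept together with their reason, mirroring the
    separate data table mentioned in the caption.
    """
    accepted, rejected = [], []
    for chain in chains:
        reason = rejection_reason(chain)
        if reason is None:
            accepted.append(chain)
        else:
            rejected.append((chain, reason))
    return accepted, rejected


if __name__ == "__main__":
    backbone = frozenset({"N", "CA", "C", "O"})
    sample = [
        Chain("0aaa", "A", backbone),
        Chain("0bbb", "A", frozenset({"CA"})),
        Chain("0ccc", "B", backbone, theoretical=True),
    ]
    accepted, rejected = partition_chains(sample)
    print([c.pdb_id for c in accepted])
    print([(c.pdb_id, why) for c, why in rejected])
```

Recording a reason alongside each rejected chain, rather than discarding it, keeps the filtering auditable and lets a chain be reconsidered if its status changes.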
Figure 2
Archaeon glyceraldehyde 3-phosphate dehydrogenases. (A) Results of a keyword search using a SWISS-PROT accession number. Following the link to arch_gpdh_N leads to the home page of the arch_gpdh_N family, which includes the PDB chains 1cf2_O and 1b7g_O (B). Original features such as the JOY annotated alignment are shown in (B); if RasMol is installed, clicking the RasMol link displays the superimposed structures (B). (C) Two new features. The chooser page of the custom alignment facility gives an expanded list of all the PDB chains that are part of the family, including the non-representative members; these can be selected individually to generate a custom family. A new facility that shows links to other HOMSTRAD BETA families is also displayed, as is the key to the JOY annotated alignments. (D) The most N-terminal section of a JOY annotated alignment that includes both Archaeon and non-Archaeon protein sequences (bottom two entries), highlighting the differences.

