The ASTRAL Compendium in 2004

John-Marc Chandonia¹, Gary Hon, Nigel S Walker, Loredana Lo Conte, Patrice Koehl, Michael Levitt, Steven E Brenner

PMID: 14681391
PMCID: PMC308768
DOI: 10.1093/nar/gkh034

Nucleic Acids Res. 2004 Jan 1;32(Database issue):D189-92.

Affiliation

¹ Berkeley Structural Genomics Center, Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA.

Abstract

The ASTRAL Compendium provides several databases and tools to aid in the analysis of protein structures, particularly through the use of their sequences. Partially derived from the SCOP database of protein structure domains, it includes sequences for each domain and other resources useful for studying these sequences and domain structures. The current release of ASTRAL contains 54,745 domains, more than three times as many as the initial release 4 years ago. ASTRAL has undergone major transformations in the past 2 years. In addition to several complete updates each year, ASTRAL is now updated on a weekly basis with preliminary classifications of domains from newly released PDB structures. These classifications are available as a stand-alone database, as well as integrated into other ASTRAL databases such as representative subsets. To enhance the utility of ASTRAL to structural biologists, all SCOP domains are now made available as PDB-style coordinate files as well as sequences. In addition to sequences and representative subsets based on SCOP domains, sequences and subsets based on PDB chains are newly included in ASTRAL. Several search tools have been added to ASTRAL to facilitate retrieval of data by individual users and automated methods. ASTRAL may be accessed at http://astral.stanford.edu/.

Figures

**Figure 1**
Data flow in ASTRAL. Primary data sources are shown in green. Primary ASTRAL databases are shown in light yellow. Less commonly used resources are shown in darker yellow. Resources added recently are outlined in light blue. Using the RAF maps, four complete sequence sets are created for every domain in the first seven classes of the SCOP database. Two sets (the genetic domain sets) include the genetic domain sequences described above, and the other two (the original-style sequence sets) use the prior method of splitting each multi-chain domain into multiple sequences. For each of these methodologies, one complete sequence set is derived from sequences in the PDB ATOM records, and another from sequences in the SEQRES records. The SEQRES sets (for both genetic domain and original-style methods) are used to derive representative subsets. Each set is fully compared against itself using BLAST, and subsets are created using three similarity criteria and various thresholds. Representatives are chosen according to AEROSPACI scores, described in the text. PDB chain sequence sets are derived from the SEQRES records of every PDB chain in SCOP; selected subsets are created at 90–100% ID thresholds. PDB-style files are derived from the RAF maps and SCOP domain definitions. At each new release of ASTRAL, all non-redundant sequences from each SCOP superfamily are aligned using MAFFT (10). A hidden Markov model (7) (HMM) is created from the multiple sequence alignment for each superfamily using HMMER (6). These HMMs are used to predict domains in the sequences of newly released PDB entries on a weekly basis. HMMs from the Pfam-A database are also used to predict domains in regions of the sequences not identified by HMMs derived from SCOP superfamilies. Unassigned regions of at least 50 consecutive residues are also predicted to be potential domains. 
The predicted domains (ASTEROIDS) are available in a single file, as well as optionally available integrated into representative subsets selected according to two similarity criteria (BLAST E-value and % identity) at various thresholds.
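
The last prediction step in the caption, flagging unassigned regions of at least 50 consecutive residues as potential domains, can be illustrated with a minimal sketch. This is not ASTRAL's actual code: the function name `unassigned_regions`, the 1-based inclusive coordinate convention, and the representation of HMM hits as `(start, end)` pairs are all assumptions made for illustration. Only the 50-residue threshold comes from the text.

```python
from typing import List, Tuple

# Threshold stated in the figure caption: at least 50 consecutive residues.
MIN_UNASSIGNED_LENGTH = 50


def unassigned_regions(
    seq_length: int,
    assigned: List[Tuple[int, int]],
    min_length: int = MIN_UNASSIGNED_LENGTH,
) -> List[Tuple[int, int]]:
    """Return uncovered stretches of a chain that are long enough to report.

    Coordinates are 1-based and inclusive (an assumption of this sketch).
    `assigned` holds the (start, end) ranges already covered by HMM hits;
    the ranges may overlap and may be given in any order.
    """
    regions = []
    next_free = 1  # first residue not yet covered by any assigned range
    for start, end in sorted(assigned):
        # Gap between the covered prefix and this hit: next_free .. start-1.
        if start - next_free >= min_length:
            regions.append((next_free, start - 1))
        next_free = max(next_free, end + 1)
    # Trailing gap after the last hit, up to the end of the chain.
    if seq_length - next_free + 1 >= min_length:
        regions.append((next_free, seq_length))
    return regions


# A 300-residue chain with hits at 20-40 and 61-180 leaves gaps of
# 19, 20 and 120 residues; only the last meets the threshold.
print(unassigned_regions(300, [(61, 180), (20, 40)]))
```

Sorting the hits first and tracking a single `next_free` position keeps the scan linear after the sort and handles overlapping hits without merging them explicitly.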

References

1. Berman,H.M., Westbrook,J., Feng,Z., Gilliland,G., Bhat,T.N., Weissig,H., Shindyalov,I.N. and Bourne,P.E. (2000) The Protein Data Bank. Nucleic Acids Res., 28, 235–242.
2. Murzin,A.G., Brenner,S.E., Hubbard,T. and Chothia,C. (1995) SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol., 247, 536–540.
3. Lo Conte,L., Brenner,S.E., Hubbard,T.J., Chothia,C. and Murzin,A.G. (2002) SCOP Database in 2002: refinements accommodate structural genomics. Nucleic Acids Res., 30, 264–267.
4. Brenner,S.E., Koehl,P. and Levitt,M. (2000) The ASTRAL compendium for protein structure and sequence analysis. Nucleic Acids Res., 28, 254–256.
5. Chandonia,J.M., Walker,N.S., Lo Conte,L., Koehl,P., Levitt,M. and Brenner,S.E. (2002) ASTRAL compendium enhancements. Nucleic Acids Res., 30, 260–263.
