Nucleic Acids Res. 2004 Jan 1;32(Database issue):D189-92.
doi: 10.1093/nar/gkh034.

The ASTRAL Compendium in 2004


John-Marc Chandonia et al. Nucleic Acids Res.

Abstract

The ASTRAL Compendium provides several databases and tools to aid in the analysis of protein structures, particularly through the use of their sequences. Partially derived from the SCOP database of protein structure domains, it includes sequences for each domain and other resources useful for studying these sequences and domain structures. The current release of ASTRAL contains 54,745 domains, more than three times as many as the initial release 4 years ago. ASTRAL has undergone major transformations in the past 2 years. In addition to several complete updates each year, ASTRAL is now updated on a weekly basis with preliminary classifications of domains from newly released PDB structures. These classifications are available as a stand-alone database, as well as integrated into other ASTRAL databases such as representative subsets. To enhance the utility of ASTRAL to structural biologists, all SCOP domains are now made available as PDB-style coordinate files as well as sequences. In addition to sequences and representative subsets based on SCOP domains, sequences and subsets based on PDB chains are newly included in ASTRAL. Several search tools have been added to ASTRAL to facilitate retrieval of data by individual users and automated methods. ASTRAL may be accessed at http://astral.stanford.edu/.


Figure 1
Data flow in ASTRAL. Primary data sources are shown in green. Primary ASTRAL databases are shown in light yellow. Less commonly used resources are shown in darker yellow. Resources added recently are outlined in light blue.

Using the RAF maps, four complete sequence sets are created for every domain in the first seven classes of the SCOP database. Two sets (the genetic domain sets) include the genetic domain sequences described above, and the other two (the original-style sequence sets) use the prior method of splitting each multi-chain domain into multiple sequences. For each of these methodologies, one complete sequence set is derived from sequences in the PDB ATOM records, and another from sequences in the SEQRES records.

The SEQRES sets (for both genetic domain and original-style methods) are used to derive representative subsets. Each set is fully compared against itself using BLAST, and subsets are created using three similarity criteria and various thresholds. Representatives are chosen according to AEROSPACI scores, described in the text. PDB chain sequence sets are derived from the SEQRES records of every PDB chain in SCOP; selected subsets are created at 90–100% ID thresholds. PDB-style files are derived from the RAF maps and SCOP domain definitions.

At each new release of ASTRAL, all non-redundant sequences from each SCOP superfamily are aligned using MAFFT (10). A hidden Markov model (7) (HMM) is created from the multiple sequence alignment for each superfamily using HMMER (6). These HMMs are used to predict domains in the sequences of newly released PDB entries on a weekly basis. HMMs from the Pfam-A database are also used to predict domains in regions of the sequences not identified by HMMs derived from SCOP superfamilies. Unassigned regions of at least 50 consecutive residues are also predicted to be potential domains.
The predicted domains (ASTEROIDS) are available in a single file, as well as optionally available integrated into representative subsets selected according to two similarity criteria (BLAST E-value and % identity) at various thresholds.
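
The last prediction step in the caption, treating any unassigned stretch of at least 50 consecutive residues as a potential domain, can be illustrated with a short Python sketch. This is not the ASTRAL implementation: the function name `unassigned_regions`, the 1-based inclusive coordinate convention, and the input format (a list of residue ranges already covered by HMM hits) are assumptions made for illustration. Only the 50-residue threshold comes from the caption.

```python
from typing import List, Tuple

# Threshold stated in the figure caption; everything else here is illustrative.
MIN_DOMAIN_LENGTH = 50


def unassigned_regions(
    seq_length: int,
    assigned: List[Tuple[int, int]],
    min_length: int = MIN_DOMAIN_LENGTH,
) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) ranges that no assigned
    interval covers and that span at least `min_length` residues.

    `assigned` holds hypothetical (start, end) ranges already claimed by
    HMM hits; the ranges may overlap and may be given in any order.
    """
    gaps = []
    cursor = 1  # first residue not yet known to be covered
    for start, end in sorted(assigned):
        # A gap exists only if this hit begins after the cursor.
        if start - cursor >= min_length:
            gaps.append((cursor, start - 1))
        cursor = max(cursor, end + 1)
    # Check the tail of the sequence after the last hit.
    if seq_length - cursor + 1 >= min_length:
        gaps.append((cursor, seq_length))
    return gaps


if __name__ == "__main__":
    # A 300-residue chain with two overlapping hits covering residues 61-200.
    print(unassigned_regions(300, [(61, 150), (140, 200)]))
```

In this example both flanking regions (residues 1-60 and 201-300) exceed the threshold and would be reported as potential domains. A 40-residue gap would be ignored.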

References

    1. Berman,H.M., Westbrook,J., Feng,Z., Gilliland,G., Bhat,T.N., Weissig,H., Shindyalov,I.N. and Bourne,P.E. (2000) The Protein Data Bank. Nucleic Acids Res., 28, 235–242.
    2. Murzin,A.G., Brenner,S.E., Hubbard,T. and Chothia,C. (1995) SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol., 247, 536–540.
    3. Lo Conte,L., Brenner,S.E., Hubbard,T.J., Chothia,C. and Murzin,A.G. (2002) SCOP Database in 2002: refinements accommodate structural genomics. Nucleic Acids Res., 30, 264–267.
    4. Brenner,S.E., Koehl,P. and Levitt,M. (2000) The ASTRAL compendium for protein structure and sequence analysis. Nucleic Acids Res., 28, 254–256.
    5. Chandonia,J.M., Walker,N.S., Lo Conte,L., Koehl,P., Levitt,M. and Brenner,S.E. (2002) ASTRAL compendium enhancements. Nucleic Acids Res., 30, 260–263.
