Nucleic Acids Res. 2008 Jul 1;36(Web Server issue):W255-9.
doi: 10.1093/nar/gkn237. Epub 2008 May 7.

SEQATOMS: a web tool for identifying missing regions in PDB in sequence context

Bernd W Brandt et al. Nucleic Acids Res. 2008.

Abstract

With over 46 000 proteins, the Protein Data Bank (PDB) is the most important database of structural information on biological macromolecules. PDB files contain sequence and coordinate information. Residues present in the sequence can be absent from the coordinate section, which means their position in space is unknown. Similarity searches are routinely carried out against sequences taken from PDB SEQRES. These sequences, however, make no distinction between residues whose position in the 3D protein structure is known and those whose position is unknown. We present a FASTA sequence database that is produced by combining the sequence and coordinate information. All residues absent from the PDB coordinate section are masked with lower-case letters, which shows these residues in the context of the entire protein sequence and makes 'missing' regions easier to inspect. We also provide a masked version of the CATH domain database. A user-friendly BLAST interface is available for similarity searching. In contrast to standard (stand-alone) BLAST output, which only contains upper-case letters, our output retains the lower-case letters of the masked regions. Thus, our server can be used to perform case-sensitive BLAST searches. Here, we have applied it to the study of missing regions in their sequence context. SEQATOMS is available at http://www.bioinformatics.nl/tools/seqatoms/.
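The lower-case masking described in the abstract can be illustrated with a minimal Python sketch. It is not the SEQATOMS implementation: the function names `mask_missing` and `missing_regions`, the toy sequence, and the set of observed positions are all assumptions made for illustration. The sketch also assumes that the mapping between sequence positions and coordinate records is already known, which in practice requires parsing and aligning the PDB file.

```python
import re
from typing import Iterable, List, Tuple


def mask_missing(seqres: str, observed: Iterable[int]) -> str:
    """Lower-case every residue whose 0-based index is not in `observed`.

    `observed` is assumed to hold the sequence positions that have
    coordinates; deriving it from a real PDB file is out of scope here.
    """
    seen = set(observed)
    return "".join(
        aa.upper() if i in seen else aa.lower()
        for i, aa in enumerate(seqres)
    )


def missing_regions(masked: str) -> List[Tuple[int, int]]:
    """Return (start, length) for each run of lower-case (masked) residues."""
    return [(m.start(), len(m.group())) for m in re.finditer(r"[a-z]+", masked)]


# Hypothetical 10-residue chain whose first two and last two residues
# have no coordinates.
seqres = "MKPVTLYDVA"
masked = mask_missing(seqres, range(2, 8))
print(masked)                   # mkPVTLYDva
print(missing_regions(masked))  # [(0, 2), (8, 2)]
```

A single chain can yield several regions, as in this toy example, which is why a count of missing regions can exceed the number of chains.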

Figures

Figure 1. Histogram of missing residues in PDB protein structures. All missing regions are counted. This count is larger than the number of chains, since a single chain can have several missing regions.
Figure 2. An example BLAST output for the query 1LBG_A (Lactose Operon Repressor). The lower-case regions are indicated in the alignment graphic (grey) and in the alignment (Figure 3). The description section of the BLAST output provides links to the selected sequence database(s), the source database, NCBI Entrez Protein and CATH.
Figure 3. An example alignment showing lower-case masking (red) of the second hit sequence (1LBH_A) from the BLAST output presented in Figure 2. The FASTA header has been truncated and only the first line of the alignment is shown here.

