PseqIP: a nonredundant and exhaustive protein sequence data bank generated from 4 major existing collections
- PMID: 3449852
- DOI: 10.1002/prot.340010110
Abstract
Four major protein sequence data collections (NBRF-PIR, PSD-Kyoto, PGtrans, and NEWAT) have been merged into a single nonredundant data bank called PseqIP. The data bank entries were automatically matched by a heuristic computer program relying on the fast computation of the number of tetrapeptides shared by two sequences. PseqIP 1.0 includes 6,068 different protein sequences for a total of 1,357,067 residues, representing most of the available sequence information to date. During the course of this work, we found about 600 occurrences of a protein sequence recorded with a one-amino-acid variation in at least two different data banks. A flat file (ASCII computer-readable format) version of PseqIP 1.0, well-suited for exhaustive homology searches and statistical sequence analysis, is available from our laboratory.
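The matching criterion named in the abstract, the number of tetrapeptides shared by two sequences, can be illustrated with a minimal sketch. This is not the authors' program, whose details the abstract does not give: the function names, the multiset-intersection counting, and the `min_fraction` threshold are all assumptions made for illustration.

```python
from collections import Counter


def tetrapeptides(seq, k=4):
    """Return a multiset of all overlapping k-residue words in seq."""
    seq = seq.upper()
    # range() is empty when the sequence is shorter than k residues.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def shared_tetrapeptides(a, b, k=4):
    """Count the k-residue words common to both sequences.

    Counter's & operator keeps the minimum count of each word,
    so repeated words are matched at most as often as they occur
    in the sequence where they are rarer.
    """
    return sum((tetrapeptides(a, k) & tetrapeptides(b, k)).values())


def likely_same_protein(a, b, k=4, min_fraction=0.8):
    """Hypothetical decision rule (an assumption, not from the paper):
    flag a pair as a candidate duplicate when the shared-word count
    covers at least min_fraction of the words in the shorter sequence.
    """
    words_in_shorter = min(len(a), len(b)) - k + 1
    if words_in_shorter <= 0:
        return False
    return shared_tetrapeptides(a, b, k) / words_in_shorter >= min_fraction
```

Under this counting scheme a single substitution removes at most four overlapping tetrapeptides, so two long entries that differ by one residue still share nearly all of their words. This is consistent with such a criterion surfacing the one-amino-acid variants mentioned above, although the threshold actually used is not stated in the abstract.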