. 2015 Jan;43(Database issue):D204-12.

doi: 10.1093/nar/gku989. Epub 2014 Oct 27.

UniProt: a hub for protein information

UniProt Consortium

Collaborators

UniProt Consortium:
Alex Bateman, Maria Jesus Martin, Claire O'Donovan, Michele Magrane, Rolf Apweiler, Emanuele Alpi, Ricardo Antunes, Joanna Arganiska, Benoit Bely, Mark Bingley, Carlos Bonilla, Ramona Britto, Borisas Bursteinas, Gayatri Chavali, Elena Cibrian-Uhalte, Alan Da Silva, Maurizio De Giorgi, Tunca Dogan, Francesco Fazzini, Paul Gane, Leyla Garcia Castro, Penelope Garmiri, Emma Hatton-Ellis, Reija Hieta, Rachael Huntley, Duncan Legge, Wudong Liu, Jie Luo, Alistair MacDougall, Prudence Mutowo, Andrew Nightingale, Sandra Orchard, Klemens Pichler, Diego Poggioli, Sangya Pundir, Luis Pureza, Guoying Qi, Steven Rosanoff, Rabie Saidi, Tony Sawford, Aleksandra Shypitsyna, Edward Turner, Vladimir Volynkin, Tony Wardell, Xavier Watkins, Hermann Zellner, Andrew Cowley, Luis Figueira, Weizhong Li, Hamish McWilliam, Rodrigo Lopez, Ioannis Xenarios, Lydie Bougueleret, Alan Bridge, Sylvain Poux, Nicole Redaschi, Lucila Aimo, Ghislaine Argoud-Puy, Andrea Auchincloss, Kristian Axelsen, Parit Bansal, Delphine Baratin, Marie-Claude Blatter, Brigitte Boeckmann, Jerven Bolleman, Emmanuel Boutet, Lionel Breuza, Cristina Casal-Casas, Edouard de Castro, Elisabeth Coudert, Beatrice Cuche, Mikael Doche, Dolnide Dornevil, Severine Duvaud, Anne Estreicher, Livia Famiglietti, Marc Feuermann, Elisabeth Gasteiger, Sebastien Gehant, Vivienne Gerritsen, Arnaud Gos, Nadine Gruaz-Gumowski, Ursula Hinz, Chantal Hulo, Florence Jungo, Guillaume Keller, Vicente Lara, Philippe Lemercier, Damien Lieberherr, Thierry Lombardot, Xavier Martin, Patrick Masson, Anne Morgat, Teresa Neto, Nevila Nouspikel, Salvo Paesano, Ivo Pedruzzi, Sandrine Pilbout, Monica Pozzato, Manuela Pruess, Catherine Rivoire, Bernd Roechert, Michel Schneider, Christian Sigrist, Karin Sonesson, Sylvie Staehli, Andre Stutz, Shyamala Sundaram, Michael Tognolli, Laure Verbregue, Anne-Lise Veuthey, Cathy H Wu, Cecilia N Arighi, Leslie Arminski, Chuming Chen, Yongxing Chen, John S Garavelli, Hongzhan Huang, Kati Laiho, Peter McGarvey, Darren A Natale, Baris E Suzek, C R Vinayaka, Qinghua Wang, Yuqi Wang, Lai-Su Yeh, Meher Shruti Yerramalla, Jian Zhang

PMID: 25348405
PMCID: PMC4384041
DOI: 10.1093/nar/gku989

UniProt: a hub for protein information

UniProt Consortium. Nucleic Acids Res. 2015 Jan.

. 2015 Jan;43(Database issue):D204-12.

doi: 10.1093/nar/gku989. Epub 2014 Oct 27.

Author

UniProt Consortium

Collaborators

UniProt Consortium:
Alex Bateman, Maria Jesus Martin, Claire O'Donovan, Michele Magrane, Rolf Apweiler, Emanuele Alpi, Ricardo Antunes, Joanna Arganiska, Benoit Bely, Mark Bingley, Carlos Bonilla, Ramona Britto, Borisas Bursteinas, Gayatri Chavali, Elena Cibrian-Uhalte, Alan Da Silva, Maurizio De Giorgi, Tunca Dogan, Francesco Fazzini, Paul Gane, Leyla Garcia Castro, Penelope Garmiri, Emma Hatton-Ellis, Reija Hieta, Rachael Huntley, Duncan Legge, Wudong Liu, Jie Luo, Alistair MacDougall, Prudence Mutowo, Andrew Nightingale, Sandra Orchard, Klemens Pichler, Diego Poggioli, Sangya Pundir, Luis Pureza, Guoying Qi, Steven Rosanoff, Rabie Saidi, Tony Sawford, Aleksandra Shypitsyna, Edward Turner, Vladimir Volynkin, Tony Wardell, Xavier Watkins, Hermann Zellner, Andrew Cowley, Luis Figueira, Weizhong Li, Hamish McWilliam, Rodrigo Lopez, Ioannis Xenarios, Lydie Bougueleret, Alan Bridge, Sylvain Poux, Nicole Redaschi, Lucila Aimo, Ghislaine Argoud-Puy, Andrea Auchincloss, Kristian Axelsen, Parit Bansal, Delphine Baratin, Marie-Claude Blatter, Brigitte Boeckmann, Jerven Bolleman, Emmanuel Boutet, Lionel Breuza, Cristina Casal-Casas, Edouard de Castro, Elisabeth Coudert, Beatrice Cuche, Mikael Doche, Dolnide Dornevil, Severine Duvaud, Anne Estreicher, Livia Famiglietti, Marc Feuermann, Elisabeth Gasteiger, Sebastien Gehant, Vivienne Gerritsen, Arnaud Gos, Nadine Gruaz-Gumowski, Ursula Hinz, Chantal Hulo, Florence Jungo, Guillaume Keller, Vicente Lara, Philippe Lemercier, Damien Lieberherr, Thierry Lombardot, Xavier Martin, Patrick Masson, Anne Morgat, Teresa Neto, Nevila Nouspikel, Salvo Paesano, Ivo Pedruzzi, Sandrine Pilbout, Monica Pozzato, Manuela Pruess, Catherine Rivoire, Bernd Roechert, Michel Schneider, Christian Sigrist, Karin Sonesson, Sylvie Staehli, Andre Stutz, Shyamala Sundaram, Michael Tognolli, Laure Verbregue, Anne-Lise Veuthey, Cathy H Wu, Cecilia N Arighi, Leslie Arminski, Chuming Chen, Yongxing Chen, John S Garavelli, Hongzhan Huang, Kati Laiho, Peter McGarvey, Darren A Natale, Baris E Suzek, C R Vinayaka, Qinghua Wang, Yuqi Wang, Lai-Su Yeh, Meher Shruti Yerramalla, Jian Zhang

PMID: 25348405
PMCID: PMC4384041
DOI: 10.1093/nar/gku989

Abstract

UniProt is an important collection of protein sequences and their annotations, which has doubled in size to 80 million sequences during the past year. This growth in sequences has prompted an extension of UniProt accession number space from 6 to 10 characters. An increasing fraction of new sequences are identical to a sequence that already exists in the database with the majority of sequences coming from genome sequencing projects. We have created a new proteome identifier that uniquely identifies a particular assembly of a species and strain or subspecies to help users track the provenance of sequences. We present a new website that has been designed using a user-experience design process. We have introduced an annotation score for all entries in UniProt to represent the relative amount of knowledge known about each protein. These scores will be helpful in identifying which proteins are the best characterized and most informative for comparative analysis. All UniProt data is provided freely and is available on the web at http://www.uniprot.org/.

PubMed Disclaimer

Figures

**Figure 1.**
Growth of UniProt and UniRef databases.

**Figure 2.**
The distribution of proteomes and reference proteomes across the tree of life.

**Figure 3.**
(A) Screenshot of the sequence annotation section of human ENOSF1 entry (UniProt Q7L5Y1, http://www.uniprot.org/uniprot/Q7L5Y1). (B) Image of magnesium-binding sites of human ENOSF1 X-ray structure in complex with magnesium (blue) using PyMol (PDB accession 4A35). (C) Image of active site of *X. campestris* fuconate dehydratase X-ray structure in complex with magnesium (blue) and substrate (green) using PyMol (PDB accession 2HXT).

**Figure 4.**
Representation of rule based on HAMAP entry MF_ 01864.

**Figure 5.**
Growth of coverage of UniProtKB/TrEMBL by manually curated UniRules. The Y-axis shows the percentage coverage of UniProtKB/TrEMBL by UniRule as a whole as well as by the individual sources. PIR represents the combination of both PIR Site Rules (PIRSR) and PIR Name Rules (PIRNR).

**Figure 6.**
Customizable UniProt search results for ‘insulin’, with search term filters and breakdown by popular organism, and an additional column showing the annotation score for each entry.

**Figure 7.**
UniProt entry view for human coiled-coil and C2 domain-containing protein 2A (UniProt Q9P2K1).

**Figure 8.**
Reference proteome page for *Bacillus subtilis* (strain 168).

**Figure 9.**
Distribution of citations to UniProt Publications by year.

**Figure 10.**
Distribution of number of publications citing UniProt, according to research categories. Note a publication may be classified in more than one category.

See this image and copyright information in PMC

References

1. Poux S., Magrane M., Arighi C.N., Bridge A., O'Donovan C., Laiho K., UniProt C. Expert curation in UniProtKB: a case study on dealing with conflicting and erroneous data. Database. 2014;2014 doi: 10.1093/database/bau016. - PMC - PubMed
1. Suzek B.E., Huang H., McGarvey P., Mazumder R., Wu C.H. UniRef: comprehensive and non-redundant UniProt reference clusters. Bioinformatics. 2007;23:1282–1288. - PubMed
1. Leinonen R., Diez F.G., Binns D., Fleischmann W., Lopez R., Apweiler R. UniProt archive. Bioinformatics. 2004;20:3236–3237. - PubMed
1. Nakamura Y., Cochrane G., Karsch-Mizrachi I., International Nucleotide Sequence Database, C. The International Nucleotide Sequence Database Collaboration. Nucleic Acids Res. 2013;41:D21–D24. - PMC - PubMed
1. Chen C., Natale D.A., Finn R.D., Huang H., Zhang J., Wu C.H., Mazumder R. Representative proteomes: a stable, scalable and unbiased proteome set for sequence analysis and functional annotation. PLoS ONE. 2011;6:e18910. - PMC - PubMed

Publication types

Actions
Actions
Actions

MeSH terms

Actions
Actions
Actions
Actions

Substances

Actions

Grants and funding

LinkOut - more resources

Full Text Sources
Other Literature Sources
- The Lens - Patent Citations Database
- scite Smart Citations
Molecular Biology Databases
- NIAID Data Ecosystem - Find datasets on Infectious and Immune-mediated Diseases
Miscellaneous
- NCI CPTAC Assay Portal

Save citation to file

Email citation

Add to Collections

Add to My Bibliography

Your saved search

Create a file for external citation management software

Your RSS Feed

UniProt: a hub for protein information

Collaborators

UniProt: a hub for protein information

Author

Collaborators

Abstract

Figures

References

Publication types

MeSH terms

Substances

Grants and funding

LinkOut - more resources

Full Text Sources

Other Literature Sources

Molecular Biology Databases

Miscellaneous