Review
EcoSal Plus. 2020 May;9(1):10.1128/ecosalplus.ESP-0031-2019.
doi: 10.1128/ecosalplus.ESP-0031-2019.

Escherichia coli Small Proteome


Matthew R Hemm et al. EcoSal Plus. 2020 May.

Abstract

Escherichia coli was one of the first species to have its genome sequenced and remains one of the best-characterized model organisms. Thus, it is perhaps surprising that recent studies have shown that a substantial number of genes have been overlooked. Genes encoding more than 140 small proteins, defined as those containing 50 or fewer amino acids, have been identified in E. coli in the past 10 years, and there is substantial evidence indicating that many more remain to be discovered. This review covers the methods that have been successful in identifying small proteins and the short open reading frames that encode them. The small proteins that have been functionally characterized to date in this model organism are also discussed. It is hoped that the review, along with the associated databases of known as well as predicted but undetected small proteins, will aid in and provide a roadmap for the continued identification and characterization of these proteins in E. coli as well as other bacteria.
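The size definition used here (50 or fewer amino acids) can be made concrete with a naive open reading frame scan. The sketch below is only an illustration of that cutoff on a toy sequence. It is not one of the identification methods covered by the review, and a plain sequence scan like this over-predicts heavily, so real annotation needs additional evidence. Assumptions: only the forward strand is scanned; ATG, GTG, and TTG are treated as start codons; each reported ORF runs from the first start codon in its frame to the next in-frame stop, so internal in-frame starts are not reported separately; and the function name `find_small_orfs` is hypothetical.

```python
# Hypothetical helper illustrating the <=50 amino acid "small protein" cutoff.
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODONS = {"ATG", "GTG", "TTG"}  # assumption: common E. coli start codons


def find_small_orfs(seq: str, max_aa: int = 50, min_aa: int = 1) -> list[tuple[int, int, int]]:
    """Return (start, end, length_aa) for forward-strand ORFs encoding
    min_aa..max_aa residues. start is the 0-based index of the start codon,
    end is the exclusive index just past the stop codon, and length_aa
    counts codons from the start codon up to (not including) the stop."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] in START_CODONS:
                # Walk codon by codon until an in-frame stop codon is found.
                j = i + 3
                while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                    j += 3
                if j + 3 > len(seq):
                    break  # no in-frame stop before the end of the sequence
                length_aa = (j - i) // 3
                if min_aa <= length_aa <= max_aa:
                    orfs.append((i, j + 3, length_aa))
                i = j + 3  # resume scanning after the stop codon
            else:
                i += 3
    return sorted(orfs)


# Usage on a toy sequence: Met-Lys followed by a TAA stop.
print(find_small_orfs("ATGAAATAA"))
```

Raising `max_aa` shows how strongly the predicted set depends on the size threshold chosen.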


Figures

Figure 1
Small protein gene identification in the E. coli genome over the past 6 years. (A) Histogram of currently annotated protein-coding genes in E. coli compared to those identified in 2013. (B) Histogram of currently annotated small protein genes in E. coli compared to those known in 2013. In both (A) and (B), light gray bars represent genes annotated in 2013, and dark gray bars represent genes annotated in 2019. Data on genes annotated in 2013 are from the E. coli K-12 MG1655 genome annotation U00096.2. Data on genes annotated in 2019 were compiled from EcoCyc annotations (92) combined with recent papers identifying new small proteins.
Figure 2
Ribosome binding sites for representative small protein-coding genes. The sequence logo for E. coli ribosome binding sites is reproduced from reference . Sequences of 12 small protein genes of unknown function are listed below. Red type indicates the predicted start codon, and blue type indicates stretches of four or more G and A residues. Gibbs free energies (ΔG° in kcal/mol) for the interaction between each sequence shown and the 16S rRNA were calculated using IntaRNA (http://rna.informatik.uni-freiburg.de/IntaRNA/Input.jsp) (94). No value is given for the three sequences for which no significant interaction was detected. Rpm (reads per million mapped) values for ribosome profiling carried out in the presence of the inhibitor Onc112 are taken from reference .
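Two of the quantities in the Figure 2 caption are simple to compute. One is the highlighting rule for runs of four or more G and A residues. The other is the reads-per-million normalization. The sketch below shows both under stated assumptions. The function names `purine_stretches` and `rpm` are hypothetical, and the example sequence is a made-up toy, not one of the 12 sequences in the figure. The normalization uses the standard definition, read count divided by total mapped reads times 10^6. Whether the cited ribosome profiling study normalized in exactly this way is an assumption. The IntaRNA free-energy calculation is not reproduced.

```python
import re


def purine_stretches(seq: str, min_len: int = 4) -> list[tuple[int, int]]:
    """Return (start, end) spans (0-based, end exclusive) of runs of at least
    min_len consecutive G/A residues, i.e. the stretches shown in blue type."""
    pattern = re.compile(r"[GA]{%d,}" % min_len)
    return [(m.start(), m.end()) for m in pattern.finditer(seq.upper())]


def rpm(read_count: int, total_mapped_reads: int) -> float:
    """Reads per million mapped: read_count / total_mapped_reads * 10^6.
    Multiplying before dividing keeps integer inputs exact where possible."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    return read_count * 1_000_000 / total_mapped_reads


# Toy upstream region (hypothetical): a purine-rich stretch a few bases before ATG.
print(purine_stretches("TTAGGAGGTTTATG"))
print(rpm(50, 10_000_000))
```

Because the regular expression is greedy, a run longer than `min_len` is reported as a single span rather than as overlapping windows.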
Figure 3
Structures of representative small proteins. Structures of AcrZ, KdpF, CydX, and CydH (red) in association with the AcrB multidrug efflux pump (PDB 4C48 [84]), Kdp potassium transporter (PDB 5MRW [69]), and cytochrome bd-I oxidase (PDB 6RKO [80]), respectively. The approximate position of the membrane is indicated by shading.

References

    1. Blattner FR, Plunkett G III, Bloch CA, Perna NT, Burland V, Riley M, Collado-Vides J, Glasner JD, Rode CK, Mayhew GF, Gregor J, Davis NW, Kirkpatrick HA, Goeden MA, Rose DJ, Mau B, Shao Y. 1997. The complete genome sequence of Escherichia coli K-12. Science 277:1453–1462. doi:10.1126/science.277.5331.1453.
    2. Keseler IM, Collado-Vides J, Gama-Castro S, Ingraham J, Paley S, Paulsen IT, Peralta-Gil M, Karp PD. 2005. EcoCyc: a comprehensive database resource for Escherichia coli. Nucleic Acids Res 33:D334–D337. doi:10.1093/nar/gki108.
    3. Keseler IM, Collado-Vides J, Santos-Zavaleta A, Peralta-Gil M, Gama-Castro S, Muñiz-Rascado L, Bonavides-Martinez C, Paley S, Krummenacker M, Altman T, Kaipa P, Spaulding A, Pacheco J, Latendresse M, Fulcher C, Sarker M, Shearer AG, Mackie A, Paulsen I, Gunsalus RP, Karp PD. 2011. EcoCyc: a comprehensive database of Escherichia coli biology. Nucleic Acids Res 39(Database):D583–D590. doi:10.1093/nar/gkq1143.
    4. Tatusova T, DiCuccio M, Badretdin A, Chetvernin V, Nawrocki EP, Zaslavsky L, Lomsadze A, Pruitt KD, Borodovsky M, Ostell J. 2016. NCBI Prokaryotic Genome Annotation Pipeline. Nucleic Acids Res 44:6614–6624. doi:10.1093/nar/gkw569.
    5. Rudd KE, Humphery-Smith I, Wasinger VC, Bairoch A. 1998. Low molecular weight proteins: a challenge for post-genomic research. Electrophoresis 19:536–544. doi:10.1002/elps.1150190413.
