Large-Scale Analyses of Human Microbiomes Reveal Thousands of Small, Novel Genes
- PMID: 31402174
- PMCID: PMC6764417
- DOI: 10.1016/j.cell.2019.07.016
Abstract
Small proteins are traditionally overlooked due to computational and experimental difficulties in detecting them. To systematically identify small proteins, we carried out a comparative genomics study on 1,773 human-associated metagenomes from four different body sites. We describe >4,000 conserved protein families, the majority of which are novel; ∼30% of these protein families are predicted to be secreted or transmembrane. Over 90% of the small protein families have no known domain and almost half are not represented in reference genomes. We identify putative housekeeping, mammalian-specific, and defense-related protein families, as well as protein families that are likely to be horizontally transferred. We provide evidence of transcription and translation for a subset of these families. Our study suggests that small proteins are highly abundant and that those of the human microbiome, in particular, may perform diverse functions that have not been previously reported.
Keywords: annotation; bacteria; bioinformatics; domain; genome; microbe; microbiome; phage; prediction; small open reading frame; small proteins.
Copyright © 2019 Elsevier Inc. All rights reserved.
Conflict of interest statement
N.G. is an employee and shareholder of One Codex. M.P.S. is a cofounder of Personalis, SensOmics, January, Filtricine, Akna, and Qbio; he is on the advisory board of the companies he cofounded, along with Genapsys and Jupiter. A.S.B. is on the advisory board of Caribou Biosciences, January, and ArcBio. The authors declare no other competing financial interests.
Comment in
- Tiny Hidden Genes within Our Microbiome. Cell. 2019 Aug 22;178(5):1034-1035. doi: 10.1016/j.cell.2019.07.039. PMID: 31442396