2022 Feb 8;88(3):e0185121.
doi: 10.1128/AEM.01851-21. Epub 2021 Dec 1.

Polyphenol Utilization Proteins in the Human Gut Microbiome

Bo Zheng et al. Appl Environ Microbiol.

Abstract

Dietary polyphenols can significantly benefit human health, but their bioavailability is metabolically controlled by the human gut microbiota. To facilitate the study of polyphenol metabolism for human gut health, we have manually curated experimentally characterized polyphenol utilization proteins (PUPs) from published literature. This resulted in 60 experimentally characterized PUPs (named seeds) with various metadata, such as species and substrate. A further database search found 107,851 homologs of the seeds in the UniProt and UHGP (unified human gastrointestinal protein) databases. All PUP seeds and homologs were classified into protein classes, families, and subfamilies based on Enzyme Commission (EC) numbers, Pfam (protein family) domains, and sequence similarity networks. By locating PUP homologs in the genomes of UHGP, we have identified 1,074 physically linked PUP gene clusters (PGCs), which are potentially involved in polyphenol metabolism in the human gut. The gut microbiome of Africans consistently ranked at the top in terms of the abundance and prevalence of PUP homologs and PGCs among all continents. This reflects the fact that dietary polyphenols are consumed by the African population more commonly than by other populations, such as Europeans and North Americans. A case study of the Hadza hunter-gatherer microbiome verified the feasibility of using dbPUP to profile metagenomic data for biologically meaningful discovery, suggesting an association between diet and PUP abundance. A Pfam domain enrichment analysis of PGCs identified a number of putatively novel PUP families. Lastly, a user-friendly web interface (https://bcb.unl.edu/dbpup/) provides all the data online to facilitate the research of polyphenol metabolism for improved human health.

IMPORTANCE Long-term consumption of polyphenol-rich foods has been shown to lower the risk of various human diseases, such as cardiovascular diseases, cancers, and metabolic diseases. Raw polyphenols are often enzymatically processed by the gut microbiome, which contains various polyphenol utilization proteins (PUPs) to produce metabolites with much higher bioaccessibility to gastrointestinal cells. This study delivered dbPUP as an online database for experimentally characterized PUPs and their homologs in the human gut microbiome. This work also performed a systematic classification of PUPs into enzyme classes, families, and subfamilies. The signature Pfam domains were identified for PUP families, enabling conserved domain-based PUP annotation. This standardized sequence similarity-based PUP classification system offered a guideline for the future inclusion of new experimentally characterized PUPs and the creation of new PUP families. An in-depth data analysis was further conducted on PUP homologs and physically linked PUP gene clusters (PGCs) in gut microbiomes of different human populations.

Keywords: PUP; PUP gene clusters; gut microbiota; microbiome; polyphenol; polyphenol utilization proteins.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

FIG 1
Workflow of the development of dbPUP. The four major tasks are shown as vertical bars (right): (1) seed protein curation, (2) seed sequence analysis for class and family level classification, (3) data expansion to include seed homologs from UniProt and UHGP, and (4) homolog data visualization and classification. The detailed data, methods, and tools are provided in the workflow.
FIG 2
Taxonomic distribution of the 60 seed PUPs.
FIG 3
Sequence similarity networks of 60 PUP seed proteins. Gray lines mean that two connected proteins (nodes) are similar to each other with an E value of <10⁻⁵. Domain architecture for each PUP is shown in a box beside each family. Protein nodes are colored according to their enzyme class assignment (see Results and Discussion). Domains shown in pink are signature Pfam domains.
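The network rule in this caption (an edge between two proteins whose similarity has an E value below 10⁻⁵) can be illustrated with a minimal Python sketch. It assumes pairwise hits are already available as (query, subject, E value) tuples, for example parsed from tabular BLAST output. The function name `build_families`, the toy protein IDs, and the choice to report connected components as groups are illustrative assumptions, not the authors' pipeline.

```python
from collections import defaultdict

E_VALUE_CUTOFF = 1e-5  # threshold stated in the figure caption


def build_families(proteins, hits, cutoff=E_VALUE_CUTOFF):
    """Group proteins into connected components of a similarity network.

    proteins: list of protein IDs (hypothetical IDs in the example below).
    hits: iterable of (query, subject, evalue) tuples.
    An undirected edge is added when evalue < cutoff.
    """
    adjacency = defaultdict(set)
    for query, subject, evalue in hits:
        if query != subject and evalue < cutoff:
            adjacency[query].add(subject)
            adjacency[subject].add(query)

    seen = set()
    families = []
    for protein in proteins:
        if protein in seen:
            continue
        # Depth-first traversal collects everything reachable from this node.
        stack = [protein]
        component = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adjacency[node] - seen)
        families.append(sorted(component))
    return families


# Toy data: P1-P2 and P2-P3 pass the cutoff, P3-P4 does not.
proteins = ["P1", "P2", "P3", "P4"]
hits = [("P1", "P2", 1e-30), ("P2", "P3", 1e-8), ("P3", "P4", 0.5)]
families = build_families(proteins, hits)
print(families)
```

In this toy input, P4 stays in its own group because its only hit falls above the cutoff.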
FIG 4
Krona taxonomy distribution plots of PUP homologs in UniProt. (A) UniProt homologs of all the 26 PUP families collectively. (B) UniProt homologs of 28 seed-containing subfamilies collectively.
FIG 5
Analysis of 51,157 PUP homologs from UHGG. (A) Phylum-level taxonomic distribution and (B) continent-level geographical distribution of 39,296 PUP-containing genomes. (C) Top 10 UHGG species by the percentage of PUP homologs in the genomes of the species; UHGG species names are followed by the number of genomes in parentheses. (D) Pie charts and boxplots. Pie charts show the relative fraction of PUPs in different classes (class full names are provided in the main text); box plot shows the percentage of PUP homologs in genomes of different phyla. The numbers in parentheses are the numbers of genomes. (E) Bubble plot of the percentage of PUP homologs per genome in different phyla (x axis) across continents (y axis). The size of the bubbles represents the median of the percentage of PUP homologs per genome from specific phyla. The number inside the bubbles is the size rank in each column (phylum). The numbers in parentheses of x and y labels are the numbers of genomes. (F) The percentage of genomes containing PUP homologs across seven major phyla. (G) The percentage of genomes containing PUP homologs across different continents. The numbers in all the parentheses are the numbers of genomes.
FIG 6
Analysis of 1,074 PGCs (2,742 PUP homologs) from 989 UHGG genomes. (A) The phylum-level taxonomic distribution and (B) the continent-level geographical distribution of 989 PGC-containing genomes. (C) Top 10 UHGG species by the percentage of PGC genes per genome; species names are followed by the number of genomes in parentheses. (D) Pie charts and boxplots. Pie charts show the relative fractions of different PUP classes; box plot shows the percentage of PGC genes per genome in different bacterial phyla. The numbers in parentheses are the numbers of genomes. (E) Bubble plot of the percentage of PGC genes per genome in different phyla (x axis) across continents (y axis). The size of the bubbles represents the median of the percentage of PGC genes per genome from specific phyla. The number inside the bubbles is the size rank in each column (phylum). The numbers in parentheses of x and y labels are the numbers of genomes. (F) The size distribution of PGCs. (G) The predicted substrate distribution of PGCs. PNPG, p-nitrophenyl-β-d-glucopyranoside; PNPR, p-nitrophenyl-α-l-rhamnopyranoside. (H) The percentage of genomes containing PGC genes across seven major phyla. (I) The percentage of genomes containing PGC genes across different continents. The numbers in all the parentheses are the numbers of genomes.
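The idea of a physically linked gene cluster can be sketched in Python as follows. The text shown here does not state the linkage criterion used for PGCs, so the rule below is purely an assumption for illustration: PUP homologs on the same contig are grouped when at most `max_gap` non-PUP genes separate neighbors, and a cluster needs at least `min_size` PUP genes. The function name, both parameters, and the toy coordinates are hypothetical.

```python
from itertools import groupby


def find_gene_clusters(pup_genes, max_gap=1, min_size=2):
    """Group PUP homologs into clusters of neighboring genes.

    pup_genes: iterable of (contig_id, gene_index) pairs, where gene_index
        is the ordinal position of the gene on its contig.
    max_gap: assumed maximum number of intervening non-PUP genes.
    min_size: assumed minimum number of PUP genes per cluster.
    Returns a list of (contig_id, [gene_index, ...]) tuples.
    """
    clusters = []
    ordered = sorted(set(pup_genes))
    for contig, group in groupby(ordered, key=lambda gene: gene[0]):
        current = []
        for _, index in group:
            # Start a new cluster when too many genes lie in between.
            if current and index - current[-1] - 1 > max_gap:
                if len(current) >= min_size:
                    clusters.append((contig, current))
                current = []
            current.append(index)
        if len(current) >= min_size:
            clusters.append((contig, current))
    return clusters


# Toy example with hypothetical contigs and gene positions.
genes = [("c1", 3), ("c1", 4), ("c1", 6), ("c1", 20),
         ("c2", 1), ("c2", 5), ("c2", 6)]
clusters = find_gene_clusters(genes, max_gap=1)
print(clusters)
```

Cluster sizes, as summarized in panel F, would then simply be the lengths of the returned index lists.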
FIG 7
Overview of the dbPUP website. (A) The navigation area provides quick links to different pages. (B) “Characterized” page (https://bcb.unl.edu/dbpup/characterized) of experimentally validated seed PUPs and associated metadata. (C) An example sequence similarity network (HR8 family: https://bcb.unl.edu/dbpup/network/HR8); subfamilies are shown in ovals, while nodes not in ovals are unclassified. Blue nodes indicate subfamilies containing seeds, while orange nodes indicate subfamilies that contain no seeds. A magnified view of subfamily HR8_2 is provided, with node colors indicating taxonomic groups at the order level. (D) Maximum-likelihood phylogeny (https://bcb.unl.edu/dbpup/tree/OR6) for Swiss-Prot homologs and seeds (red font).
FIG 8
Comparing the PUP abundance between microbiomes of Hadza hunter-gatherers (different seasons) and those of Americans (HMP). (A) The PUP abundance is measured by the percentage of reads mapped to 60 experimentally characterized PUPs using BLASTX. (B) The PUP abundance is measured by the RPM values calculated by mapping reads to PUP homologs, which were identified by HMMSEARCH and PSI-BLAST search against metagenome-assembled genomes (MAGs). Details can be found in Materials and Methods. **, P < 0.01 and ***, P < 0.001 as determined by t tests; n.s., not significant.
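The two abundance measures named in this caption can be written out as a short Python sketch. It assumes the usual definitions, percentage of reads mapped as 100 × mapped / total and RPM as reads per million total reads. The exact normalization is described in the article's Materials and Methods, which are not reproduced here, so the function names and the example counts are illustrative assumptions only.

```python
def percent_mapped(mapped_reads, total_reads):
    """Percentage of reads in a sample that map to the reference set."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * mapped_reads / total_reads


def reads_per_million(mapped_reads, total_reads):
    """Mapped reads scaled to one million total reads (assumed RPM definition)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 1_000_000 * mapped_reads / total_reads


# Hypothetical sample: 250 of 2,000,000 reads map to PUP sequences.
pct = percent_mapped(250, 2_000_000)
rpm = reads_per_million(250, 2_000_000)
print(f"{pct:.4f}% of reads mapped, RPM = {rpm:.1f}")
```

Both measures normalize by sequencing depth, which is what makes samples of different sizes comparable.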
