Nat Commun. 2019 Sep 13;10(1):4173.
doi: 10.1038/s41467-019-12171-z.

The distinction of CPR bacteria from other bacteria based on protein family content

Raphaël Méheust et al. Nat Commun. 2019.

Abstract

Candidate phyla radiation (CPR) bacteria separate phylogenetically from other bacteria, but the organismal distribution of their protein families remains unclear. Here, we leveraged sequences from thousands of uncultivated organisms and identified protein families that co-occur in genomes and are thus likely foundational for lineage capacities. Protein family presence/absence patterns cluster CPR bacteria together, and away from all other bacteria and archaea, partly due to proteins without recognizable homology to proteins in other bacteria. Some are likely involved in cell-cell interactions and potentially important for episymbiotic lifestyles. The diversity of protein family combinations in CPR may exceed that of all other bacteria. Over the bacterial tree, protein family presence/absence patterns broadly recapitulate phylogenetic structure, suggesting persistence of core sets of proteins since lineage divergence. The CPR could have arisen in an episode of dramatic but heterogeneous genome reduction or from a protogenote community and co-evolved with other bacteria.

Conflict of interest statement

JFB is a founder of Metagenomi. The other authors declare no competing interests.

Figures

Fig. 1
Schematic tree illustrating the phylogenetic sampling used in this study. Lineages included in the datasets are marked with a dot; lineages lacking an isolated representative are marked with red dots. The number of genomes per lineage is given in Supplementary Fig. 2. The diagram is based on a tree published recently in ref.
Fig. 2
The distribution of the 22,977 families across the 2890 genomes. a The distribution of the 22,977 families (columns) across the 2890 prokaryotic genomes (rows). Data are clustered based on the presence (black)/absence (white) profiles (Jaccard distance, hierarchical clustering with complete linkage). Archaea: blue, CPR: red, non-CPR bacteria: gray. The patterns in orange correspond to the presence/absence patterns of 921 widespread families. b Distribution across phyla of the 235 protein modules in CPR (y axis) and non-CPR bacteria (x axis). Each dot corresponds to a module. The orange dots correspond to the four widespread modules on which further analyses focus.
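The clustering named in panel a (Jaccard distance on presence/absence profiles, hierarchical clustering with complete linkage) can be illustrated with a minimal sketch. It assumes each genome is represented as a set of protein-family identifiers; the function names, genome names and family labels are hypothetical, and this is not the authors' code. The naive loop only shows how complete linkage decides which clusters to merge.

```python
from itertools import combinations


def jaccard_distance(a: set, b: set) -> float:
    """1 - |A & B| / |A | B|; two empty profiles count as identical (distance 0)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def complete_linkage(profiles: dict[str, set[str]]) -> list[tuple[frozenset, frozenset, float]]:
    """Naive agglomerative clustering with complete linkage (illustrative only).

    profiles maps a (hypothetical) genome name to the set of protein families
    present in it. Returns the merge history as (cluster_a, cluster_b, distance).
    """
    names = list(profiles)
    # Pairwise genome-to-genome Jaccard distances on presence/absence sets.
    dist = {
        frozenset((x, y)): jaccard_distance(profiles[x], profiles[y])
        for x, y in combinations(names, 2)
    }
    clusters = [frozenset([n]) for n in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for c1, c2 in combinations(clusters, 2):
            # Complete linkage: the distance between two clusters is the
            # largest distance between any member of one and any of the other.
            d = max(dist[frozenset((x, y))] for x in c1 for y in c2)
            if best is None or d < best[2]:
                best = (c1, c2, d)
        c1, c2, d = best
        # Replace the two closest clusters with their union.
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
        merges.append((c1, c2, d))
    return merges
```

For matrices the size of this dataset (2890 genomes by 22,977 families), an optimized routine such as SciPy's `scipy.spatial.distance.pdist` with `metric="jaccard"` followed by `scipy.cluster.hierarchy.linkage` with `method="complete"` would be the practical choice; the naive version above recomputes every cluster distance at each step.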
Fig. 3
The distribution of 921 widely distributed protein families across the 2890 genomes. a The distribution of 921 widely distributed protein families (columns) in 2890 genomes (rows) from CPR bacteria (red), non-CPR bacteria (gray), and a few archaea (light blue) in a reference set with extensive sampling of genomes from metagenomes (thus including sequences from many candidate phyla). Data are clustered based on the presence (black)/absence (white) profiles (Jaccard distance, complete linkage). Only draft-quality and non-redundant genomes were used. The colored top bar corresponds to the functional category of families (Metabolism: red, Genetic Information Processing: blue, Cellular Processes: green, Environmental Information Processing: yellow, Organismal systems: orange, Unclassified: gray, Unknown: white). b Tree resulting from the hierarchical clustering of the genomes based on the distributions of protein families in panel a. c A phylogenetic tree of the CPR genomes present in the dataset. The maximum-likelihood tree was calculated based on the concatenation of 14 ribosomal proteins (L2, L3, L4, L5, L6, L14, L15, L18, L22, L24, S3, S8, S17, and S19) using the PROTCATLG model.
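Panel c rests on a concatenation of 14 ribosomal proteins. Building such a concatenated alignment (a supermatrix) can be sketched as follows, assuming each marker protein has already been aligned on its own and that a genome missing a marker is padded with gap characters. The padding rule and the function name are assumptions for illustration, not details given in the caption; the resulting alignment would then go to a maximum-likelihood program run with the PROTCATLG model.

```python
# The 14 ribosomal proteins listed in the caption, in the order given there.
RIBOSOMAL_PROTEINS = ["L2", "L3", "L4", "L5", "L6", "L14", "L15", "L18",
                      "L22", "L24", "S3", "S8", "S17", "S19"]


def concatenate_alignments(alignments, genomes, markers=RIBOSOMAL_PROTEINS):
    """Join per-marker alignments genome by genome into one supermatrix row each.

    alignments: {marker: {genome: aligned_sequence}} (hypothetical input layout).
    Every sequence of a given marker must already share one alignment length.
    Assumption: a genome lacking a marker gets a run of '-' of that length.
    """
    concatenated = {g: [] for g in genomes}
    for marker in markers:
        block = alignments.get(marker, {})
        lengths = {len(s) for s in block.values()}
        if len(lengths) > 1:
            raise ValueError(f"{marker}: sequences are not aligned to equal length")
        # A marker absent from the input contributes zero columns.
        width = lengths.pop() if lengths else 0
        for g in genomes:
            concatenated[g].append(block.get(g, "-" * width))
    return {g: "".join(parts) for g, parts in concatenated.items()}
```

Gap padding keeps every row the same length, so genomes with an incomplete marker set can still be placed in the tree; how many missing markers to tolerate is a separate filtering choice.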
Fig. 4
The distribution of protein families across representative genomes of the prokaryotic tree of life. a The distribution of 921 widely distributed protein families (columns) in 2616 draft-quality and non-redundant genomes (rows) from a reference set with extensive sampling of genomes from non-CPR bacteria. Genomes are clustered based on the presence (black)/absence (white) profiles (Jaccard distance, complete linkage). The order of the families is the same as in Fig. 3a. b Tree resulting from the hierarchical clustering of the genomes based on the distributions of protein families in panel a. c The same tree with all branches shorter than 25% of the maximum branch length collapsed (CPR in red, Archaea in blue and non-CPR bacteria in gray).
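Panel c collapses short branches relative to the longest one. A minimal sketch of that operation, under stated assumptions: every internal branch is compared with the longest branch in the tree, and an internal node whose incoming branch is below 25% of it is dissolved, with its children attached to its parent as a polytomy. The caption does not say whether terminal branches were also collapsed or what happened to the removed length; this sketch keeps all leaves and simply drops the collapsed length. The `Node` class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str = ""
    length: float = 0.0  # length of the branch leading into this node
    children: list["Node"] = field(default_factory=list)


def max_branch_length(node: Node) -> float:
    """Longest branch anywhere in the subtree rooted at node."""
    return max([node.length] + [max_branch_length(c) for c in node.children])


def collapse_short_branches(root: Node, fraction: float = 0.25) -> Node:
    """Collapse internal branches shorter than fraction * longest branch.

    Assumptions: leaves are never removed, and the collapsed branch length
    is discarded (treated as zero) rather than added to the children.
    """
    threshold = fraction * max_branch_length(root)

    def visit(node: Node) -> None:
        new_children = []
        for child in node.children:
            visit(child)  # process the subtree first (bottom-up)
            if child.children and child.length < threshold:
                # Dissolve the internal node: its children join the parent.
                new_children.extend(child.children)
            else:
                new_children.append(child)
        node.children = new_children

    visit(root)
    return root
```

Computing the threshold once, before any collapsing, keeps the cut-off tied to the original tree rather than to a tree that is changing during traversal.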
Fig. 5
Examples of proteins that are depleted or enriched in CPR. The left panel shows four proteins (in red) that belong to informational machinery and are depleted in CPR, yet are widespread and important in non-CPR bacteria. The right panel shows a schematic model of the type IV pili and competence systems, which appear widespread in CPR bacteria.
Fig. 6
Distribution of the 106 families that are enriched in CPR relative to non-CPR bacteria. The distribution of 106 protein families enriched in CPR (columns) in 2890 draft-quality and non-redundant genomes (rows) from a reference set with extensive sampling of genomes from metagenomes. The order of the families and the genomes is the same as in Fig. 3a.
Fig. 7
Two scenarios for the origin and the evolution of the CPR. a In the first scenario, CPR and non-CPR bacteria emerged from the protogenote community and co-evolved. In this case, major divergences within the CPR, essentially the rise of new CPR phyla, may have been stimulated by evolutionary innovations that generated new lineages of potential bacterial hosts. b In the second scenario, CPR evolved from within the non-CPR bacteria and experienced a massive genome reduction.

References

    1. Castelle CJ, Banfield JF. Major new microbial groups expand diversity and alter our understanding of the tree of life. Cell. 2018;172:1181–1197. doi: 10.1016/j.cell.2018.02.016.
    2. Wrighton KC, et al. Fermentation, hydrogen, and sulfur metabolism in multiple uncultivated bacterial phyla. Science. 2012;337:1661–1665. doi: 10.1126/science.1224041.
    3. Rinke C, et al. Insights into the phylogeny and coding potential of microbial dark matter. Nature. 2013;499:431–437. doi: 10.1038/nature12352.
    4. Luef B, et al. Diverse uncultivated ultra-small bacterial cells in groundwater. Nat. Commun. 2015;6:6372. doi: 10.1038/ncomms7372.
    5. Brown CT, et al. Unusual biology across a group comprising more than 15% of domain Bacteria. Nature. 2015;523:208–211. doi: 10.1038/nature14486.