PLoS Genet. 2018 Mar 9;14(3):e1007239.

doi: 10.1371/journal.pgen.1007239. eCollection 2018 Mar.

Modules of co-occurrence in the cyanobacterial pan-genome reveal functional associations between groups of ortholog genes

Christian Beck¹, Henning Knoop¹, Ralf Steuer¹

PMID: 29522508
PMCID: PMC5862535
DOI: 10.1371/journal.pgen.1007239

Christian Beck et al. PLoS Genet. 2018.

Affiliation

¹ Humboldt-Universität zu Berlin, Institut für Theoretische Biologie (ITB), Berlin, Germany.

Abstract

Cyanobacteria are a monophyletic phylogenetic group of global importance and have received considerable attention as potential host organisms for the renewable synthesis of chemical bulk products from atmospheric CO2. The cyanobacterial phylum exhibits enormous metabolic diversity with respect to morphology, lifestyle and habitat. As yet, however, research has mostly focused on a few model strains, and cyanobacterial diversity remains insufficiently understood. In this respect, the increasing availability of fully sequenced bacterial genomes opens new and unprecedented opportunities to investigate the genetic inventory of organisms in the context of their pan-genome. Here, we seek to understand cyanobacterial diversity using a comparative genome analysis of 77 fully sequenced and assembled cyanobacterial genomes. We use phylogenetic profiling to analyze the co-occurrence of clusters of likely ortholog genes (CLOGs) and reveal novel functional associations between CLOGs that are not captured by co-localization of genes. Going beyond pair-wise co-occurrences, we propose a network approach that allows us to identify modules of co-occurring CLOGs. The extracted modules exhibit a high degree of functional coherence and reveal known as well as previously unknown functional associations. We argue that the high functional coherence observed for the modules is a consequence of the similar-yet-diverse nature of cyanobacteria. Our approach highlights the importance of a multi-strain analysis to understand gene functions and environmental adaptations, with implications beyond the cyanobacterial phylum. The analysis is augmented with a simple toolbox that facilitates further analysis to investigate the co-occurrence neighborhood of specific CLOGs of interest.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

**Fig 1. The cyanobacterial core and pan-genome.**
(A) The distribution of CLOGs as a function of the number of assigned strains. (B) The size of the pan-genome estimated for an increasing number of strains. The blue line indicates the mean size of the pan-genome; error bars indicate the standard deviation of 10⁴ randomly sampled subsets of strains. The red line shows a least squares fit of the power law p ∼ N^g (Heaps’ law), with p denoting the size of the pan-genome and N the number of genomes. The estimated exponent g = 0.62 indicates an open pan-genome. (C) The size of the cyanobacterial core-genome estimated for an increasing number of strains. The blue line indicates the mean size of the core-genome; error bars indicate the standard deviation of 10⁴ randomly sampled subsets of strains. The estimates of pan- and core-genome do not include the genomes of *E. coli* and *Cyanobacterium* UCYN-A.
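The estimates described in the Fig 1 caption can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes the CLOG assignments are available as a boolean presence/absence matrix, it samples subsets as prefixes of random strain orderings, and it fits the Heaps' law exponent by linear least squares in log-log space (the caption only states that a least squares fit was used). The function names, the sample count and the demo matrix are hypothetical.

```python
import numpy as np


def pan_core_sizes(presence, n_samples=200, seed=0):
    """Mean and standard deviation of pan- and core-genome size for 1..N strains.

    presence: boolean array of shape (n_clogs, n_strains); True means the
    CLOG is found in that strain. Each prefix of a random strain ordering
    serves as one randomly sampled subset of that size.
    """
    rng = np.random.default_rng(seed)
    n_strains = presence.shape[1]
    pan = np.zeros((n_samples, n_strains))
    core = np.zeros((n_samples, n_strains))
    for s in range(n_samples):
        shuffled = presence[:, rng.permutation(n_strains)]
        # Running union (pan) and intersection (core) over the first k strains.
        pan[s] = np.logical_or.accumulate(shuffled, axis=1).sum(axis=0)
        core[s] = np.logical_and.accumulate(shuffled, axis=1).sum(axis=0)
    return pan.mean(axis=0), pan.std(axis=0), core.mean(axis=0), core.std(axis=0)


def fit_heaps_exponent(n_genomes, pan_size):
    """Fit p = k * N**g by linear regression of log p on log N; returns (g, k)."""
    g, log_k = np.polyfit(np.log(n_genomes), np.log(pan_size), 1)
    return g, np.exp(log_k)


if __name__ == "__main__":
    # Synthetic demo data only; not derived from the 77 genomes in the paper.
    demo_rng = np.random.default_rng(1)
    demo = demo_rng.random((500, 12)) < 0.3
    pan_mean, pan_std, core_mean, core_std = pan_core_sizes(demo)
    g, k = fit_heaps_exponent(np.arange(1, 13), pan_mean)
    print(f"fitted exponent g = {g:.2f}")
```

In this parameterization an exponent g well above zero means the pan-genome keeps growing as genomes are added, which is the sense in which the caption calls the pan-genome open.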

**Fig 2. Network analysis of co-occurring CLOGs.**
(A) Orthologous genes are identified using an all-against-all BLASTp comparison and are grouped into clusters of likely orthologous genes (CLOGs). CLOGs are classified into three sets: core CLOGs (present in all strains), shared CLOGs (present in several but not all strains) and unique CLOGs (present in a single strain). (B) The phylogenetic profile of each CLOG indicates the set of strains whose genome is annotated with genes corresponding to the CLOG. Pair-wise co-occurrence of CLOGs is identified using the similarity of phylogenetic profiles. CLOGs are grouped into modules of co-occurring CLOGs using a community-detection algorithm. (C) A network view of co-occurring CLOGs. We identify a total of 563 modules with 1930 CLOGs. Circular genome maps were constructed using the CiVi tool [60].
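The workflow in the Fig 2 caption can be illustrated with a small sketch. Several choices here are assumptions rather than the paper's method: profile similarity is measured with the Jaccard index, pairs are linked above an arbitrary threshold, only shared CLOGs are compared, and connected components stand in for the community-detection algorithm, which this record does not specify. All CLOG and strain names are hypothetical.

```python
from itertools import combinations

# Hypothetical phylogenetic profiles: CLOG name -> set of strains containing it.
EXAMPLE_PROFILES = {
    "clog_core": {"s1", "s2", "s3", "s4"},
    "clog_a": {"s1", "s2"},
    "clog_b": {"s1", "s2"},
    "clog_c": {"s3", "s4"},
    "clog_d": {"s3", "s4"},
    "clog_e": {"s1", "s3"},
    "clog_u": {"s4"},
}


def classify_clogs(profiles, n_strains):
    """Split CLOGs into core (all strains), unique (one strain) and shared."""
    classes = {"core": [], "shared": [], "unique": []}
    for clog, strains in profiles.items():
        if len(strains) == n_strains:
            classes["core"].append(clog)
        elif len(strains) == 1:
            classes["unique"].append(clog)
        else:
            classes["shared"].append(clog)
    return classes


def jaccard(a, b):
    """Jaccard similarity of two strain sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def co_occurrence_modules(profiles, clogs, threshold=0.9):
    """Link CLOG pairs with similar profiles and return connected components.

    Connected components (via union-find) are a simple stand-in for a
    community-detection algorithm. Singleton components are dropped.
    """
    parent = {c: c for c in clogs}

    def find(c):
        # Follow parent pointers to the root, compressing the path on the way.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for a, b in combinations(clogs, 2):
        if jaccard(profiles[a], profiles[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for c in clogs:
        groups.setdefault(find(c), []).append(c)
    return sorted(sorted(m) for m in groups.values() if len(m) > 1)


if __name__ == "__main__":
    classes = classify_clogs(EXAMPLE_PROFILES, n_strains=4)
    print(classes)
    print(co_occurrence_modules(EXAMPLE_PROFILES, classes["shared"]))
```

Restricting the comparison to shared CLOGs is a design choice of this sketch: core CLOGs all have identical profiles and unique CLOGs occur in a single strain, so neither set carries much co-occurrence signal.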

**Fig 3. Genomic proximity of co-occurring CLOGs.**
The average adjacency score (aAS) measures the co-localization of CLOGs grouped into co-occurring modules. (A) A histogram of the aAS. The histogram shows a clear dichotomy between modules whose constituent CLOGs (and hence genes) are co-localized in all genomes (aAS ≈ 1) and modules whose genes are not co-localized (aAS ≈ 0). (B) A scatter plot of the similarity score, which measures the quality of co-occurrence, against the aAS. The plot indicates a positive but weak correlation between the genomic proximity of the genes comprising a module (represented by the aAS) and the quality of co-occurrence. The straight line corresponds to a linear regression and serves as a guide to the eye. (C) A scatter plot of the number of CLOGs associated with a module against the aAS. While larger modules tend to have a lower aAS, the aAS values are relatively well distributed with respect to the number of CLOGs in a module. (D) A scatter plot of the number of strains associated with a module against the aAS. The aAS is again relatively well distributed with respect to the number of participating strains. In both plots the straight line indicates a linear regression and serves as a guide to the eye.

**Fig 4. Selected modules of co-occurring CLOGs and their associated strains.**
A black box indicates if a CLOG (y-axis) is associated with a specific strain (x-axis). The first column indicates the module number, the last column indicates the primary annotation of the respective CLOG. Shown is an excerpt of modules of co-occurring CLOGs.
