PLoS Genet. 2018 Mar 9;14(3):e1007239.
doi: 10.1371/journal.pgen.1007239. eCollection 2018 Mar.

Modules of co-occurrence in the cyanobacterial pan-genome reveal functional associations between groups of ortholog genes

Christian Beck et al. PLoS Genet.

Abstract

Cyanobacteria are a monophyletic phylogenetic group of global importance and have received considerable attention as potential host organisms for the renewable synthesis of chemical bulk products from atmospheric CO2. The cyanobacterial phylum exhibits enormous diversity with respect to metabolism, morphology, lifestyle and habitat. As yet, however, research has mostly focused on a few model strains, and cyanobacterial diversity is insufficiently understood. In this respect, the increasing availability of fully sequenced bacterial genomes opens new and unprecedented opportunities to investigate the genetic inventory of organisms in the context of their pan-genome. Here, we seek to understand cyanobacterial diversity using a comparative genome analysis of 77 fully sequenced and assembled cyanobacterial genomes. We use phylogenetic profiling to analyze the co-occurrence of clusters of likely ortholog genes (CLOGs) and reveal novel functional associations between CLOGs that are not captured by the co-localization of genes. Going beyond pair-wise co-occurrences, we propose a network approach that allows us to identify modules of co-occurring CLOGs. The extracted modules exhibit a high degree of functional coherence and reveal known as well as previously unknown functional associations. We argue that the high functional coherence observed for the modules is a consequence of the similar-yet-diverse nature of cyanobacteria. Our approach highlights the importance of a multi-strain analysis to understand gene functions and environmental adaptations, with implications beyond the cyanobacterial phylum. The analysis is accompanied by a simple toolbox that facilitates further investigation of the co-occurrence neighborhood of specific CLOGs of interest.
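The idea of querying the co-occurrence neighborhood of a CLOG can be illustrated with a small sketch. This is not the authors' toolbox: it assumes that each phylogenetic profile is stored as a set of strain names and that profile similarity is measured with the Jaccard index (an assumed choice). The function names `jaccard` and `cooccurrence_neighborhood` and the toy CLOG identifiers are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical representation: CLOG id -> phylogenetic profile,
# i.e. the set of strains whose genome carries a gene from that CLOG.
Profile = FrozenSet[str]


def jaccard(a: Profile, b: Profile) -> float:
    """Jaccard similarity of two phylogenetic profiles (assumed measure)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def cooccurrence_neighborhood(
    profiles: Dict[str, Profile], query: str, k: int = 5
) -> List[Tuple[str, float]]:
    """Return the k CLOGs whose profiles are most similar to `query`."""
    target = profiles[query]
    scored = [
        (clog, jaccard(target, profile))
        for clog, profile in profiles.items()
        if clog != query
    ]
    # Highest similarity first; ties broken by CLOG name for determinism.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:k]


if __name__ == "__main__":
    toy_profiles = {
        "CLOG_A": frozenset({"s1", "s2", "s3"}),
        "CLOG_B": frozenset({"s1", "s2", "s3"}),
        "CLOG_C": frozenset({"s1", "s2"}),
        "CLOG_D": frozenset({"s4"}),
    }
    print(cooccurrence_neighborhood(toy_profiles, "CLOG_A", k=2))
```

CLOGs with identical profiles score 1.0, so strongly co-occurring partners rise to the top of the list regardless of where their genes sit in the genome.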

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. The cyanobacterial core and pan-genome.
(A) The distribution of CLOGs as a function of the number of assigned strains. (B) The size of the pan-genome estimated for an increasing number of strains. The blue line indicates the mean size of the pan-genome; error bars indicate the standard deviation over $10^{4}$ randomly sampled subsets of strains. The red line shows a least-squares fit of the power law $p \propto N^{g}$ (Heaps’ law), with $p$ denoting the size of the pan-genome and $N$ the number of genomes. The estimated exponent $g = 0.62$ indicates an open pan-genome. (C) The size of the cyanobacterial core-genome estimated for an increasing number of strains. The blue line indicates the mean size of the core-genome, whereas error bars indicate the standard deviation over $10^{4}$ randomly sampled subsets of strains. The estimates of pan- and core-genome do not include the genomes of E. coli and Cyanobacterium UCYN-A.
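The rarefaction curves and the Heaps’ law fit described in this caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: each genome is assumed to be given as a set of CLOG identifiers, the pan-genome of a subset is taken as the union and the core-genome as the intersection of those sets, and the power law $p = \kappa N^{g}$ is fitted by ordinary least squares on $\log p$ versus $\log N$ (a common simplification; the paper may fit in linear space). The function names and toy genomes are hypothetical.

```python
import math
import random
import statistics
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical input: strain name -> set of CLOG identifiers in its genome.
Genomes = Dict[str, FrozenSet[str]]


def pan_core_sizes(
    genomes: Genomes, n: int, samples: int, rng: random.Random
) -> Tuple[List[int], List[int]]:
    """Pan- and core-genome sizes for `samples` random subsets of n strains."""
    strains = sorted(genomes)
    pan_sizes, core_sizes = [], []
    for _ in range(samples):
        subset = [genomes[s] for s in rng.sample(strains, n)]
        pan_sizes.append(len(frozenset().union(*subset)))  # union = pan-genome
        core_sizes.append(len(frozenset.intersection(*subset)))  # intersection = core
    return pan_sizes, core_sizes


def fit_heaps_law(ns: List[int], pan_means: List[float]) -> Tuple[float, float]:
    """Fit p = kappa * N**g by least squares on log p versus log N."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(p) for p in pan_means]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    g = sxy / sxx  # slope in log-log space is the Heaps exponent
    kappa = math.exp(my - g * mx)
    return kappa, g


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy genomes (not real data): 3 shared CLOGs plus overlapping private ones.
    toy = {
        f"strain{i}": frozenset({"c1", "c2", "c3", f"p{i}", f"p{i + 1}"})
        for i in range(6)
    }
    ns = list(range(1, 7))
    means = []
    for n in ns:
        pan, core = pan_core_sizes(toy, n, samples=100, rng=rng)
        means.append(statistics.fmean(pan))
        print(n, statistics.fmean(pan), statistics.stdev(pan), statistics.fmean(core))
    kappa, g = fit_heaps_law(ns, means)
    print(f"kappa={kappa:.2f}, g={g:.2f}")
```

The mean and standard deviation per subset size correspond to the blue lines and error bars in panels B and C; the fitted exponent plays the role of $g$.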
Fig 2. Network analysis of co-occurring CLOGs.
(A) Orthologous genes are identified using an all-against-all BLASTp comparison and are grouped into clusters of likely ortholog genes (CLOGs). CLOGs are classified into three sets: core CLOGs (present in all strains), shared CLOGs (present in several but not all strains) and unique CLOGs (present in a single strain). (B) The phylogenetic profile of each CLOG indicates the set of strains whose genome is annotated with genes corresponding to the CLOG. Pair-wise co-occurrence of CLOGs is identified using the similarity of phylogenetic profiles. CLOGs are grouped into modules of co-occurring CLOGs using a community-detection algorithm. (C) A network view of co-occurring CLOGs. We identify a total of 563 modules comprising 1930 CLOGs. Circular genome maps were constructed using the CiVi tool [60].
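The classification into core, shared and unique CLOGs, followed by grouping into modules, can be sketched in a few lines. This is a simplified illustration, not the published method: profiles are assumed to be sets of strain names, profile similarity is assumed to be the Jaccard index, and the community-detection step is replaced by a much cruder stand-in (connected components of the graph linking CLOGs whose similarity exceeds a hypothetical `threshold`). All function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

Profiles = Dict[str, FrozenSet[str]]  # CLOG -> set of strains carrying it


def classify_clogs(profiles: Profiles, all_strains: Set[str]) -> Dict[str, str]:
    """Label each CLOG as 'core', 'shared' or 'unique' by its profile size.

    Assumes every profile is a subset of `all_strains`.
    """
    n_strains = len(all_strains)
    classes = {}
    for clog, profile in profiles.items():
        if len(profile) == n_strains:
            classes[clog] = "core"
        elif len(profile) == 1:
            classes[clog] = "unique"
        else:
            classes[clog] = "shared"
    return classes


def threshold_modules(profiles: Profiles, threshold: float = 0.9) -> List[List[str]]:
    """Connected components of the graph linking CLOGs with Jaccard >= threshold.

    A simplified stand-in for community detection, not the published algorithm.
    """
    clogs = sorted(profiles)
    neighbors: Dict[str, Set[str]] = {c: set() for c in clogs}
    for a, b in combinations(clogs, 2):
        union = profiles[a] | profiles[b]
        if union and len(profiles[a] & profiles[b]) / len(union) >= threshold:
            neighbors[a].add(b)
            neighbors[b].add(a)
    seen: Set[str] = set()
    modules = []
    for start in clogs:
        if start in seen or not neighbors[start]:
            continue  # skip visited CLOGs and CLOGs without any partner
        seen.add(start)
        stack, component = [start], []
        while stack:  # iterative depth-first search over the component
            node = stack.pop()
            component.append(node)
            for nb in neighbors[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        modules.append(sorted(component))
    return modules
```

Only shared CLOGs carry informative profiles (core CLOGs co-occur with everything), so in this sketch one would filter with `classify_clogs` before calling `threshold_modules`. A real community-detection algorithm can split a dense component into several modules, which simple connected components cannot.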
Fig 3. Genomic proximity of co-occurring CLOGs.
The average adjacency score (aAS) measures the co-localization of CLOGs grouped into co-occurring modules. (A) A histogram of the aAS. The histogram shows a clear dichotomy between modules whose constituent CLOGs (and hence genes) are co-localized in all genomes (aAS ≈ 1) and modules whose genes are not co-localized (aAS ≈ 0). (B) A scatter plot of the similarity score, which measures the quality of co-occurrence, against the aAS. The plot indicates a positive but weak correlation between the genomic proximity of the genes comprising a module (represented by the aAS) and the quality of co-occurrence. The straight line corresponds to a linear regression and serves as a guide to the eye. (C) A scatter plot of the number of CLOGs associated with a module against the aAS. While larger modules tend to have a lower aAS, the aAS values are relatively well distributed with respect to the number of CLOGs in a module. (D) A scatter plot of the number of strains associated with a module against the aAS. The aAS is again relatively well distributed with respect to the number of participating strains. In panels C and D, the straight line indicates a linear regression and serves as a guide to the eye.
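The caption does not define how the aAS is computed. As a purely hypothetical illustration (not the published definition), the sketch below assumes that each module gene is given by its gene index along a linear chromosome, scores each strain by the fraction of consecutive module genes lying within a hypothetical `max_gap` positions of each other, and averages over strains that carry at least two module genes. Under these assumptions a module forming one contiguous block in every genome scores 1 and a fully scattered module scores 0, the two extremes seen in panel A.

```python
from statistics import fmean
from typing import Dict, List, Optional


def strain_adjacency(positions: List[int], max_gap: int) -> Optional[float]:
    """Fraction of consecutive module genes within `max_gap` gene positions.

    Hypothetical per-strain score; returns None if fewer than two genes.
    Assumes linear coordinates (wrap-around on circular genomes is ignored).
    """
    if len(positions) < 2:
        return None
    ordered = sorted(positions)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    return sum(1 for g in gaps if g <= max_gap) / len(gaps)


def average_adjacency_score(
    module_positions: Dict[str, List[int]], max_gap: int = 1
) -> float:
    """Average the per-strain scores over strains carrying >= 2 module genes."""
    scores = [
        score
        for score in (strain_adjacency(p, max_gap) for p in module_positions.values())
        if score is not None
    ]
    return fmean(scores) if scores else 0.0


if __name__ == "__main__":
    # Toy module: adjacent in strain s1, scattered in strain s2.
    print(average_adjacency_score({"s1": [10, 11, 12], "s2": [1, 100]}))
```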
Fig 4. Selected modules of co-occurring CLOGs and their associated strains.
A black box indicates that a CLOG (y-axis) is associated with a specific strain (x-axis). The first column indicates the module number, and the last column indicates the primary annotation of the respective CLOG. Shown is an excerpt of the modules of co-occurring CLOGs.
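The layout of this figure is a binary CLOG-by-strain presence/absence matrix. A minimal text rendering under stated assumptions could look as follows: profiles are assumed to be sets of strain names, `#` stands in for the black box and `.` for absence, and the function `render_module` and its arguments are hypothetical.

```python
from typing import Dict, FrozenSet, List


def render_module(
    module_id: int,
    clogs: List[str],
    profiles: Dict[str, FrozenSet[str]],
    strains: List[str],
    annotations: Dict[str, str],
) -> str:
    """One row per CLOG: module number, CLOG id, presence cells, annotation."""
    rows = []
    for clog in clogs:
        # '#' marks a strain carrying the CLOG (the black box), '.' its absence.
        cells = "".join("#" if s in profiles[clog] else "." for s in strains)
        rows.append(f"{module_id:>3}  {clog:<8}  {cells}  {annotations.get(clog, '')}".rstrip())
    return "\n".join(rows)


if __name__ == "__main__":
    print(render_module(
        1,
        ["c1", "c2"],
        {"c1": frozenset({"a", "b"}), "c2": frozenset({"a", "b", "c"})},
        ["a", "b", "c"],
        {"c1": "hypothetical protein"},
    ))
```

Keeping the strain order fixed across all rows is what makes co-occurrence visible: CLOGs of the same module produce near-identical patterns of boxes.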

References

1. Ducat DC, Way JC, Silver PA. Engineering cyanobacteria to generate high-value products. Trends Biotechnol. 2011;29(2):95–103. doi: 10.1016/j.tibtech.2010.12.003
2. Calteau A, Fewer DP, Latifi A, Coursin T, Laurent T, Jokela J, et al. Phylum-wide comparative genomics unravel the diversity of secondary metabolism in Cyanobacteria. BMC Genomics. 2014;15:977. doi: 10.1186/1471-2164-15-977
3. Savakis P, Hellingwerf KJ. Engineering cyanobacteria for direct biofuel production from CO2. Curr Opin Biotechnol. 2015;33:8–14. doi: 10.1016/j.copbio.2014.09.007
4. Shih PM, Wu D, Latifi A, Axen SD, Fewer DP, Talla E, et al. Improving the coverage of the cyanobacterial phylum using diversity-driven genome sequencing. Proc Natl Acad Sci U S A. 2013;110(3):1053–8. doi: 10.1073/pnas.1217107110
5. Fujisawa T, Narikawa R, Maeda SI, Watanabe S, Kanesaki Y, Kobayashi K, et al. CyanoBase: a large-scale update on its 20th anniversary. Nucleic Acids Res. 2017;45(D1):D551–D554. doi: 10.1093/nar/gkw1131
