Front Microbiol. 2018 Mar 12:9:428.
doi: 10.3389/fmicb.2018.00428. eCollection 2018.

Pangenomic Definition of Prokaryotic Species and the Phylogenetic Structure of Prochlorococcus spp

Mikhail A Moldovan et al. Front Microbiol.

Abstract

The pangenome is the collection of all groups of orthologous genes (OGGs) from a set of genomes. We apply pangenome analysis to propose a definition of prokaryotic species based on the identification of lineage-specific gene sets. While similar to the classical biological definition based on allele flow, it does not rely on DNA similarity levels and does not require analysis of homologous recombination. Hence this definition is relatively objective and independent of arbitrary thresholds. A systematic analysis of 110 accepted species with the largest numbers of sequenced strains yields results largely consistent with the existing nomenclature. However, it reveals that the abundant marine cyanobacterium Prochlorococcus marinus should be divided into two species. As a control, we have confirmed the paraphyletic origin of Yersinia pseudotuberculosis (with embedded, monophyletic Y. pestis) and Burkholderia pseudomallei (with B. mallei). We also demonstrate that, by our definition and in accordance with recent studies, Escherichia coli and Shigella spp. are one species.

Keywords: monophyly; pangenome; paraphyly; prokaryotic species; species definition; taxonomy.

Figures

Figure 1
Emergence of nonhomogenous strain sets. (A,B) Two homogenous groups of six and eight strains, respectively, shown as Venn diagrams, where the ovals represent genomes as sets of genes (OGGs). The gene frequency spectrum function G(k) is defined as the number of OGGs containing genes from exactly k genomes. (C) Nonhomogenous group of strains produced by merging two homogenous groups. The two peaks of the G(k) function correspond to genes specific for homogenous groups. (D,E) Two possible scenarios for the emergence of nonhomogenous groups: divergence accompanied by independent gene gain and loss in both branches (D) or accelerated gene gain and/or loss in an internal clade (E). Green upward and red downward triangles indicate gene gain and loss, respectively.
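The definition of G(k) given in the Figure 1 caption can be illustrated with a minimal Python sketch. The input format (a dict mapping each OGG identifier to the set of genomes in which it occurs), the function name gene_frequency_spectrum, and the toy data are assumptions made for illustration only; they are not taken from the paper.

```python
from collections import Counter


def gene_frequency_spectrum(ogg_to_genomes, n_genomes):
    """Return [G(1), ..., G(n_genomes)], where G(k) is the number of
    OGGs that contain genes from exactly k genomes."""
    # Count how many OGGs occur in each possible number of genomes.
    sizes = Counter(len(set(genomes)) for genomes in ogg_to_genomes.values())
    return [sizes.get(k, 0) for k in range(1, n_genomes + 1)]


# Hypothetical toy data: five genomes (a-e) forming two groups,
# {a, b, c} and {d, e}, merged into one nonhomogenous set.
toy = {
    "core1": {"a", "b", "c", "d", "e"},
    "core2": {"a", "b", "c", "d", "e"},
    "core3": {"a", "b", "c", "d", "e"},
    "grp1_x": {"a", "b", "c"},   # specific to the first group
    "grp1_y": {"a", "b", "c"},
    "grp2_x": {"d", "e"},        # specific to the second group
    "single_a": {"a"},           # strain-specific OGG
}

spectrum = gene_frequency_spectrum(toy, 5)
print(spectrum)  # [1, 1, 2, 0, 3]
```

In this toy set, the nonzero values at k = 2 and k = 3 come from the group-specific OGGs, which is the kind of internal signal panel (C) describes for a merged, nonhomogenous group.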
Figure 2
Partition of Prochlorococcus marinus. (A) Prochlorococcus marinus pangenome spectrum function G(k). Note a large internal peak at k = 9. OGGs specific for the plus- and minus-groups are shown in pink and gray, respectively. The Venn diagram shows universal OGGs for each group and their intersections. Asterisks mark additional peaks dividing P. marinus into the high-light and low-light groups. (B) Same data for the joint pangenome of P. marinus and Synechococcus spp., the latter shown in blue. (C) Phylogenetic tree of P. marinus and Synechococcus spp. The sizes of the terminal triangles are proportional to the numbers of strains in the respective clades. Green and red triangles mark gain and loss of universal genes, respectively; their size reflects the numbers of gained and lost genes. The size of the black dots correlates with the number of genes lost in different clades of the Prochlorococcus marinus tree. Pink and gray backgrounds reflect the partition into the high-light and low-light groups. (D) Distributions of the genome size (the number of protein-coding genes). (E) Distribution of the GC-content. Outliers in the minus-group distribution are low-light strains.
Figure 3
Nonhomogenous species with a monophyletic group and a paraphyletic remainder. Notation as in Figure 2. (A,D) Streptococcus equi. (B,E) Brucella. (C,F) Buchnera aphidicola.
Figure 4
Three paraphyletic species. (A,B) Burkholderia mallei/pseudomallei. The red triangle represents massive gene loss. (C,D) Yersinia pestis/pseudotuberculosis. (E,F) Escherichia coli and Shigella spp.
Figure 5
Numbers of subset-specific OGGs for different Prochlorococcus marinus partitions. The numbers for partitions consistent with the plus-group/minus-group partition are represented by pink dots; the numbers for other partitions are shown by gray dots. The gray cloud of dots between OGG numbers 100 and 250 corresponds to the high-light/low-light partition.
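Counting subset-specific OGGs for a candidate partition, as plotted in Figure 5, might be sketched as follows. This sketch assumes that an OGG is "specific" to a subset when it is present in every genome of that subset and absent from all other genomes; that criterion, the function names, and the toy data are illustrative assumptions, not the authors' stated procedure.

```python
def subset_specific_oggs(ogg_to_genomes, subset):
    """OGGs found in every genome of `subset` and in no other genome
    (assumed definition of a subset-specific OGG)."""
    target = frozenset(subset)
    return sorted(
        ogg for ogg, genomes in ogg_to_genomes.items()
        if frozenset(genomes) == target
    )


def partition_specific_count(ogg_to_genomes, all_genomes, subset):
    """Total number of OGGs specific to either side of the partition
    {subset, all_genomes - subset}."""
    rest = set(all_genomes) - set(subset)
    return (len(subset_specific_oggs(ogg_to_genomes, subset))
            + len(subset_specific_oggs(ogg_to_genomes, rest)))


# Hypothetical toy data: genomes a-e, with groups {a, b, c} and {d, e}.
genomes = {"a", "b", "c", "d", "e"}
toy = {
    "core1": {"a", "b", "c", "d", "e"},
    "grp1_x": {"a", "b", "c"},
    "grp1_y": {"a", "b", "c"},
    "grp2_x": {"d", "e"},
    "single_a": {"a"},
}

# The partition matching the true groups scores higher than an arbitrary one.
print(partition_specific_count(toy, genomes, {"a", "b", "c"}))  # 3
print(partition_specific_count(toy, genomes, {"a", "b"}))       # 0
```

Under this assumed criterion, a partition that matches a real split in the strain set collects many specific OGGs, while arbitrary partitions collect few.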
