eLife. 2019 Feb 25;8:e42535.
doi: 10.7554/eLife.42535.

An updated phylogeny of the Alphaproteobacteria reveals that the parasitic Rickettsiales and Holosporales have independent origins

Sergio A Muñoz-Gómez et al. eLife.

Abstract

The Alphaproteobacteria is an extraordinarily diverse and ancient group of bacteria. Previous attempts to infer its deep phylogeny have been plagued with methodological artefacts. To overcome this, we analyzed a dataset of 200 single-copy and conserved genes and employed diverse strategies to reduce compositional artefacts. Such strategies include using novel dataset-specific profile mixture models and recoding schemes, and removing sites, genes and taxa that are compositionally biased. We show that the Rickettsiales and Holosporales (both groups of intracellular parasites of eukaryotes) are not sisters to each other, but instead, the Holosporales has a derived position within the Rhodospirillales. A synthesis of our results also leads to an updated proposal for the higher-level taxonomy of the Alphaproteobacteria. Our robust consensus phylogeny will serve as a framework for future studies that aim to place mitochondria, and novel environmental diversity, within the Alphaproteobacteria.

Keywords: Azospirillaceae; Finniella inopinata; Holosporaceae; Holosporales; Peranema; Rhodospirillales; Rhodovibriaceae; Stachyamoba; evolutionary biology; infectious disease; microbiology; mitochondria.

Conflict of interest statement

SM, SH, GB, BL, ES, CS, AR: No competing interests declared.

Figures

Figure 1. Compositional heterogeneity in the Alphaproteobacteria is a major factor that confounds phylogenetic inference.
The genome G + C% content and amino acid compositions of the Rickettsiales, Pelagibacterales (including alphaproteobacterium HIMB59) and Holosporales differ greatly from those of all other alphaproteobacteria. (A) A UPGMA (average-linkage) clustering of amino acid compositions (based on the 200 gene set for the Alphaproteobacteria) shows that the Rickettsiales (brown), Pelagibacterales (maroon), and Holosporales (light blue) all have very similar proteome amino acid compositions. At the tips of the tree, GARP:FIMNKY amino acid ratio values are shown as bars. (B) A scatterplot depicting the strong correlation between G + C% (nucleotide composition) and GARP:FIMNKY ratios (amino acid composition) for the 120 taxa in the Alphaproteobacteria (and outgroup) shows a similar clustering of the Rickettsiales, Pelagibacterales (including alphaproteobacterium HIMB59) and Holosporales.
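The GARP:FIMNKY ratio used in this caption can be illustrated with a minimal sketch. It assumes the ratio is simply the count of G, A, R and P residues (amino acids encoded by GC-rich codons) divided by the count of F, I, M, N, K and Y residues (encoded by AT-rich codons) across a proteome. The function name and input format are hypothetical and are not taken from the study.

```python
# Illustrative sketch only: the function name and input format are assumptions.
GARP = set("GARP")      # residues encoded by GC-rich codons
FIMNKY = set("FIMNKY")  # residues encoded by AT-rich codons


def garp_fimnky_ratio(proteins):
    """Return the GARP:FIMNKY ratio over an iterable of protein sequences."""
    garp = 0
    fimnky = 0
    for seq in proteins:
        for residue in seq.upper():
            if residue in GARP:
                garp += 1
            elif residue in FIMNKY:
                fimnky += 1
    if fimnky == 0:
        raise ValueError("no FIMNKY residues found; the ratio is undefined")
    return garp / fimnky
```

Under this definition, a proteome from a G + C-poor genome would be expected to give a low ratio, which is the pattern the scatterplot in panel B describes.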
Figure 2. Decreasing compositional heterogeneity by removing compositionally biased sites disrupts the clustering of the Rickettsiales, Pelagibacterales (including alphaproteobacterium HIMB59) and Holosporales.
All branch support values are 100% SH-aLRT and 100% UFBoot unless annotated. (A) A maximum-likelihood tree inferred under the LG+PMSF(ES60)+F+R6 model from the untreated dataset, which is highly compositionally heterogeneous. The three long-branched orders with similar amino acid compositions, the Rickettsiales, Pelagibacterales (including alphaproteobacterium HIMB59) and Holosporales, form a clade. (B) A maximum-likelihood tree inferred under the LG+PMSF(ES60)+F+R6 model from a dataset whose compositional heterogeneity has been decreased by removing 50% of the most biased sites according to ɀ. In this phylogeny, the clustering of the Rickettsiales, Pelagibacterales and Holosporales is disrupted. The Pelagibacterales is sister to the Rhodobacterales, Caulobacterales and Rhizobiales. The Holosporales and alphaproteobacterium HIMB59 become sister to the Rhodospirillales. The Rickettsiales remains as the sister to the Caulobacteridae. See Figure 2—figure supplement 1 for taxon names. See Figure 2—figure supplement 3 for the Bayesian consensus trees inferred in PhyloBayes MPI v1.7 under the CAT-Poisson+Γ4 model. See also Figure 2—figure supplements 2 and 4–7.
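The site-removal step in panel B can be sketched as follows. This record does not define the ɀ metric, so the sketch takes a precomputed per-site bias score as input and only shows the filtering: rank the alignment columns by score and drop the top fraction. The function name, the dictionary-based alignment format and the scores are all hypothetical.

```python
# Illustrative sketch only: per-site scores are supplied by the caller,
# because the bias metric itself is not defined in this record.
def remove_most_biased_sites(alignment, site_scores, fraction=0.5):
    """Drop the given fraction of columns with the highest bias scores.

    alignment   -- dict mapping taxon name to an aligned sequence
    site_scores -- one bias score per alignment column (higher = more biased)
    """
    n_sites = len(site_scores)
    if any(len(seq) != n_sites for seq in alignment.values()):
        raise ValueError("every sequence must have one residue per site score")
    n_remove = int(n_sites * fraction)
    # Rank columns from most to least biased; ties keep their original order.
    ranked = sorted(range(n_sites), key=lambda i: site_scores[i], reverse=True)
    dropped = set(ranked[:n_remove])
    kept = [i for i in range(n_sites) if i not in dropped]
    return {
        taxon: "".join(seq[i] for i in kept)
        for taxon, seq in alignment.items()
    }
```

The retained columns keep their original order, so the filtered alignment can be passed to tree inference without further rearrangement.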
Figure 2—figure supplement 1. A labeled version showing taxon names for Figure 2.
Branch support values are 100% SH-aLRT and 100% UFBoot unless annotated.
Figure 2—figure supplement 2. A diagram of the strategies and phylogenetic analyses employed in this study.
Figure 2—figure supplement 3. Bayesian consensus trees inferred with PhyloBayes MPI v1.7 and the CAT-Poisson+Γ4 model.
Branch support values are 1.0 posterior probabilities unless annotated. (A) Bayesian consensus tree inferred from the full dataset, which is highly compositionally heterogeneous. (B) Bayesian consensus tree inferred from a dataset whose compositional heterogeneity has been decreased by removing 50% of the most biased sites according to ɀ. See Figure 2A and B for the most likely trees inferred in IQ-TREE v1.5.5 under the LG+PMSF(C60)+F+R6 model.
Figure 2—figure supplement 4. Maximum-likelihood trees to assess the placements of the Holosporales, Rickettsiales, Pelagibacterales and alphaproteobacterium HIMB59 when all four groups are included.
Branch support values are 100% SH-aLRT and 100% UFBoot unless annotated. (A) A tree that results from the analysis of the untreated dataset. (B) A tree that results from the analysis of a dataset from which the 50% most compositionally biased sites have been removed. (C) A tree that results from the analysis of a dataset that has been recoded into the four-character state recoding scheme S4 (recoding scheme: RNCM EHIPTWV ADQLKS GFY). (D) A tree that results from the analysis of a dataset that only comprises the 40 most compositionally homogeneous genes.
Figure 2—figure supplement 5. Maximum-likelihood tree from the untreated dataset, from which no taxon has been removed, analyzed under the simpler LG4X model.
In this tree, derived from an analysis using a model that does not account for compositional heterogeneity across sites, the Geminicoccaceae has a more derived placement within the Rhodospirillales as sister to the Acetobacteraceae.
Figure 2—figure supplement 6. Constraint tree, used for IQ-TREE analyses, labeled with taxon names and the degree of missing data per taxon.
Magnetococcales in gray; Rickettsiales in brown; Pelagibacterales in maroon; Holosporales in light blue; Rhizobiales in green; Caulobacterales in orange; Rhodobacterales in red; Sneathiellales in pink; Rhodospirillales in purple; Beta- and Gammaproteobacteria in black.
Figure 2—figure supplement 7. GARP:FIMNKY ratios across the proteomes of the 120 alphaproteobacteria and outgroup used in this study.
Figure 3. The Holosporales (renamed and lowered in rank to the Holosporaceae family here) branches in a derived position within the Rhodospirillales when compositional heterogeneity is reduced and the long-branched and compositionally biased Rickettsiales, Pelagibacterales, and alphaproteobacterium HIMB59 are removed.
Branch support values are 100% SH-aLRT and 100% UFBoot unless annotated. (A) A maximum-likelihood tree, inferred under the LG+PMSF(ES60)+F+R6 model, to place the Holosporaceae in the absence of the Rickettsiales, Pelagibacterales, and alphaproteobacterium HIMB59 and when compositional heterogeneity has been decreased by removing 50% of the most biased sites. The Holosporaceae is sister to the Azospirillaceae fam. nov. within the Rhodospirillales. (B) A maximum-likelihood tree, inferred under the GTR+ES60 S4+F+R6 model, to place the Holosporaceae in the absence of the Rickettsiales, Pelagibacterales, and alphaproteobacterium HIMB59, and when the data have been recoded into a four-character state alphabet (the dataset-specific recoding scheme S4: ARNDQEILKSTV GHY CMFP W) to reduce compositional heterogeneity. This phylogeny shows a pattern that matches that inferred when compositional heterogeneity has been alleviated through site removal. See Figure 3—figure supplement 6 for the Bayesian consensus trees inferred in PhyloBayes MPI v1.7 under the CAT-Poisson+Γ4 model. See also Figure 3—figure supplements 1–5 and 7–8.
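The four-state recoding described in panel B can be sketched as a simple lookup. The grouping below is the S4 scheme quoted in the caption (ARNDQEILKSTV GHY CMFP W). The choice of A, C, G and T as state symbols, the handling of gaps and ambiguous residues, and the function names are assumptions made for illustration and are not the authors' implementation.

```python
# Illustrative sketch only: state symbols and missing-data handling are assumptions.
S4_SCHEME = "ARNDQEILKSTV GHY CMFP W"  # grouping quoted in the Figure 3 caption


def build_recoding_table(scheme, states="ACGT"):
    """Map each amino acid to the state symbol of the group it belongs to."""
    groups = scheme.split()
    if len(groups) > len(states):
        raise ValueError("not enough state symbols for the recoding groups")
    return {aa: state for group, state in zip(groups, states) for aa in group}


def recode(seq, table, missing="-"):
    """Recode a protein sequence; residues outside the scheme become `missing`."""
    return "".join(table.get(aa, missing) for aa in seq.upper())


S4_TABLE = build_recoding_table(S4_SCHEME)
```

For example, `recode("MKV", S4_TABLE)` maps M to the third group and K and V to the first. Collapsing 20 amino acids into four states discards some signal, but it reduces the compositional differences between taxa that the caption identifies as a source of artefacts.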
Figure 3—figure supplement 1. Maximum-likelihood trees to assess the placement of the Holosporales in the absence of the Rickettsiales, Pelagibacterales and alphaproteobacterium HIMB59.
Branch support values are 100% SH-aLRT and 100% UFBoot unless annotated. (A) A tree that results from the analysis of the untreated dataset. (B) A tree that results from the analysis of a dataset from which the 50% most compositionally biased sites have been removed. (C) A tree that results from the analysis of a dataset that has been recoded into the four-character state recoding scheme S4 (recoding scheme: ARNDQEILKSTV GHY CMFP W). (D) A tree that results from the analysis of a dataset that only comprises the 40 most compositionally homogeneous genes.
Figure 3—figure supplement 2. Maximum-likelihood trees to assess the placement of the Rickettsiales in the absence of the Holosporales, Pelagibacterales, and alphaproteobacterium HIMB59.
Branch support values are 100% SH-aLRT and 100% UFBoot unless annotated. (A) A tree that results from the analysis of the untreated dataset. (B) A tree that results from the analysis of a dataset from which the 50% most compositionally biased sites have been removed. (C) A tree that results from the analysis of a dataset that has been recoded into the four-character state recoding scheme S4 (recoding scheme: PY RNMF GHLKTW ADCQEISV). (D) A tree that results from the analysis of a dataset that only comprises the 40 most compositionally homogeneous genes.
Figure 3—figure supplement 3. Maximum-likelihood trees to assess the placement of the Rickettsiales in the absence of the Holosporales, Pelagibacterales, alphaproteobacterium HIMB59 and the Beta- and Gammaproteobacteria outgroup.
Branch support values are 100% SH-aLRT and 100% UFBoot unless annotated. (A) A tree that results from the analysis of the untreated dataset. (B) A tree that results from the analysis of a dataset from which the 50% most compositionally biased sites have been removed. (C) A tree that results from the analysis of a dataset that has been recoded into the four-character state recoding scheme S4 (recoding scheme: RNMF GHLKTW ADCQEISV PY). (D) A tree that results from the analysis of a dataset that only comprises the 40 most compositionally homogeneous genes.
Figure 3—figure supplement 4. Maximum-likelihood trees to assess the placement of the Pelagibacterales in the absence of the Holosporales, Rickettsiales and alphaproteobacterium HIMB59.
Branch support values are 100% SH-aLRT and 100% UFBoot unless annotated. (A) A tree that results from the analysis of the untreated dataset. (B) A tree that results from the analysis of a dataset from which the 50% most compositionally biased sites have been removed. (C) A tree that results from the analysis of a dataset that has been recoded into the four-character state recoding scheme S4 (recoding scheme: EGIV ARNDQHKMPSY LFT CW). (D) A tree that results from the analysis of a dataset that only comprises the 40 most compositionally homogeneous genes.
Figure 3—figure supplement 5. Maximum-likelihood trees to assess the placement of alphaproteobacterium HIMB59 in the absence of the Holosporales, Rickettsiales and Pelagibacterales.
Branch support values are 100% SH-aLRT and 100% UFBoot unless annotated. (A) A tree that results from the analysis of the untreated dataset. (B) A tree that results from the analysis of a dataset from which the 50% most compositionally biased sites have been removed. (C) A tree that results from the analysis of a dataset that has been recoded into the four-character state recoding scheme S4 (recoding scheme: RLKMT ANDQEIPSV CW GHFY). (D) A tree that results from the analysis of a dataset that only comprises the 40 most compositionally homogeneous genes.
Figure 3—figure supplement 6. Bayesian consensus trees inferred with PhyloBayes MPI v1.7 and the CAT-Poisson+Γ4 model.
Branch support values are 1.0 posterior probabilities unless annotated. (A) Bayesian consensus tree inferred to place the Holosporales in the absence of the Rickettsiales and the Pelagibacterales and when compositional heterogeneity has been decreased by removing 50% of the most biased sites according to ɀ. (B) Bayesian consensus tree inferred to place the Holosporales in the absence of the Rickettsiales and the Pelagibacterales and when the data have been recoded into a four-character state alphabet (the dataset-specific recoding scheme S4: ARNDQEILKSTV GHY CMFP W) to reduce compositional heterogeneity. See Figure 3A and B for the most likely trees inferred in IQ-TREE v1.5.5 under the LG+PMSF(C60)+F+R6 and GTR+ES60 S4+F+R6 models, respectively.
Figure 3—figure supplement 7. Maximum-likelihood trees to assess the placement of the Holosporales when the fast-evolving Holospora and ‘Candidatus Hepatobacter’ are also included in the absence of the Rickettsiales, Pelagibacterales and alphaproteobacterium HIMB59.
Branch support values are 100% SH-aLRT and 100% UFBoot unless annotated.
Figure 3—figure supplement 8. Bayesian consensus tree inferred to place the Holosporales in the absence of the Pelagibacterales, alphaproteobacterium HIMB59, and Rickettsiales, and when the data have been recoded into a six-character state alphabet (the dataset-specific recoding scheme S6: AQEHISV RKMT PY DCLF NG W) to reduce compositional heterogeneity.
Branch support values are 1.0 posterior probabilities unless annotated.
