Genome Biol Evol. 2013;5(4):621-45.
doi: 10.1093/gbe/evt036.

Bacterial DNA sifted from the Trichoplax adhaerens (Animalia: Placozoa) genome project reveals a putative rickettsial endosymbiont

Timothy Driscoll et al. Genome Biol Evol. 2013.

Abstract

Eukaryotic genome sequencing projects often yield bacterial DNA sequences, data typically considered as microbial contamination. However, these sequences may also indicate either symbiont genes or lateral gene transfer (LGT) to host genomes. These bacterial sequences can provide clues about eukaryote-microbe interactions. Here, we used the genome of the primitive animal Trichoplax adhaerens (Metazoa: Placozoa), which is known to harbor an uncharacterized Gram-negative endosymbiont, to search for the presence of bacterial DNA sequences. Bioinformatic and phylogenomic analyses of extracted data from the genome assembly (181 bacterial coding sequences [CDS]) and trace read archive (16S rDNA) revealed a dominant proteobacterial profile strongly skewed to Rickettsiales (Alphaproteobacteria) genomes. By way of phylogenetic analysis of 16S rDNA and 113 proteins conserved across proteobacterial genomes, as well as identification of 27 rickettsial signature genes, we propose a Rickettsiales endosymbiont of T. adhaerens (RETA). The majority (93%) of the identified bacterial CDS belongs to small scaffolds containing prokaryotic-like genes; however, 12 CDS were identified on large scaffolds comprised of eukaryotic-like genes, suggesting that T. adhaerens might have recently acquired bacterial genes. These putative LGTs may coincide with the placozoan's aquatic niche and symbiosis with RETA. This work underscores the rich, and relatively untapped, resource of eukaryotic genome projects for harboring data pertinent to host-microbial interactions. The nature of unknown (or poorly characterized) bacterial species may only emerge via analysis of host genome sequencing projects, particularly if these species are resistant to cell culturing, as are many obligate intracellular microbes. Our work provides methodological insight for such an approach.

Figures

Fig. 1.—
Overview of the methodology used to identify bacterial DNA sequences within the Trichoplax adhaerens genome project. Bacterial SSU rDNA sequences were mined from the trace read archive (left), with rickettsial sequences further analyzed via phylogeny estimation. Bacterial CDS identified in the assembly (right) were determined to be primarily rickettsial-like based on phylogeny estimation and identification of rickettsial signature genes. The distribution of bacterial CDS on small (bacterial-like) and large (eukaryotic-like) scaffolds is shown, with CDS on the latter further evaluated for LGT via phylogeny estimation.
Fig. 2.—
Identification of bacterial DNA sequences within the Trichoplax adhaerens genome trace read archive and assembly. (a) Illustration of 289 SSU rDNA sequences identified in the T. adhaerens trace read archive (http://genome.jgi-psf.org/Triad1/Triad1.download.ftp.html, last accessed March 2013). The pie chart at top left illustrates the 34 prokaryotic 16S rDNA sequences detected among 255 T. adhaerens 18S rDNA sequences. Larger graph at right illustrates the taxonomic distribution of the 34 prokaryotic 16S rDNA sequences (see text for details on taxonomic assignment). Sequences are grouped into nested sectors according to hierarchical taxonomy, progressing from the interior to exterior of the plot. Color scheme is explained in box at bottom left. Cyanobacterial hits correspond to chloroplast rDNA sequences of cyanobacterial origin. Plot made with Krona v.2.0 (Ondov et al. 2011) with manual adjustment. (b) Illustration of T. adhaerens proteins that have strong similarity to their prokaryotic counterparts. All proteins encoded within the T. adhaerens assembly (NCBI, ASM15027v1, n = 11,540) were used as queries in BLASTP searches against prokaryotic and eukaryotic proteins within the nr database (NCBI), with subjects ranked according to Sm score (see text for details). Graph depicts the taxonomic distribution of the top five scores per T. adhaerens protein that included a prokaryotic protein (n = 1,697). The taxa are arrayed along the x axis in decreasing order according to the number of top hits (blue). Prokaryotic groups with less than 15 total hits per group (sum 1–5) are not shown. Asterisks depict taxonomic groups that also have a 16S rDNA sequence illustrated in panel a.
Fig. 3.—
Phylogeny of SSU rDNA sequences estimated for 78 Rickettsiales taxa, 10 mitochondria, and 5 outgroup taxa. See text for alignment and tree-building methods. The tree shown has a final optimization likelihood of −22042.321923 and was estimated using the GTR substitution model with GAMMA and proportion of invariant sites estimated. Branch support is from 1,000 bootstrap pseudoreplications. For nodes represented by 2 bootstrap values, the left is from the analysis that included 10 mitochondrial sequences, with the right from the analysis without the mitochondrial sequences. All nodes with single bootstrap values had similar support in both analyses. Red (mitochondria) and orange (within Rickettsiaceae) branches are reduced 75% and increased 50%, respectively. Blue cladograms depict minimally resolved lineages within the “Midichloriaceae.” The divergence point of five outgroup taxa from Betaproteobacteria (n = 1), Gammaproteobacteria (n = 1), and other Alphaproteobacteria (n = 3) is shown with a dashed branch. For each taxon, associated hosts are within parentheses, with ES depicting an environmental sample. Other abbreviations: UB, uncultured bacterium; UP, uncultured proteobacterium; UA, uncultured alphaproteobacterium; URB, uncultured Rickettsiales bacterium. Taxa within black boxes have available genome sequence data. The 16S rDNA sequence mined from the T. adhaerens trace archive is boxed green and noted with a red star. Accession numbers for all sequences are provided in supplementary table S1, Supplementary Material online.
Fig. 4.—
Bacterial CDS identified within the Trichoplax adhaerens genome assembly. (a) Results of an all-against-all BLASTP analysis between the genomes of T. adhaerens Grell-BS-1999 (n = 11,540) and “Candidatus Midichloria mitochondrii” str. IricVA (n = 1,211), hereafter M. mitochondrii. Outer black circle is a scale with coordinates (in Mb) for the M. mitochondrii genome, with the putative origin of replication positioned at 12 o’clock as previously determined (Sassera et al. 2011). Four rings inside the scale as follows: 1) 1,211 CDS of the M. mitochondrii genome, with operons and transcriptional units (predicted using fgenesb (Tyson et al. 2004)) colored green and gray, respectively; 2) heat maps for Sm scores >20 (outer) and corresponding E values (inner) for 347 T. adhaerens-M. mitochondrii protein matches, with Sm scores from 20 (dark blue) to 576 (burgundy) and E values from 1 (dark blue) to 1.00E−500 (burgundy); 3) histograms depicting the number of contigs on each scaffold that contain the identified T. adhaerens gene: eukaryotic-like CDS (black), RETA CDS of core data set (orange), RETA CDS of accessory data set (blue); 4) all 347 T. adhaerens CDS (outer, black) and 138 RETA CDS (inner, orange, blue) (supplementary fig. S3, Supplementary Material online, for linear histogram and further information). NOTE: five yellow CDS (outer) were below the Sm 20 cutoff but were determined to be RETA CDS via manual inspection. RETA CDS present on the same T. adhaerens scaffold are linked in the interior of the plot, with boxes (1–7) depicting syntenic regions across M. mitochondrii and RETA. Plot made using Circos (Krzywinski et al. 2009) with manual adjustment. (b) List of 181 RETA CDS identified within the T. adhaerens assembly. RETA identifier (0001–0181) followed by gene symbol or predicted product description (complete annotations in supplementary table S2, Supplementary Material online). 
Core data set CDS (orange) comprise 119 ORFs corresponding to 116 genes, with three split genes (dashed boxes). Accessory data set CDS (blue) comprise 62 genes. Black circles depict RETA CDS with homologs present in the M. mitochondrii genome (n = 138), and are listed according to their clockwise arrangement in ring 4 of the plot in (a). Yellow circles depict the five genes added manually (Sm < 20) to ring 4. Green boxes enclose the seven syntenic regions illustrated in the interior on the plot. Open circles depict the 42 RETA CDS that do not have significant homologs in the M. mitochondrii genome. Red asterisks denote six CDS that were subsequently determined to be likely nuclear encoded mitochondrial genes (see text). Red asterisks also mark the location of these CDS in (a) between rings 2 and 3.
Fig. 5.—
Genome-based phylogeny estimated for RETA, 162 alphaproteobacterial taxa, 12 mitochondria, and 2 outgroup taxa. RETA core proteins (n = 113) were included in the phylogenetic pipeline that entails ortholog group (OG) generation, OG alignment (and masking of less conserved positions), and concatenation of aligned OGs (see text). Tree was estimated using the CAT-GTR model of substitution as implemented in PhyloBayes v3.3 (Lartillot and Philippe 2004, 2006). Tree is a consensus of 1,522 trees (post burn-in) pooled from two independent Markov chains run in parallel. Branch support was measured via posterior probabilities, which reflect frequencies of clades among the pooled trees. RETA is boxed green and noted with a red star. Classification scheme for Rickettsia spp. follows previous studies (Gillespie et al. 2007, 2008). Taxon names, PATRIC genome IDs (bacteria) and NCBI accession numbers (mitochondria) for the 176 genomes are provided in supplementary table S3, Supplementary Material online.
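The caption above describes branch support as posterior probabilities, that is, the frequency of each clade among the pooled post burn-in trees. A minimal Python sketch of that calculation follows. It assumes each sampled tree has already been reduced to a set of clades (each clade a frozenset of taxon names); the function name `clade_posteriors` and the toy taxa are hypothetical, and this is not a call into PhyloBayes itself.

```python
from collections import Counter


def clade_posteriors(trees):
    """Return the fraction of sampled trees that contain each clade.

    trees: list of trees, each given as an iterable of clades, where a
    clade is a frozenset of taxon names (an assumed, simplified encoding).
    """
    n_trees = len(trees)
    if n_trees == 0:
        raise ValueError("at least one sampled tree is required")
    counts = Counter()
    for clades in trees:
        # set() guards against a clade being listed twice for one tree
        counts.update(set(clades))
    return {clade: k / n_trees for clade, k in counts.items()}


# Toy example: four pooled trees over hypothetical taxa A, B, C
AB = frozenset({"A", "B"})
ABC = frozenset({"A", "B", "C"})
BC = frozenset({"B", "C"})
pooled = [{AB, ABC}, {AB}, {BC}, {AB, ABC}]
support = clade_posteriors(pooled)
```

In this toy input the clade {A, B} appears in three of the four trees, so its support is 0.75.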
Fig. 6.—
Bacterial CDS (accessory data set) identified within the Trichoplax adhaerens genome assembly. These 62 CDS were determined to lack the profile of typical Rickettsiales genes inherited vertically from an alphaproteobacterial ancestor (see text). RETA CDS are plotted by %GC (x axis). Trichoplax adhaerens protein accession numbers (NCBI), RETA IDs and gene/protein names are listed on the y axis, with color scheme as follows: green, highly similar to Rickettsiales signature proteins (trees shown in supplementary fig. S6, Supplementary Material online); blue, present in some (or all) Rickettsiales genomes yet divergent in sequence and phylogenetic signal (trees shown in supplementary fig. S7, Supplementary Material online); red, unknown from Rickettsiales genomes. Inset shows the average %GC for all 62 CDS, as well as for the three groups. Stars depict the following: yellow, identical to sequences from the genome of Halothermothrix orenii H 168 (Firmicutes: Haloanaerobiales); orange, 99% aa identity with sequences from the genome of Alteromonas macleodii ATCC 27126 (Gammaproteobacteria: Alteromonadales); green, most similar to chloroplast sequences of haptophytic algae (Eukaryota; Haptophyceae). Colored boxes on the y-axis correspond with stars on the plot, with the gray box illustrating a fused gene model (trmH-fkpA).
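As an illustration of the quantity plotted in this figure, %GC for a coding sequence and its average per color group could be computed as sketched below in Python. The helper names and toy sequences are hypothetical, and the unweighted per-CDS mean is an assumption; the caption does not state how the inset averages were calculated.

```python
def gc_percent(seq):
    """Return %GC of a nucleotide sequence, ignoring ambiguous bases (e.g., N)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / (gc + at)


def mean_gc_by_group(cds):
    """Average %GC per group.

    cds: iterable of (group_label, sequence) pairs. Each CDS counts
    equally, regardless of its length (an assumption of this sketch).
    """
    per_group = {}
    for group, seq in cds:
        per_group.setdefault(group, []).append(gc_percent(seq))
    return {g: sum(vals) / len(vals) for g, vals in per_group.items()}


# Toy sequences only; labels mirror the caption's color scheme
example = [("green", "ATGC"), ("green", "GGCC"), ("red", "ATAT")]
averages = mean_gc_by_group(example)
```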
Fig. 7.—
Evidence for bacterial-like genes encoded in the Trichoplax adhaerens genome. (a) Division of the 181 RETA CDS into four categories based on the composition of their scaffolds: eukaryotic scaffolds, CDS present on large (>40 genes) scaffolds with predominately eukaryotic-like genes (n = 18); small hybrid scaffolds, CDS present on small (<7 genes) scaffolds with both bacterial- and eukaryotic-like genes (n = 19); all-bacteria scaffolds, CDS present on small (<5 genes) scaffolds comprised entirely of bacterial-like genes (n = 59); and singleton-gene scaffolds (n = 85). Each category is further divided into single exon genes and genes possessing one or more introns (as predicted within the original T. adhaerens assembly). (b) Eight large, eukaryotic-like T. adhaerens scaffolds contain 18 RETA CDS. Scaffold IDs and number of encoded genes are from the T. adhaerens assembly (see text). RETA IDs and protein names are further described in supplementary table S2, Supplementary Material online. For core data set CDS: CD-R, CD-B, and CD-E correspond to the sub-data sets Ric-78, Bac-26, and Euk-9, respectively (supplementary fig. S2, Supplementary Material online). For accessory data set CDS: AD-R, highly similar to Rickettsiales signature proteins; AD-B, present in some (or all) Rickettsiales genomes yet divergent in sequence and phylogenetic signal (fig. 6). The number of exons for each CDS is shown. The results of gene predictions by fgenesb (Tyson et al. 2004) (headings for three columns colored green) are described as follows: “Coverage (T. adhaerens),” percentage of bps in the eukaryotic gene prediction matching those in the fgenesb prediction; “Discrepancy,” differences at either the N- or C-terminus across eukaryotic and fgenesb predictions; “Coverage (RETA),” percentage of bps in the fgenesb prediction matching those in the eukaryotic gene prediction. 
The most related sequences as determined by phylogeny estimation are listed, with letters referring to individual phylogeny estimations (supplementary fig. S8, Supplementary Material online). Potential bacteria-to-T. adhaerens LGT products are highlighted in yellow.
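The four scaffold categories in panel (a) can be expressed as a small decision rule. The Python sketch below is illustrative only: the size cutoffs come from the caption, but the function name, the reading of "predominately eukaryotic-like" as fewer than half of the genes being bacterial-like, and the "unclassified" fallback for scaffold sizes the caption does not cover are all assumptions.

```python
def classify_scaffold(n_genes, n_bacterial):
    """Assign a scaffold to one of the categories named in panel (a).

    n_genes: total genes predicted on the scaffold
    n_bacterial: how many of those genes are bacterial-like
    """
    if n_genes < 1 or not 0 <= n_bacterial <= n_genes:
        raise ValueError("need n_genes >= 1 and 0 <= n_bacterial <= n_genes")
    if n_genes == 1:
        return "singleton"
    if n_bacterial == n_genes and n_genes < 5:
        return "all-bacteria"
    if 0 < n_bacterial < n_genes and n_genes < 7:
        return "small hybrid"
    # Assumed threshold: "predominately eukaryotic" means < 50% bacterial-like
    if n_genes > 40 and n_bacterial / n_genes < 0.5:
        return "eukaryotic"
    return "unclassified"


# Hypothetical scaffolds: (total genes, bacterial-like genes)
labels = [classify_scaffold(n, b) for n, b in [(1, 1), (3, 3), (6, 2), (120, 3)]]
```

CDS on scaffolds labeled "eukaryotic" are the ones the caption goes on to evaluate as potential LGT products in panel (b).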
