Front Cell Infect Microbiol. 2017 Feb 14;7:28.
doi: 10.3389/fcimb.2017.00028. eCollection 2017.

In silico Comparison of 19 Porphyromonas gingivalis Strains in Genomics, Phylogenetics, Phylogenomics and Functional Genomics


Tsute Chen et al. Front Cell Infect Microbiol.

Abstract

Currently, genome sequences of a total of 19 Porphyromonas gingivalis strains are available, including eight completed genomes (strains W83, ATCC 33277, TDC60, HG66, A7436, AJW4, 381, and A7A1-28) and 11 high-coverage draft sequences (JCVI SC001, F0185, F0566, F0568, F0569, F0570, SJD2, W4087, W50, Ando, and MP4-504) that are assembled into fewer than 300 contigs. The objective was to compare these genomes at both the nucleotide and protein sequence levels in order to understand their phylogenetic and functional relatedness. Four copies of the 16S rRNA gene sequence were identified in each of the eight complete genomes and one in each of the other 11 unfinished genomes. These 43 16S rRNA sequences represent only 24 unique sequences, and the derived phylogenetic tree suggests a possible evolutionary history for these strains. Phylogenomic comparison based on shared proteins and whole genome nucleotide sequences consistently showed two groups with closely related members: one consisting of ATCC 33277, 381, and HG66, the other of W83, W50, and A7436. At least 1,037 core/shared proteins were identified in the 19 P. gingivalis genomes based on the most stringent detection parameters. Comparative functional genomics based on genome-wide comparisons between NCBI and RAST annotations, as well as additional approaches, revealed functions that are unique or missing in individual P. gingivalis strains, or species-specific in all P. gingivalis strains, when compared to the neighboring species P. asaccharolytica. All the comparative results of this study are available online for download at ftp://www.homd.org/publication_data/20160425/.

Keywords: Porphyromonas gingivalis; comparative genomics; phylogenetics; phylogenomics.


Figures

Figure 1
Phylogenetic tree of P. gingivalis 16S rRNA gene sequences. A total of 24 unique 16S rRNA gene sequences were extracted from the genomes of 19 P. gingivalis strains annotated by NCBI. Sequences were pre-aligned with MAFFT v6.935b (2012/08/21) (Katoh and Standley, 2013), and leading and trailing regions not present in all sequences were trimmed. The trimmed aligned sequences represent 20 unique sequences and were subjected to QuickTree V 1.1 (Howe et al., 2002) using the “-kimura” option to calculate the substitution rate. The sequence of P. asaccharolytica strain DSM 20707 (PaDSM20707) was used as the out-group. The branch length of the out-group was truncated to fit the tree in the figure, and its substitution rate is indicated by the blue number. The red numbers next to the branching points are the bootstrap values based on 100 iterations. Sequences of different strains are separated by semicolons, and the number of sequences is indicated in parentheses in the format (x–y/z), where x and y are the start and end IDs and z is the total number in the strain.
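The trimming and de-duplication steps described in this caption can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the toy sequences, and the choice to treat "-" as the gap character are assumptions made for illustration only. It trims an existing alignment to the span covered by every sequence and then groups identical trimmed sequences, which is how 24 unique sequences could collapse to a smaller number after trimming.

```python
def trim_to_shared_span(aligned):
    """Trim leading/trailing alignment columns not covered by every sequence.

    `aligned` maps a sequence name to an aligned string that uses '-' for
    gaps (an assumption; the gap symbol depends on the aligner output).
    Internal gaps are left untouched.
    """
    seqs = list(aligned.values())
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    # Latest position at which any sequence starts, earliest at which one ends.
    start = max(len(s) - len(s.lstrip("-")) for s in seqs)
    end = min(len(s.rstrip("-")) for s in seqs)
    return {name: s[start:end] for name, s in aligned.items()}


def group_identical(seqs):
    """Group sequence names by identical sequence string."""
    groups = {}
    for name, s in seqs.items():
        groups.setdefault(s, []).append(name)
    return groups


# Hypothetical toy alignment: two copies differ only in their overhangs.
aligned = {
    "strainA_1": "--ACGTAC-GT-",
    "strainA_2": "-AACGTAC-GTT",
    "strainB_1": "--ACGTACAGT-",
}
trimmed = trim_to_shared_span(aligned)
unique = group_identical(trimmed)
print(len(unique))  # number of unique sequences after trimming
```

In this toy case the two strainA copies become identical once the overhanging ends are removed, so three input sequences yield two unique trimmed sequences.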
Figure 2
Core and unique genes in P. gingivalis surveyed by sequence identity and alignment length. Of the 39,926 NCBI annotated P. gingivalis proteins, 37,667 are ≥ 50 amino acids in length and were searched for homologous clusters using the “blastclust” software V.2.2.25 (http://www.ncbi.nlm.nih.gov/Web/Newsltr/Spring04/blastlab.html). Various sequence identity cutoffs ranging from 10 to 95% and two minimal alignment length cutoffs, 50 and 90%, were used as the program parameters to identify protein clusters in three categories: (A) clusters containing proteins from all 19 genomes; (B) clusters containing proteins from 2 to 18 genomes; and (C) clusters with proteins from only 1 genome.
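The three categories in this caption amount to counting, for each cluster, how many distinct genomes contribute a protein. The sketch below illustrates that bookkeeping under stated assumptions: clusters are already parsed into lists of protein IDs, and a lookup table from protein ID to genome is available. The function name, the IDs, and the three-genome toy data are hypothetical and do not come from the study.

```python
from collections import Counter


def categorize_clusters(clusters, protein_to_genome, n_genomes):
    """Count clusters by how many distinct genomes they span.

    "core"   : proteins from all genomes         (category A)
    "shared" : proteins from 2 to n_genomes - 1  (category B)
    "unique" : proteins from a single genome     (category C)
    """
    counts = Counter()
    for members in clusters:
        genomes = {protein_to_genome[p] for p in members}
        if len(genomes) == n_genomes:
            counts["core"] += 1
        elif len(genomes) >= 2:
            counts["shared"] += 1
        else:
            counts["unique"] += 1
    return counts


# Hypothetical toy data with three genomes instead of 19.
protein_to_genome = {
    "p1": "W83", "p2": "ATCC33277", "p3": "TDC60",
    "p4": "W83", "p5": "TDC60",
    "p6": "W83", "p7": "W83",
}
clusters = [["p1", "p2", "p3"], ["p4", "p5"], ["p6", "p7"]]
result = categorize_clusters(clusters, protein_to_genome, n_genomes=3)
print(dict(result))
```

Note that a cluster of two paralogs from the same genome (p6 and p7 here) still counts as unique to that genome, because the category depends on the number of genomes, not the number of proteins.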
Figure 3
Unique proteins in 19 P. gingivalis strains. Of the 39,926 NCBI annotated P. gingivalis proteins, 37,667 are ≥ 50 amino acids in length and were searched for homologous clusters using the “blastclust” software V.2.2.25 (http://www.ncbi.nlm.nih.gov/Web/Newsltr/Spring04/blastlab.html). Unique proteins of each of the 19 P. gingivalis genomes were identified as proteins found in only one genome without any similar counterpart in any other. The total number of clusters that contain only unique proteins was plotted for each genome. Various sequence identity cutoffs ranging from 10 to 95% (dots with varying grayscale intensity) and two minimal alignment length cutoffs, 50% (A) and 90% (B), were used as the program parameters.
Figure 4
P. gingivalis phylogenomic trees based on core proteins identified at various percent sequence identities. Of the 39,926 NCBI annotated P. gingivalis proteins, 37,667 are ≥ 50 amino acids in length and were searched for homologous clusters using the “blastclust” software V.2.2.25 (http://www.ncbi.nlm.nih.gov/Web/Newsltr/Spring04/blastlab.html). (A) Unrooted tree based on the 1,045 shared proteins identified by “blastclust” with 60% as the sequence identity cutoff and 90% as the alignment length cutoff; the alignment generated a total of 17,389 effective (non-identical) protein sequence positions across all 19 genomes, and the tree was constructed based on these positions; (B) rooted tree based on 436 proteins (out of 1,045) that are also found in P. asaccharolytica strain DSM 20707 (PaDSM20707) with ≥ 50% sequence identity and ≥ 90% alignment length; the alignment generated 4,771 effective protein sequence positions; (C) rooted tree based on 36 proteins shared among 20 genomes with ≥ 80% sequence identity and ≥ 90% alignment length. Proteins were aligned with MAFFT v6.935b (2012/08/21) (Katoh and Standley, 2013), and poorly aligned regions were filtered by Gblocks 0.91b (Talavera and Castresana, 2007). Trees were constructed with FastTree 2.1.9 (Price et al., 2010) using the JTT protein mutation model (Jones et al., 1992) and the CAT and -gamma options to account for the different rates of evolution at different sites. The reliability of tree splits was reported as “local support values” based on the Shimodaira-Hasegawa test (Shimodaira and Hasegawa, 2001), printed in blue on the split. The branch length (substitution rate) of the outgroup PaDSM20707 was truncated, and the length was printed in black (B,C); (D) rooted tree constructed using PhyloPhlAn (Segata et al., 2013) by directly subjecting all NCBI annotated proteins of the 20 genomes to the software, resulting in 840 effective protein positions from 225 aligned proteins.
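The caption does not define how "effective (non-identical)" positions were counted. One plausible reading, assumed here, is that they are alignment columns in which the genomes do not all carry the same residue. The sketch below counts such columns in a toy concatenated alignment; the function name and the sequences are hypothetical, and the sketch is not a description of what FastTree or the authors actually computed.

```python
def count_variable_columns(aligned):
    """Count alignment columns that are not identical across all sequences.

    `aligned` maps a genome name to its aligned (concatenated) protein
    string; all strings must have the same length.
    """
    seqs = list(aligned.values())
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    # zip(*seqs) walks the alignment column by column.
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


# Hypothetical toy alignment of three genomes.
toy = {
    "g1": "MKTLLV",
    "g2": "MKSLLV",
    "g3": "MKTLIV",
}
n_variable = count_variable_columns(toy)
print(n_variable)
```

Columns that are identical in every genome carry no information about branching order, which is why only the variable columns contribute to distinguishing strains in a tree.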
Figure 5
DNA-DNA sequence alignment between P. gingivalis genomes. Genomic sequence alignments between several pairs of P. gingivalis strains were plotted using NUCmer (NUCleotide MUMmer) version 3.1 (Delcher et al., 2002). The percent sequence identities of detected homologous fragments were plotted in gradient colors based on the percentage. The axes are the nucleotide coordinates in the genomes. The contigs of the unfinished genomes were reordered based on the reference genome (the genome on the X-axis). (A) strain 381 vs. ATCC 33277; (B) HG66 vs. ATCC 33277; (C) strain 381 vs. HG66; (D) W50 vs. W83; (E) A7436 vs. W83; (F) AJW4 vs. A7436; (G) TDC60 vs. JCVI SC001; and (H) TDC60 vs. JCVI SC001 showing only the regions with percent identity ≥ 99%.
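The caption states that draft-genome contigs were reordered against the reference but does not say how. A simple heuristic, assumed here purely for illustration, is to place each contig at the reference coordinate of its longest alignment and sort by that coordinate. The function name, the tuple layout, and the toy alignments are hypothetical; they are not NUCmer output and not the authors' procedure.

```python
def order_contigs_by_reference(alignments):
    """Order contigs by the reference start of their longest alignment.

    `alignments` is an iterable of (contig, ref_start, aligned_length)
    tuples, a simplified stand-in for parsed alignment coordinates.
    """
    best = {}  # contig -> (ref_start, aligned_length) of longest hit
    for contig, ref_start, length in alignments:
        if contig not in best or length > best[contig][1]:
            best[contig] = (ref_start, length)
    return sorted(best, key=lambda contig: best[contig][0])


# Hypothetical alignments: c1 has a short secondary hit that is ignored.
alignments = [
    ("c1", 5000, 800),
    ("c2", 100, 1200),
    ("c1", 9000, 200),
    ("c3", 2500, 950),
]
order = order_contigs_by_reference(alignments)
print(order)
```

Anchoring on the longest hit keeps short repeat matches, which can land anywhere in the reference, from dictating where a contig is drawn on the dot plot.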
Figure 6
Genomic DNA similarity of 19 P. gingivalis genomes compared by oligonucleotide frequency. All possible 20-mer sequences present in all genomes, including that of P. asaccharolytica strain DSM 20707 (PaDSM20707), used as an out-group, were cataloged, and the number of genomes in which each 20-mer is present was recorded. (A) was generated by first calculating the average number of genomes for all the 20-mers present in every 500-nucleotide window across the entire genome and then coloring each window based on the genome frequency (minimum 1 in yellow and maximum 20 in black). (B) is similar to (A), but the non-coding regions were masked in light blue to highlight the oligonucleotide frequencies for the areas that correspond to both forward (upper) and reverse-complement (lower) protein coding sequences. The unfinished genomic contigs were arranged in the same order as they appear in the sequences downloaded from NCBI. The genomes in the plot were ordered based on the 16S rRNA phylogenetic tree (Figure 1), with a dendrogram derived from the same tree to show the relatedness.
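The windowed k-mer frequency described for panel (A) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' implementation: it scans the forward strand only, assigns each k-mer to the window containing its start position, and uses toy values (k = 3, window = 4) in place of the 20-mers and 500-nucleotide windows from the caption. The function names and sequences are hypothetical.

```python
def kmer_genome_counts(genomes, k):
    """Map each k-mer to the number of genomes that contain it at least once."""
    counts = {}
    for seq in genomes.values():
        # A set ensures each genome contributes at most 1 per k-mer.
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        for kmer in kmers:
            counts[kmer] = counts.get(kmer, 0) + 1
    return counts


def window_means(seq, counts, k, window):
    """Average genome count of the k-mers starting in each window of `seq`."""
    means = []
    last_start = len(seq) - k + 1  # one past the last valid k-mer start
    for start in range(0, len(seq), window):
        stop = min(start + window, last_start)
        values = [counts[seq[i:i + k]] for i in range(start, stop)]
        if values:  # the final window may hold no complete k-mer
            means.append(sum(values) / len(values))
    return means


# Hypothetical toy genomes; real input would be full genome sequences.
genomes = {"g1": "ACGTACGT", "g2": "ACGTTTTT", "g3": "TTTTTTTT"}
counts = kmer_genome_counts(genomes, k=3)
means = window_means(genomes["g1"], counts, k=3, window=4)
print(means)
```

Each window mean ranges from 1 (k-mers found only in that genome) up to the total number of genomes (k-mers found everywhere), which corresponds to the yellow-to-black color scale described in the caption.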

