Comparative Study
Sci Rep. 2022 Feb 24;12(1):3189.
doi: 10.1038/s41598-022-07185-5.

Comparative pangenome analysis of capsulated Haemophilus influenzae serotype f highlights their high genomic stability

Aida Gonzalez-Diaz et al. Sci Rep. 2022.

Abstract

Haemophilus influenzae is an opportunistic pathogen adapted to the human respiratory tract. Non-typeable H. influenzae are highly heterogeneous, but few studies have analysed the genomic variability of capsulated strains. This study aimed to examine the genetic diversity of 37 serotype f isolates from the Netherlands, Portugal, and Spain, and to compare them with all capsulated genomes available in public databases. Serotype f isolates belonged to CC124 and shared few single nucleotide polymorphisms (SNPs; n = 10,999) and a large core genome (> 80%). Three main clades were identified by the presence of 75, 60, and 41 clade-exclusive genes, respectively. Multi-locus sequence typing of all capsulated genomes revealed a reduced number of clonal complexes associated with each serotype. Pangenome analysis showed a large pool of genes (n = 6360), many of which belonged to the accessory genome (n = 5323). Phylogenetic analysis revealed that serotypes a, b, and f had greater diversity. The total number of SNPs in serotype f was significantly lower than in serotypes a, b, and e (p < 0.0001), indicating low variability within the serotype f clonal complexes. Capsulated H. influenzae are genetically homogeneous, with few lineages in each serotype. Serotype f has high genetic stability regardless of time and country of isolation.

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Pangenomic analysis of the 37 H. influenzae serotype f genomes. (A) Core-SNP phylogenetic tree, demographic data (country and invasiveness), genes detected, and the assigned allele. Clades I, II and III are indicated by coloured dots. The percentage of strains carrying each gene is presented graphically. (B) Distribution of genes detected in H. influenzae serotype f: core genes (100% of genomes), soft-core genes (95–99%), shell genes (15–94%), and cloud genes (< 15%). Core genes were classified as monoallelic (same allele in all the isolates) or clade segregating alleles (an allelic variant exclusive to one clade). The pie charts show the identity and number of SNPs for alleles of each clade in relation to the alleles of the same gene in other clades.
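The gene categories in panel B can be illustrated with a short Python sketch. It is not the authors' pipeline; it only applies the thresholds stated in the caption (core 100%, soft-core 95–99%, shell 15–94%, cloud < 15%). The function names, the toy gene names, and the handling of fractional percentages (for example, 35 of 37 genomes is about 94.6% and is treated here as shell) are assumptions made for illustration.

```python
from collections import Counter


def classify_gene(n_present: int, n_genomes: int) -> str:
    """Assign a pangenome category from the number of genomes carrying a gene.

    Thresholds follow the figure caption. Boundary handling is an assumption:
    anything below 100% but at or above 95% is soft-core, and anything
    at or above 15% but below 95% is shell.
    """
    if n_genomes <= 0 or not 0 <= n_present <= n_genomes:
        raise ValueError("n_present must lie between 0 and n_genomes")
    if n_present == n_genomes:
        return "core"
    # Integer arithmetic avoids floating-point surprises at the boundaries.
    if n_present * 100 >= 95 * n_genomes:
        return "soft-core"
    if n_present * 100 >= 15 * n_genomes:
        return "shell"
    return "cloud"


def summarise(presence_counts: dict[str, int], n_genomes: int) -> Counter:
    """Count how many genes fall into each category."""
    return Counter(classify_gene(n, n_genomes) for n in presence_counts.values())


if __name__ == "__main__":
    # Hypothetical gene names and counts for a 37-genome collection.
    toy_counts = {"geneA": 37, "geneB": 36, "geneC": 20, "geneD": 3}
    print(summarise(toy_counts, n_genomes=37))
```

With 37 genomes, a gene missing from a single isolate (36 of 37, about 97%) already drops from core to soft-core, which is why the choice of core threshold matters for small collections.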
Figure 2
Pangenomic analysis of capsulated H. influenzae. (A) Gene pool of capsulated H. influenzae genomes included in this study. The number of core, soft-core, shell, cloud, and total genes of each serotype was determined using Roary, with a minimum identity percentage of 70% for BLASTp and the -cd parameter adjusted to 100. (B) Relative pangenome composition represented as a percentage of genes per genome of each serotype. Gene pool was defined as the set of all genes in a population. Donut charts indicate the distribution of core (100% of genomes), soft-core (95–99%), shell (15–94%), and cloud genes (< 15%). (C) Correlation between total and core genes in all capsulated H. influenzae genomes from this study and from the NCBI and ENA databases by serotype.
Figure 3
Core genome SNP typing of capsulated H. influenzae genomes. Each dot reflects the number of SNPs found in serotype a, b, c, d, e, and f genomes compared to the reference genomes NML-Hia-1 [CC23] (NZ_CP017811.1), 10810 [CC6] (NC_016809.1), M12125 [CC7] (SRR9847495), PTHi-10983 [CC10] (ERR2560729), M15895 [CC18] (NZ_CP031249.1), and KR494 [CC124] (NC_022356.1), respectively. Split violin plots show the distribution of the genomes based on the number of SNPs by each serotype.
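The per-genome SNP counts plotted in this figure can be pictured with a minimal Python sketch that compares aligned core-genome sequences against a reference. This record does not name the SNP-calling tool the authors used, so the code below is only a generic illustration. The function names and toy sequences are hypothetical, and skipping gaps and ambiguous bases is an assumption.

```python
def count_snps(reference: str, query: str) -> int:
    """Count positions where two aligned sequences differ.

    Assumes both sequences come from the same alignment (equal length).
    Positions with a gap or a non-ACGT base in either sequence are skipped.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    snps = 0
    for ref_base, query_base in zip(reference.upper(), query.upper()):
        if ref_base in valid and query_base in valid and ref_base != query_base:
            snps += 1
    return snps


def snps_per_genome(reference: str, genomes: dict[str, str]) -> dict[str, int]:
    """Return the SNP count of each genome relative to the reference."""
    return {name: count_snps(reference, seq) for name, seq in genomes.items()}


if __name__ == "__main__":
    # Toy aligned sequences; real core-genome alignments span many kilobases.
    ref = "ACGTACGTAC"
    toy = {"isolate_1": "ACGTACGTAC", "isolate_2": "ACGTTCGTAA", "isolate_3": "AC-TACNTAC"}
    print(snps_per_genome(ref, toy))
```

Each value returned by `snps_per_genome` corresponds to one dot in a plot of this kind, grouped by serotype and by the reference genome chosen for that serotype.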


Publication types