Sci Rep. 2025 Aug 10;15(1):29253.
doi: 10.1038/s41598-025-14841-z.

A systematic algorithm using 16S ribosomal RNA for accurate diagnosis of pneumonia pathogens

Ferry Dwi Kurniawan et al. Sci Rep.

Abstract

Over half of community-acquired pneumonia cases are caused by a few dozen bacterial species, and accurate identification of these pathogens is essential for effective treatment. In this study, we developed a reliable diagnostic method using 16S ribosomal RNA (16S rRNA) sequencing that accounts for intra-species variation, the need to differentiate Streptococcus pneumoniae from oral α-hemolytic streptococci, and applicability to the battlefield hypothesis, which helps distinguish true pathogens from commensal organisms. We designed specific primers and a BLAST wrapper program, Cheryblast + ob, to classify 37 pneumonia-causing bacteria and 4 α-hemolytic streptococci. In simulation experiments involving a total of 20,309 copies of the 16S rRNA from 41 species of bacteria deposited in GenBank, the algorithm achieved a sensitivity greater than 0.996 and a specificity of 1.000. It was robust against sequencing errors and successfully distinguished S. pneumoniae from closely related species. In an experiment using next-generation sequencing on artificial mixtures containing genomic DNA from 10 bacterial species and human DNA at varying two-fold ratios, the species with the highest copy number was correctly identified in 8 out of 11 samples, and the top two species by copy number were identified in all 11 samples. This high-performance method offers a promising tool for accurate pneumonia diagnosis and could also be applied to other infections in which a limited number of bacterial species must be reliably identified.

Keywords: Streptococcus pneumoniae; 16S ribosomal RNA; Battlefield hypothesis; Next-generation sequencing; Pneumonia.

Conflict of interest statement

Declarations. Competing interests: The authors declare no competing interests.

Figures

Fig. 1
Analysis flowchart. Flowchart from DNA extraction to the application of the battlefield hypothesis. Each bacterium is differentiated using the bit score from NCBI BLAST. Streptococci and Mycobacteria are further differentiated using species-specific sequences. In this study, we optimized our program using 16S rRNA sequences downloaded from the NCBI website and simulated the post-DNA extraction steps using bacterial DNA purchased from ATCC or commercially available human genomic DNA.
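The bit-score step in the flowchart can be illustrated with a minimal sketch. It assumes BLAST results in the default tabular layout (`-outfmt 6`), where the bit score is the 12th column, and simply keeps the highest-scoring subject for each query. The function name `best_hit_per_query`, the `min_bitscore` cutoff, and the example rows are hypothetical. This is not the logic of Cheryblast + ob, and it omits the additional species-specific sequence check the caption describes for Streptococci and Mycobacteria.

```python
import csv
import io

BITSCORE_COLUMN = 11  # 12th column in BLAST's default tabular output (-outfmt 6)


def best_hit_per_query(tabular_text, min_bitscore=0.0):
    """Return {query_id: (subject_id, bitscore)} keeping the top bit score.

    min_bitscore is a hypothetical cutoff; hits below it are ignored.
    """
    best = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        bitscore = float(row[BITSCORE_COLUMN])
        if bitscore < min_bitscore:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


# Made-up rows for illustration only (not data from the study).
rows = [
    ["read_1", "species_A", "99.0", "400", "4", "0", "1", "400", "1", "400", "1e-180", "720"],
    ["read_1", "species_B", "95.0", "400", "20", "0", "1", "400", "1", "400", "1e-150", "630"],
    ["read_2", "species_B", "98.5", "400", "6", "0", "1", "400", "1", "400", "1e-175", "705"],
]
example = "\n".join("\t".join(r) for r in rows)
print(best_hit_per_query(example))
```

Keeping only the single best hit is the simplest possible rule; closely related species with near-identical scores would need an extra tie-breaking step, which is the role the caption assigns to species-specific sequences.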
Fig. 2
16S rRNA homology. The fraction of the most frequent nucleotide at each position after aligning the consensus 16S rRNA sequences of 41 bacterial species using Clustal Omega. In regions where one nucleotide is conserved across all bacteria, the fraction of the most frequent nucleotide is 1.0. If there are no gaps and the four nucleotides A, G, C, and T are present in equal proportions, the fraction is 0.25. At positions where gaps occur in many bacteria, the fraction falls below 0.25. Below the scale, the commonly used V1–V2 and V3–V4 regions are indicated, along with the positions of representative primers that amplify each region.
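The per-position measure described in this caption can be sketched as follows. The sketch assumes the alignment is already available as equal-length strings with `-` marking gaps, and that gaps count toward the denominator but never as the "most frequent nucleotide" (which is what lets gap-rich columns fall below 0.25). The function name and the toy alignment are hypothetical, not the authors' code or data.

```python
from collections import Counter


def most_frequent_fraction(aligned):
    """Per alignment column, the share of sequences with the commonest base.

    Gaps ('-') are counted in the denominator but are never the top symbol.
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    fractions = []
    for column in zip(*aligned):
        counts = Counter(base for base in column if base != "-")
        top = max(counts.values(), default=0)  # 0 if the column is all gaps
        fractions.append(top / len(column))
    return fractions


# Toy alignment for illustration only (not real 16S rRNA sequences).
toy_alignment = ["ACGT-", "ACCA-", "ACTG-", "ACACA"]
print(most_frequent_fraction(toy_alignment))  # [1.0, 1.0, 0.25, 0.25, 0.25]
```

The first two columns are fully conserved (1.0), the next two carry all four bases equally (0.25), and the last column has one base among three gaps (0.25); with more sequences, a gap-dominated column drops below 0.25.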
Fig. 3
Schematic diagram of the distribution of bit scores. (A) Distribution when 16S rRNA sequences from various bacteria are used as queries and the 16S rRNA consensus sequence of E. coli is used as the subject in a BLAST comparison. (B) Distribution when S. pneumoniae is used as the subject.
Fig. 4
The impact of sequencing errors on 16S rRNA classification. (A) Sensitivity. (B) Specificity.
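The two quantities plotted here can be made concrete with a small sketch: one helper that injects random substitution errors into a sequence, and one that computes one-vs-rest sensitivity and specificity for a species from (true, predicted) label pairs. The substitution-only error model, the function names, and the example labels are assumptions for illustration; the source does not state which error model or evaluation code was used.

```python
import random


def add_substitution_errors(seq, rate, rng):
    """Replace each A/C/G/T with a different base with probability `rate`."""
    out = []
    for base in seq:
        if base in "ACGT" and rng.random() < rate:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)


def sensitivity_specificity(pairs, species):
    """One-vs-rest sensitivity and specificity for `species`.

    pairs: iterable of (true_label, predicted_label).
    """
    tp = fn = fp = tn = 0
    for truth, predicted in pairs:
        if truth == species:
            if predicted == species:
                tp += 1
            else:
                fn += 1
        elif predicted == species:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Made-up labels for illustration only.
example_pairs = [
    ("S. pneumoniae", "S. pneumoniae"),
    ("S. pneumoniae", "S. mitis"),
    ("S. mitis", "S. mitis"),
    ("E. coli", "E. coli"),
]
print(sensitivity_specificity(example_pairs, "S. pneumoniae"))  # (0.5, 1.0)
```

In a robustness experiment of this kind, reads would be perturbed at increasing error rates, reclassified, and the two metrics recomputed at each rate.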
Fig. 5
A simulated reaction using DNA mixtures mimicking airway secretions. Genomic DNA from 65,000 copies of the target species was mixed with genomic DNA from 32,500, 16,250, 8,125, 4,063, and 2,032 copies of other species (competitors), then amplified, sequenced, and classified using the method described in this report. The graph shows the proportion of next-generation sequencer reads attributed to the target and each competitor. For K. pneumoniae, M. tuberculosis, and S. aureus, the comparable number of reads between the target and competitor is likely due to a slight difference in PCR amplification efficiency, with the competitor being amplified more efficiently. The FASTQ files obtained from the simulated reactions are available at doi: 10.5281/zenodo.14759736.
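As a rough baseline for reading the graph, the expected read share of each species can be computed directly from the copy numbers in this caption. The sketch assumes reads are proportional to input copies, that is, equal 16S rRNA gene copies per genome and equal PCR efficiency for every species, and it ignores the human DNA in the mixture. These are simplifying assumptions, and the caption itself attributes some deviations to unequal amplification. The dictionary keys and function name are hypothetical.

```python
def expected_read_fractions(copies):
    """Share of reads per species if reads were proportional to input copies."""
    total = sum(copies.values())
    if total <= 0:
        raise ValueError("total copy number must be positive")
    return {name: count / total for name, count in copies.items()}


# Copy numbers taken from the caption; labels are placeholders.
mixture = {
    "target": 65000,
    "competitor_1": 32500,
    "competitor_2": 16250,
    "competitor_3": 8125,
    "competitor_4": 4063,
    "competitor_5": 2032,
}

for name, fraction in expected_read_fractions(mixture).items():
    print(f"{name}: {fraction:.3f}")
```

Under these assumptions the target accounts for just over half of the reads and the first competitor for about a quarter, so even a modest efficiency advantage for a competitor can bring its read count close to the target's.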
