BMC Bioinformatics. 2023 Feb 17;24(1):53.
doi: 10.1186/s12859-023-05144-z.

Pathogen detection in RNA-seq data with Pathonoia

Anna-Maria Liebhoff et al. BMC Bioinformatics.

Abstract

Background: Bacterial and viral infections may cause or exacerbate various human diseases. To detect microbes in tissue, one method of choice is RNA sequencing. The detection of specific microbes using RNA sequencing offers good sensitivity and specificity, but untargeted approaches suffer from high false positive rates and a lack of sensitivity for low-abundance organisms.

Results: We introduce Pathonoia, an algorithm that detects viruses and bacteria in RNA sequencing data with high precision and recall. Pathonoia first applies an established k-mer based method for species identification and then aggregates this evidence over all reads in a sample. In addition, we provide an easy-to-use analysis framework that highlights potential microbe-host interactions by correlating microbial abundance with host gene expression. Pathonoia outperforms state-of-the-art methods in microbial detection specificity on both in silico and real datasets.
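The idea of aggregating k-mer evidence over all reads, rather than counting per-read classifications, can be illustrated with a minimal sketch. It assumes Kraken 2's standard per-read output, whose fifth tab-separated column lists k-mer assignments as `taxid:count` pairs (`0` for no hit, `A` for ambiguous nucleotides, `|:|` separating the two mates of a read pair). The normalization used here, each taxon's share of all taxon-assigned k-mers in the sample, is a hypothetical stand-in chosen for illustration. It is not Pathonoia's AO metric, whose exact definition is not given in this abstract, and the helper names are likewise hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

# Tokens in the k-mer column that carry no taxon evidence.
SKIP_TOKENS = {"|:|"}   # separator between the two mates of a read pair
SKIP_TAXA = {"0", "A"}  # 0 = no database hit, A = ambiguous nucleotides


def parse_kmer_column(column: str) -> Counter:
    """Parse a Kraken 2 k-mer column such as '562:13 561:4 A:31 0:1'."""
    counts: Counter = Counter()
    for token in column.split():
        if token in SKIP_TOKENS:
            continue
        taxid, n = token.rsplit(":", 1)
        if taxid in SKIP_TAXA:
            continue
        counts[int(taxid)] += int(n)
    return counts


def aggregate_sample(kraken_lines: Iterable[str]) -> Dict[int, float]:
    """Pool k-mer evidence over ALL reads of a sample, classified or not.

    Returns each taxon's share of the pooled taxon-assigned k-mers.
    HYPOTHETICAL normalization for illustration; not Pathonoia's AO.
    """
    pooled: Counter = Counter()
    for line in kraken_lines:
        columns = line.rstrip("\n").split("\t")
        pooled.update(parse_kmer_column(columns[4]))  # column 5: k-mer LCA mappings
    total = sum(pooled.values())
    return {taxid: n / total for taxid, n in pooled.items()} if total else {}


# Toy Kraken 2 output lines for three hypothetical reads.
lines = [
    "C\tread1\t562\t150\t562:13 561:4 A:31 0:1",
    "U\tread2\t0\t150\t0:116",
    "C\tread3\t10298\t150|150\t10298:50 0:66 |:| 0:116",
]
print(aggregate_sample(lines))
```

Pooling k-mers lets weak but consistent evidence spread across many reads add up, which is the intuition behind aggregating over the whole sample instead of trusting each read's individual classification.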

Conclusion: Two case studies in human liver and brain show how Pathonoia can support novel hypotheses about microbial infections exacerbating disease. The Python package for Pathonoia sample analysis and a guided analysis Jupyter notebook for bulk RNA-seq datasets are available on GitHub.

Keywords: Metagenomics; Pathogen detection; RNA sequencing.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Pathonoia toolkit. A The algorithm, based on Kraken 2, analyzes unaligned RNA-seq reads. Kraken 2 generates k-mer assignments and a taxonomic classification for each read (grey box). Pathonoia uses all k-mer assignments of a sample and combines them into an abundance metric, AO, that is not based on read counts. B Pathonoia and the downstream analysis template are available on GitHub. C The analysis workflow for a dataset. A transcriptome alignment yields gene counts and unaligned reads, which are analyzed by Pathonoia (A). A differential abundance analysis reports organisms that are more frequent in one sample group than in another (examples in Fig. 3B, F). An “organism of interest” (OoI) can be selected to understand its role in a sample group. Samples with ($AO_{OoI} > 0$) and without ($AO_{OoI} = 0$) the OoI are compared in a differential gene expression analysis using the gene counts. A gene set enrichment analysis of deregulated genes may uncover the pathways affected by the OoI.
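The sample split in panel C is simple to express in code. The sketch below assumes a hypothetical nested dictionary mapping each sample ID to its per-taxon AO values (all names and numbers are invented for illustration), and treats a taxon that is absent from a sample as AO = 0. The resulting group labels could then serve as sample metadata for a differential expression tool run on the host gene counts.

```python
from typing import Dict, List, Tuple


def split_by_ooi(
    ao_table: Dict[str, Dict[str, float]], ooi: str
) -> Tuple[List[str], List[str]]:
    """Split samples into those with AO(OoI) > 0 and those with AO(OoI) = 0.

    ao_table (hypothetical layout) maps sample ID -> {taxon name: AO};
    a taxon missing from a sample is treated as AO = 0.
    """
    with_ooi = sorted(s for s, ao in ao_table.items() if ao.get(ooi, 0.0) > 0)
    without_ooi = sorted(s for s, ao in ao_table.items() if ao.get(ooi, 0.0) == 0)
    return with_ooi, without_ooi


# Hypothetical AO values for four samples.
ao_table = {
    "S1": {"B. stabilis": 0.02, "taxon_X": 0.10},
    "S2": {"taxon_X": 0.30},
    "S3": {"B. stabilis": 0.005},
    "S4": {"B. stabilis": 0.0},
}
with_ooi, without_ooi = split_by_ooi(ao_table, "B. stabilis")

# Group labels per sample, e.g. as a condition column for a DE analysis.
labels = {s: ("OoI+" if s in with_ooi else "OoI-") for s in ao_table}
print(with_ooi, without_ooi, labels)
```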
Fig. 2
Pathonoia reduces the number of false positives (FP) in noisy metagenomic samples. A The spectrum of species reported by Kraken 2 and Pathonoia for a cell line sample infected with Human Herpes Virus (HHV). The top 10 most abundant species are highlighted. Kraken 2 reported 7262 organisms, of which the 250 with >100 reads are shown. Pathonoia lists 132 organisms, and herpesviruses rise in the ranking of reported species. B Number of reported species in two datasets (12 and 24 samples) by Kraken 2, Pathonoia and Kraken 2 with a threshold (organism detected if >100 reads are counted). A lower number of detected organisms is desirable since it reduces the number of FP. C Pathonoia aims to improve the precision of detected organisms in a sample. FP (arising from sequencing errors, other sample biases or random alignments, especially with poor-quality reads) should be removed. D Average precision, recall and F1 for a simulated dataset, evaluated for Kraken 2-based algorithms and Centrifuge. Recall is highest for Kraken 2 and Centrifuge. When FP are removed from the Kraken 2 results, every algorithm also loses some true positives (TP), so recall goes down. E Number of species detected in the simulated dataset. The high recall in D is explained by the high number of species that each algorithm finds.
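The evaluation in panels B and D amounts to set arithmetic, sketched below. The baseline filter mirrors the "Kraken 2 with threshold" rule in the caption (a taxon is called if more than 100 reads support it), and precision, recall and F1 use their standard definitions. The read counts and ground-truth set are hypothetical toy values, not data from the paper.

```python
from typing import Dict, Set, Tuple


def threshold_calls(read_counts: Dict[str, int], min_reads: int = 100) -> Set[str]:
    """Baseline: call a taxon only if more than `min_reads` reads support it."""
    return {taxon for taxon, n in read_counts.items() if n > min_reads}


def precision_recall_f1(detected: Set[str], truth: Set[str]) -> Tuple[float, float, float]:
    tp = len(detected & truth)  # true taxa that were reported
    fp = len(detected - truth)  # reported taxa that are not truly present
    fn = len(truth - detected)  # true taxa that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical read counts for a simulated sample and its known composition.
read_counts = {"virus_A": 5400, "virus_B": 900, "contaminant_C": 120, "noise_D": 40}
truth = {"virus_A", "virus_B"}

called = threshold_calls(read_counts)
print(sorted(called), precision_recall_f1(called, truth))
```

Raising `min_reads` removes more FP and so raises precision, but a truly present, low-abundance taxon eventually drops below the cut-off, which is the loss of TP and recall that panel D describes.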
Fig. 3
Case studies: analyzing datasets with Pathonoia. A-D Frontotemporal dementia (FTD). A The dataset contains 30 cases of FTD (sub-groups shown in C) and 15 controls. Pathonoia reported 431 organisms over all samples. B The volcano plot shows 12 differentially abundant organisms, ten of them up-regulated in FTD samples. The color scale shows the number of samples containing the organism. C B. stabilis was chosen as OoI; its AO is shown across samples. D Three gene sets from a differential expression analysis between patients with and without B. stabilis (34 up-regulated genes, 109 down-regulated genes, 143 in total) were compared in an over-representation analysis with gene sets related to Molecular Functions and Biological Processes. Genes up-regulated in the presence of B. stabilis hint at an immune reaction in the FTD patients. The Biological Processes relate to neural pathways. E-F Fibrosis in liver diseases. E A dataset of 51 human liver samples from patients with different liver diseases and fibrosis levels, in which Pathonoia reported 653 species. F A differential abundance analysis of samples with and without fibrosis yielded 41 organisms, of which only one was up-regulated in two non-fibrotic samples. Seven organisms were present in more than nine fibrotic samples.
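Panel D refers to an over-representation analysis. A common way to run one, assumed here because the caption does not name the statistical test, is a one-sided hypergeometric test: given a universe of N genes, a gene set of size K and n selected (deregulated) genes, the p-value is the probability of seeing at least k genes of the set among the selection by chance. The sketch uses only the Python standard library; the universe size, gene-set size and overlap in the usage line are hypothetical.

```python
from math import comb


def ora_pvalue(n_universe: int, n_set: int, n_selected: int, n_overlap: int) -> float:
    """One-sided hypergeometric p-value P(X >= n_overlap).

    X counts how many of n_selected genes, drawn without replacement from
    n_universe genes, fall into a gene set of size n_set.
    """
    upper = min(n_set, n_selected)
    denominator = comb(n_universe, n_selected)
    # Sum the probability mass of every overlap at least as large as observed.
    numerator = sum(
        comb(n_set, k) * comb(n_universe - n_set, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return numerator / denominator


# Hypothetical example: 143 deregulated genes (the total in panel D), an assumed
# 20,000-gene universe and an assumed 150-gene set sharing 8 genes with them.
p = ora_pvalue(n_universe=20000, n_set=150, n_selected=143, n_overlap=8)
print(f"p = {p:.2e}")
```

Because many gene sets are usually tested at once, the resulting p-values would typically also be corrected for multiple testing, for example with the Benjamini-Hochberg procedure.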
