Front Microbiol. 2015 Mar 11:6:171.
doi: 10.3389/fmicb.2015.00171. eCollection 2015.

Biomarker-based classification of bacterial and fungal whole-blood infections in a genome-wide expression study

Andreas Dix et al. Front Microbiol.

Abstract

Sepsis is a clinical syndrome that can be caused by bacteria or fungi. Early knowledge of the nature of the causative agent is a prerequisite for targeted anti-microbial therapy. Besides currently used detection methods such as blood culture and PCR-based assays, analysis of the host's transcriptional response to infecting organisms holds great promise. In this study, we examine the transcriptional footprint of infections caused by the bacterial pathogens Staphylococcus aureus and Escherichia coli and the fungal pathogens Candida albicans and Aspergillus fumigatus in a human whole-blood model. Moreover, we use the expression information to build a random forest classifier that predicts whether a sample contains a bacterial, fungal, or mock infection. After normalizing the transcription intensities using stably expressed reference genes, we filtered the gene set for biomarkers of bacterial or fungal blood infections. This selection is based on differential expression and an additional gene relevance measure. In this way, we identified 38 biomarker genes, including IL6, SOCS3, and IRG1, which had already been associated with sepsis in other studies. Using these genes, we trained the classifier and assessed its performance. It yielded 96% accuracy (sensitivities >93%, specificities >97%) in a 10-fold stratified cross-validation and 92% accuracy (sensitivities and specificities >83%) on an additional test dataset comprising Cryptococcus neoformans infections. Furthermore, the classifier is robust to Gaussian noise, which suggests that it can predict classes correctly on datasets from new species. In conclusion, this genome-wide approach demonstrates an effective feature selection process combined with the construction of a well-performing classification model. Further analyses of genes with pathogen-dependent expression patterns can provide insights into systemic host responses, which may lead to new anti-microbial therapeutic advances.
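The evaluation figures quoted in the abstract (accuracy, per-class sensitivity and specificity, 10-fold stratified cross-validation) can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `stratified_folds` and `per_class_metrics` are hypothetical, the round-robin fold assignment is only one simple way to stratify, and the one-vs-rest reading of sensitivity and specificity for the three classes is an assumption about how the per-class values were derived.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds with similar class proportions.

    Hypothetical helper: indices of each class are shuffled and then dealt
    round-robin across the folds.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def per_class_metrics(true_labels, predicted_labels):
    """Return overall accuracy and one-vs-rest sensitivity/specificity per class."""
    pairs = list(zip(true_labels, predicted_labels))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    metrics = {}
    for c in sorted(set(true_labels)):
        tp = sum(t == c and p == c for t, p in pairs)  # class c, predicted c
        fn = sum(t == c and p != c for t, p in pairs)  # class c, missed
        tn = sum(t != c and p != c for t, p in pairs)  # other class, not called c
        fp = sum(t != c and p == c for t, p in pairs)  # other class, called c
        metrics[c] = {
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        }
    return accuracy, metrics
```

In a cross-validation loop, each fold would serve once as the held-out set while a classifier is trained on the remaining folds; the pooled held-out predictions can then be passed to `per_class_metrics`.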

Keywords: decision tree based methods; feature selection; fungal pathogens; immune response; microarray; systems biology.

Figures

Figure 1
The workflow for biomarker identification, classifier construction and performance assessment.
Figure 2
The variable importance values were computed by the random forest algorithm. Genes with larger values have a greater influence on correct class predictions. The 50 highest values of the “mean decrease in accuracy” measure are shown. Genes above the dashed lines were selected as biomarkers for the corresponding classes.
Figure 3
Visualization of the expression patterns of the biomarker genes. The samples are clustered according to their corresponding classes. The heatmap colors correlate with the normalized expression intensities (see key on right side). The colors of the gene symbols indicate the class for which the gene was selected as a biomarker (brown = fungal class, blue = bacterial class, gray = mock-infected class).
Figure 4
The MDS plot based on the C. neoformans dataset, where the relative positions in the plot represent the Euclidean distances of the Spearman correlations of the samples. Small distances correspond to high correlation coefficients. Brown and gray circles indicate samples of the fungal and the mock-infected class, respectively. The arrow marks the fungal sample that was misclassified as a mock-infected control.
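The distance measure named in the Figure 4 caption can be sketched in Python as follows. This is one possible reading, not the authors' procedure: it assumes that a sample-by-sample Spearman correlation matrix is computed first and that Euclidean distances are then taken between the rows of that matrix. The function names are hypothetical, and the MDS embedding itself is not shown.

```python
import math


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            result[order[position]] = mean_rank
        start = end + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (undefined if constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed values."""
    return pearson(ranks(x), ranks(y))


def correlation_distance_matrix(samples):
    """Return (Spearman matrix, Euclidean distances between its rows).

    `samples` is a list of expression profiles, one list of values per sample.
    """
    n = len(samples)
    corr = [[spearman(samples[i], samples[j]) for j in range(n)] for i in range(n)]
    dist = [[math.dist(corr[i], corr[j]) for j in range(n)] for i in range(n)]
    return corr, dist
```

Under this reading, two samples with similar correlation profiles end up close together, which matches the caption's statement that small distances correspond to high correlation coefficients. The resulting distance matrix would be the input to an MDS routine.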
