PLoS One. 2010 Jun 23;5(6):e11267.
doi: 10.1371/journal.pone.0011267.

Classification of dengue fever patients based on gene expression data using support vector machines

Ana Lisa V Gomes et al. PLoS One. 2010.

Abstract

Background: Symptomatic infection by dengue virus (DENV) can range from dengue fever (DF) to dengue haemorrhagic fever (DHF); however, the determinants of progression to DF or DHF are not completely understood. It is hypothesised that host innate immune response factors are involved in modulating the disease outcome, and that the expression levels of genes involved in this response could be used as early prognostic markers for disease severity.

Methodology/principal findings: mRNA expression levels of genes involved in DENV innate immune responses were measured using quantitative real time PCR (qPCR). Here, we present a novel application of the support vector machines (SVM) algorithm to analyze the expression pattern of 12 genes in peripheral blood mononuclear cells (PBMCs) of 28 dengue patients (13 DHF and 15 DF) during acute viral infection. The SVM model was trained using gene expression data of these genes and achieved the highest accuracy of approximately 85% with leave-one-out cross-validation. Through selective removal of gene expression data from the SVM model, we have identified seven genes (MYD88, TLR7, TLR3, MDA5, IRF3, IFN-alpha and CLEC5A) that may be central in differentiating DF patients from DHF, with MYD88 and TLR7 observed to be the most important. Though the individual removal of expression data of five other genes had no impact on the overall accuracy, a significant combined role was observed when the SVM model of the two main genes (MYD88 and TLR7) was re-trained to include the five genes, increasing the overall accuracy to approximately 96%.
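The leave-one-out cross-validation mentioned above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the expression ratios and the two-gene layout are made up, and `nearest_centroid_fit` is a deliberately simple stand-in classifier (not an SVM) so that the example needs only the standard library. Any function with the same `fit` signature, for instance a wrapper around an SVM implementation, could be passed instead.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
# A "fit" function takes training rows and labels and returns a predictor.
Fit = Callable[[List[Vector], List[str]], Callable[[Vector], str]]


def nearest_centroid_fit(X: List[Vector], y: List[str]) -> Callable[[Vector], str]:
    """Stand-in classifier (NOT an SVM): predict the label of the closest class mean."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x: Vector) -> str:
        return min(centroids, key=lambda lab: math.dist(x, centroids[lab]))

    return predict


def loocv_accuracy(X: List[Vector], y: List[str], fit: Fit) -> float:
    """Hold out each sample once, train on the rest, and score the held-out prediction."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        predict = fit(train_X, train_y)
        correct += predict(X[i]) == y[i]
    return correct / len(X)


# Made-up expression ratios for two hypothetical genes, three patients per class.
X = [[0.2, 1.1], [0.3, 0.9], [0.25, 1.0], [1.8, 1.0], [2.1, 1.2], [1.9, 0.8]]
y = ["DF", "DF", "DF", "DHF", "DHF", "DHF"]
accuracy = loocv_accuracy(X, y, nearest_centroid_fit)
```

With 28 patients, the same loop would train 28 models, each on 27 patients, and report the fraction of held-out patients classified correctly.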

Conclusions/significance: Here, we present a novel use of the SVM algorithm to classify DF and DHF patients, as well as to elucidate the significance of the various genes involved. It was observed that seven genes are critical in classifying DF and DHF patients: TLR3, MDA5, IRF3, IFN-alpha, CLEC5A, and the two most important MYD88 and TLR7. While these preliminary results are promising, further experimental investigation is necessary to validate their specific roles in dengue disease.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. A schematic map depicting the interactions of the 12 proteins/genes studied herein, known or indicated to be relevant to the viral innate immune response pathway, including for dengue.
Figure 2. Heatmap for gene expression data of the 12 genes (columns) studied from the 28 patients (rows).
The first 15 rows are DF patients, while the rest are DHF patients. The DF/ND and DHF/ND gene expression values from qPCR were used to create the heatmap. The colour shades are associated with the values in the cells: green for DF/ND and DHF/ND ratios < 1 (down-regulated) and red for ratios ≥ 1 (up-regulated). The gene expression data for IFN-β of one patient (23) was not available, and the vector attribute of this gene for that patient was therefore left blank.
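The colour rule in this caption can be written as a small function. This is only a sketch of the stated thresholds; the function name, the use of `None` for a missing value, and the example row are assumptions made for illustration, and the row values are made up.

```python
from typing import List, Optional


def heatmap_colour(ratio: Optional[float]) -> str:
    """Map an expression ratio to the caption's colour classes."""
    if ratio is None:
        return "blank"  # value not available, as for IFN-β in patient 23
    return "green" if ratio < 1 else "red"  # < 1 down-regulated, >= 1 up-regulated


# Made-up ratios for one hypothetical patient row, with one missing value.
row: List[Optional[float]] = [0.4, 1.0, 2.7, None]
colours = [heatmap_colour(r) for r in row]
```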
Figure 3. SVM optimization.
Optimization of the parameters C and γ of the SVM with the RBF kernel: only results for C values of 0.01, 0.10, 1.0, 10.0 and 100.0 at a γ value of 1.0 are shown.
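For readers unfamiliar with the two parameters, the sketch below uses the common definition of the RBF kernel, K(x, z) = exp(-γ‖x − z‖²), where γ controls how quickly similarity decays with distance, while C (not used by the kernel itself) sets the penalty for misclassified training points. The source does not state which software or exact kernel form was used, so this definition is an assumption; the grid lists only the values named in the caption.

```python
import math
from typing import List, Sequence, Tuple


def rbf_kernel(x: Sequence[float], z: Sequence[float], gamma: float) -> float:
    """Common RBF kernel form: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


# Parameter settings shown in the caption; other gamma values were presumably
# searched as well but are not listed in the source.
C_VALUES = [0.01, 0.10, 1.0, 10.0, 100.0]
GAMMA = 1.0
grid: List[Tuple[float, float]] = [(c, GAMMA) for c in C_VALUES]

same = rbf_kernel([1.0, 2.0], [1.0, 2.0], GAMMA)   # identical vectors
apart = rbf_kernel([0.0, 0.0], [1.0, 0.0], GAMMA)  # unit distance apart
```

In a parameter search, each (C, γ) pair in `grid` would be scored by cross-validation and the best-scoring pair retained.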
Figure 4. Influence of each gene on the accuracy of the baseline SVM model.
The first bar represents the baseline accuracy with all 12 genes (TLR3, TLR7, TLR9, MDA5, MYD88, RIGI, IRF3, IRF7, IFN-α, IFN-β, IFN-γ, and CLEC5A). The subsequent bars represent the accuracy of datasets with only 11 genes, whereby the vector attributes of one gene were removed at a time (the name of the removed gene is indicated). The last bar, "SVM model", refers to the seven genes (MYD88, TLR3, TLR7, MDA5, IRF3, IFN-α and CLEC5A) that returned optimum accuracy. The RBF kernel function of the SVM with optimum parameter settings (C = 10 and γ = 1.0) was used for model building in each case. * indicates p < 0.05 compared with the 12-gene baseline set (the bar labelled “All 12 genes”).
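The one-gene-at-a-time removal described in this caption can be sketched as below. The helper names, the hypothetical genes `GENE_A` and `GENE_B`, the made-up ratios, and the toy scoring rule are all assumptions for illustration; in practice `evaluate` would be a cross-validated SVM accuracy rather than the fixed threshold rule used here.

```python
from typing import Callable, Dict, List, Sequence

Matrix = List[Sequence[float]]


def drop_column(X: Matrix, j: int) -> Matrix:
    """Return a copy of X without column j (one gene's vector attributes)."""
    return [list(row[:j]) + list(row[j + 1:]) for row in X]


def gene_ablation(
    X: Matrix,
    y: List[str],
    genes: List[str],
    evaluate: Callable[[Matrix, List[str]], float],
) -> Dict[str, float]:
    """Change in accuracy relative to the all-genes baseline when each gene is removed."""
    baseline = evaluate(X, y)
    return {gene: evaluate(drop_column(X, j), y) - baseline for j, gene in enumerate(genes)}


def toy_evaluate(X: Matrix, y: List[str]) -> float:
    """Toy scorer (an assumption): call a patient DHF if the mean ratio is >= 1."""
    predictions = ["DHF" if sum(row) / len(row) >= 1 else "DF" for row in X]
    return sum(p == label for p, label in zip(predictions, y)) / len(y)


# Made-up ratios: GENE_A separates the classes, GENE_B does not.
genes = ["GENE_A", "GENE_B"]
X = [[0.2, 1.0], [0.4, 1.1], [1.9, 1.0], [2.2, 0.9]]
y = ["DF", "DF", "DHF", "DHF"]
changes = gene_ablation(X, y, genes, toy_evaluate)
```

A large negative change marks a gene whose removal hurts the classifier, which is how the caption's bars rank the contribution of each gene.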
