PLoS One. 2024 Jan 5;19(1):e0296283.
doi: 10.1371/journal.pone.0296283. eCollection 2024.

Two-step light gradient boosted model to identify human west nile virus infection risk factor in Chicago

Guangya Wan et al. PLoS One. 2024.

Abstract

West Nile virus (WNV), a flavivirus transmitted by mosquito bites, causes primarily mild symptoms but can also be fatal. Predicting and controlling its spread is therefore essential for public health in endemic areas. We hypothesized that socioeconomic factors may influence human risk from WNV. We analyzed a set of weather, land use, mosquito surveillance, and socioeconomic variables for predicting WNV cases in 1-km hexagonal grids across the Chicago metropolitan area. Using a two-stage LightGBM approach, we found that hexagons with incomes above and below the median are influenced by the same top characteristics. Weather factors and mosquito infection rates were the strongest common factors. Land use and socioeconomic variables contributed relatively little to predicting WNV cases. LightGBM handles imbalanced data sets well and provides meaningful predictions of the risk of epidemic disease outbreaks.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. -log(p) of Kolmogorov-Smirnov test for all the features and covariates.
From the KS test, we calculate the p-value, which indicates how different the distribution of the variable is between the hexagon-weeks with and without a case. The larger the -log(p), the less similar the two distributions are. The variables are grouped into four main categories. Blue bars represent the land cover variables. Orange bars represent the mosquito infection rates. Green bars represent the weather variables. Red bars represent the demographic variables.
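
The -log(p) score described in this caption can be illustrated with a minimal pure-Python sketch. It assumes the asymptotic Kolmogorov approximation for the two-sample p-value and a natural logarithm; the source states neither the log base nor the software used, and the function names and sample values below are hypothetical.

```python
import math

def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past every value <= x, then compare the CDFs.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

def ks_pvalue(d, n, m, terms=100):
    """Asymptotic p-value: 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lam^2)."""
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        # The series converges slowly near zero; the p-value is ~1 there.
        return 1.0
    s = sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
            for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2 * s))

def neg_log_p(a, b):
    """-log(p) of the KS test; larger means less similar distributions."""
    p = ks_pvalue(ks_statistic(a, b), len(a), len(b))
    return math.inf if p == 0.0 else -math.log(p)

# Made-up values of one variable in hexagon-weeks with and without a case.
with_case = [22.1, 23.4, 24.0, 25.2, 26.3, 27.5]
without_case = [15.0, 16.2, 18.1, 19.4, 20.0, 23.0]
print(neg_log_p(with_case, without_case))
```

For small samples an exact p-value is preferable to the asymptotic series; the approximation is used here only to keep the sketch short.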
Fig 2. Heat-map covariance matrix for all the features.
Original data are from Karki (2020) [26]. Yellow colors indicate strong positive correlations; dark blue colors indicate strong negative correlations. Light blue or green colors indicate weak correlations. We infer that temperature has a relatively high temporal correlation, as the variables tempc and templag1-4 (current temperature and temperatures 1–4 weeks before) are correlated. In addition, development stage and housing age are correlated with population, showing the interaction of population aggregation with land cover and housing status.
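
As a rough illustration of how such a correlation matrix can be used to prune redundant variables, the sketch below computes Pearson correlations and greedily keeps a feature only if it is not strongly correlated with one already kept. The 0.9 threshold, the greedy order, the `precip` name, and all values are assumptions for illustration; the source does not state the procedure the authors used.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def drop_correlated(features, threshold=0.9):
    """Keep features, in insertion order, whose |r| with every kept one is below threshold."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Made-up weekly values; tempc and templag1 are names from the caption.
features = {
    "tempc": [10, 12, 15, 18, 20, 22],
    "templag1": [9, 12, 14, 18, 19, 23],   # nearly a copy of tempc
    "precip": [5, 0, 3, 0, 4, 1],          # hypothetical precipitation column
}
print(drop_correlated(features))
```

Because the filter is greedy, which member of a correlated pair survives depends on the order of the columns.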
Fig 3. Gini feature importance of the model predicting West Nile Virus cases in the Chicago area, with the 25 variables after removing the highly correlated ones.
The higher the y-value, the more important the feature is to the model. The variables are grouped into four main categories. Blue bars represent the land cover variables. Orange bars represent the mosquito infection rates. Green bars represent the weather variables. Red bars represent the demographic variables. Total population is the most important variable in the model; the weather variables and mosquito infection rates (MIRs) are also strong predictors.
Fig 4. Gini Feature importance of the candidate predictors in the reduced model.
The variables are grouped into four main categories. Blue bars represent the land cover variables. Orange bars represent the mosquito infection rates. Green bars represent the weather variables. Red bars represent the demographic variables. The demographic features include total population, percentage of houses built after WWII, and income, ranked 1, 9, and 13, respectively. dlipct is the land cover feature selected in the model, ranking 16. The average temperature 4 weeks ago and the temperature in January are the most important weather factors. The MIRs 1 and 4 weeks ago are the most important MIR features. While the ranks may change in individual runs, the feature importances of these factors are close to each other.
Fig 5. Partial dependence plot of factors with positive effects: total population, mean MIR, temperature 4 weeks before WNV cases are reported, and January temperature.
The central black line is the partial dependence line, which is the average marginal effect of each factor on the WNV cases. The green shade around it is the standard deviation of the individual conditional expectation (ICE) lines, which are the marginal effects of each factor on the WNV cases as predicted for each sample. The blue shades are samples from the ICE lines, showing the range of predicted marginal effects across individual samples. The MIR and the weekly temperatures 1–4 weeks before show trends similar to those of the mean MIR and the current-week temperature.
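
The relation between ICE lines, the partial dependence line, and the standard-deviation band can be sketched in a few lines of Python. The `predict` function below is a hypothetical stand-in for a fitted model, and the feature names, coefficients, and values are invented for illustration; none of them come from the paper.

```python
import statistics

def ice_lines(predict, rows, feature, grid):
    """One ICE line per sample: vary `feature` over `grid`, hold the rest fixed."""
    lines = []
    for row in rows:
        line = []
        for value in grid:
            modified = dict(row)        # copy so the original sample is untouched
            modified[feature] = value
            line.append(predict(modified))
        lines.append(line)
    return lines

def partial_dependence(lines):
    """Pointwise mean (the PD line) and population std (the band) of the ICE lines."""
    columns = list(zip(*lines))
    mean = [statistics.fmean(c) for c in columns]
    std = [statistics.pstdev(c) for c in columns]
    return mean, std

# Hypothetical stand-in for a trained model's prediction function.
def predict(row):
    return 0.25 * row["temp"] + 0.5 * row["mir"]

rows = [{"temp": 20.0, "mir": 0.0}, {"temp": 25.0, "mir": 2.0}]
grid = [10.0, 20.0, 30.0]

lines = ice_lines(predict, rows, "temp", grid)
pd_line, pd_std = partial_dependence(lines)
print(pd_line, pd_std)
```

With an additive toy model like this one the ICE lines are parallel, so the band has constant width; interactions in a real model would make the lines diverge.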
Fig 6. Partial dependence plot of precipitation for the current week and 1–4 weeks prior.
The central black line is the partial dependence line. The green shade around it is the standard deviation of the ICE lines. The blue shades are samples from the ICE lines. Precipitation variables have non-monotonic effects.
Fig 7. Partial dependence plot of socioeconomics and land cover features.
The central black line is the partial dependence line. The green shade around it is the standard deviation of the ICE lines. The blue shades are samples from the ICE lines. The socioeconomics and land cover features are not very strongly represented. There is not a very strong marginal effect of income. The percentage of houses built after World War II has a slight negative effect, indicating that people living in older neighborhoods have higher WNV risks. Meanwhile, the percentage of less developed land has a slight positive effect at the lower end.

References

    1. Lanciotti RS. Origin of the West Nile Virus Responsible for an Outbreak of Encephalitis in the Northeastern United States. Science. 1999. pp. 2333–2337. doi: 10.1126/science.286.5448.2333
    2. Hayes EB, Komar N, Nasci RS, Montgomery SP, O’Leary DR, Campbell GL. Epidemiology and transmission dynamics of West Nile virus disease. Emerg Infect Dis. 2005;11: 1167–1173. doi: 10.3201/eid1108.050289a
    3. Hadfield J, Brito AF, Swetnam DM, Vogels CBF, Tokarz RE, Andersen KG, et al. Twenty years of West Nile virus spread and evolution in the Americas visualized by Nextstrain. PLoS Pathog. 2019;15: e1008042. doi: 10.1371/journal.ppat.1008042
    4. Kilpatrick AM, Marm Kilpatrick A, LaDeau SL, Marra PP. Ecology of West Nile Virus Transmission and Its Impact on Birds in the Western Hemisphere. The Auk. 2007. p. 1121. doi: 10.1642/0004-8038(2007)124[1121:eownvt]2.0.co;2
    5. Kramer LD, Styer LM, Ebel GD. A Global Perspective on the Epidemiology of West Nile Virus. Annual Review of Entomology. 2008. pp. 61–81. doi: 10.1146/annurev.ento.53.103106.093258