PLoS One. 2024 Jul 23;19(7):e0307478.
doi: 10.1371/journal.pone.0307478. eCollection 2024.

Integration of machine learning XGBoost and SHAP models for NBA game outcome prediction and quantitative analysis methodology

Yan Ouyang et al. PLoS One.

Abstract

This study investigated the application of artificial intelligence to real-time prediction of professional basketball games and identified the performance indicators whose variation is critical in determining game outcomes. Using game data from the 2021 to 2023 NBA seasons as the sample, the study constructed a real-time predictive model for NBA game outcomes that integrates the XGBoost machine learning algorithm with the SHAP algorithm. The model simulated the prediction of game outcomes at different stages of a game and quantified the key factors that influenced those outcomes. The results demonstrated that the XGBoost algorithm was highly effective in predicting NBA game outcomes. Key performance indicators such as field goal percentage, defensive rebounds, and turnovers were consistently related to the outcome at all stages of the game. In the first half, assists were a key indicator affecting the outcome. In the second half, offensive rebounds and three-point shooting percentage were key indicators affecting the outcome. The real-time prediction model, which integrates XGBoost and SHAP, was found to have excellent performance and high interpretability. By quantifying the factors that determine victory, it can provide significant decision support for coaches in arranging tactical strategies on the court. The study also provides reliable data references for sports bettors, athletes, club managers, and sponsors.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. NBA game outcome prediction model construction and application flowchart.
Fig 2. Framework diagram of the SHAP algorithm.
Fig 3. Heatmap of NBA game technical statistics data.
The color of the heatmap indicates the correlation between two features: darker colors represent stronger positive correlations, while lighter colors represent stronger negative correlations. The numerical values are the correlation coefficients between the corresponding features. The asterisks reflect the significance levels of the correlation coefficients: no asterisk denotes p > 0.05, one asterisk denotes 0.01 < p < 0.05, two asterisks denote 0.001 < p < 0.01, and three asterisks denote p < 0.001.
Fig 4. Exploratory scatter plots of technical statistic relationships from the full game.
Fig 5. Comparative chart of performance evaluation metrics for NBA game outcome prediction models.
(a) first two quarters, (b) first three quarters, (c) full game. SVM = Support Vector Machines; KNN = K-Nearest Neighbors.
Fig 6. Summary chart of SHAP feature importance at different times of the game.
(a) first two quarters, (b) first three quarters, (c) full game.
Fig 7. Interpretation of SHAP force plot: analysis of performance in the first two quarters of the G2 game.
The base value represents the average predicted probability of win or loss for the sample set of games. Red areas indicate that the feature has a positive effect on the prediction outcome, and blue areas indicate a negative effect. The length of each color bar reflects the magnitude of the impact: a longer bar means that feature has a greater influence on the prediction result. Features and their sample values are indicated below the color bars. The footnotes for Figs 8 and 9 are identical to that of Fig 7.
Fig 8. Interpretation of SHAP force plot: analysis of performance in the first three quarters of the G2 game.
Fig 9. Interpretation of SHAP force plot: analysis of performance in the full G2 game.
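
The force-plot reading described under Fig 7 (a base value plus per-feature pushes toward a win or a loss) follows from the additive property of Shapley values. The sketch below illustrates that property only; it is not the authors' code and does not use XGBoost or the shap library. The toy logistic model, its weights, the three feature names, and every stat line are hypothetical values chosen for illustration. It computes exact Shapley values by enumerating all feature coalitions, which is only practical for a handful of features.

```python
from itertools import combinations
from math import exp, factorial

# Hypothetical feature names, loosely echoing indicators named in the abstract.
FEATURES = ["fg_pct", "def_reb", "turnovers"]

# Hypothetical stat lines used as the reference sample (made-up numbers).
BACKGROUND = [
    (0.42, 15, 8),
    (0.47, 18, 6),
    (0.51, 20, 5),
    (0.44, 16, 9),
]


def win_probability(row):
    """Toy stand-in for a trained classifier; the weights are assumptions."""
    fg_pct, def_reb, turnovers = row
    score = 12.0 * (fg_pct - 0.46) + 0.15 * (def_reb - 17) - 0.20 * (turnovers - 7)
    return 1.0 / (1.0 + exp(-score))


def coalition_value(instance, subset):
    """Mean prediction when features in `subset` take the instance's values
    and all other features take each background row's values."""
    total = 0.0
    for bg in BACKGROUND:
        mixed = tuple(
            instance[i] if i in subset else bg[i] for i in range(len(instance))
        )
        total += win_probability(mixed)
    return total / len(BACKGROUND)


def shapley_values(instance):
    """Exact Shapley values by enumerating every coalition of the other features."""
    n = len(instance)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Standard Shapley weight: |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                gain = coalition_value(instance, s | {i}) - coalition_value(instance, s)
                phi[i] += weight * gain
    return phi


# One hypothetical game to explain.
game = (0.53, 21, 4)
base_value = coalition_value(game, set())  # average prediction over the background
phi = shapley_values(game)
prediction = win_probability(game)

for name, contribution in zip(FEATURES, phi):
    direction = "raises" if contribution > 0 else "lowers"
    print(f"{name}: {contribution:+.3f} ({direction} the predicted win probability)")
print(f"base value {base_value:.3f} + contributions {sum(phi):+.3f} = {prediction:.3f}")
```

In force-plot terms, each positive contribution corresponds to a red bar and each negative one to a blue bar, with bar length proportional to the absolute contribution. The contributions sum exactly to the gap between the base value and the model's prediction for that game.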
