
Discover Artificial Intelligence

Research

On predicting an NBA game outcome from half‑time statistics


Michail Tsagris¹ · Christos Adam¹ · Pavlos Pantatosakis¹

¹ University of Crete, Rethymno, Greece
* Michail Tsagris, [email protected]; Christos Adam, [email protected]; Pavlos Pantatosakis, [email protected]

Received: 16 August 2024 / Accepted: 12 November 2024

© The Author(s) 2024  OPEN

Discover Artificial Intelligence (2024) 4:111 | https://doi.org/10.1007/s44163-024-00201-9

Abstract
Predicting the outcome of an NBA game is a major concern for betting companies and individuals who are willing to
bet. We attack this task by employing various advanced machine learning algorithms and techniques, utilizing simple
half-time statistics from both teams. Data collected from 3 seasons, from 2020/21 up to 2022/23, were used to assess the
predictive performance of the algorithms along two axes: first, for each season separately, the algorithms were applied to
estimate the outcomes of the games of the same season; secondly, the algorithms were trained on one season and used
to estimate the outcomes of the games in the next season. The results showed high levels of accuracy as measured by the area under the curve. The
analysis was repeated after performing variable selection using a non-linear algorithm that selected the most important
half-time statistics, while retaining the predictive performance at high levels of accuracy.

Keywords NBA · Half-time statistics · Game outcome · Machine learning

1 Introduction

The National Basketball Association (NBA) is North America’s top professional basketball league, comprising 30 teams
divided into the Western and the Eastern Conference, each containing 15 teams (Table 4 in Appendix). The NBA season is
structured into three main parts: the regular season, the play-in tournament and the playoffs. During the regular season,
each team participates in 82 games, which determine their rankings within each conference. The top six teams from
each conference automatically advance to the playoffs, while the teams seeded seventh through tenth compete in the
play-in tournament to secure the seventh and eighth playoff seeds in each conference. The playoffs consist of a series
of elimination rounds in which teams compete in best-of-seven game series. In the first round of the playoffs, the team
ranked first in each conference plays the team ranked eighth, the second-ranked team plays the seventh, the third plays
the sixth, and the fourth plays the fifth. The champions of each conference’s playoffs then face each other in the NBA
Finals to determine the league champion.
Each NBA game is divided into four quarters, each lasting 12 min. If the score is tied at the end of the fourth quarter,
the game proceeds into overtime, which consists of additional five-minute periods until a winner is determined. Every
ball possession is limited to 24 s by the shot clock, and the game clock stops for various reasons, such as fouls, timeouts and other interruptions,
which extend the actual game duration beyond the 48 min of playtime. NBA games include a 15-minute halftime break
between the second and third quarters (after 24 min), during which players can rest and reorganize their strategy.
Predicting the outcome of NBA games has implications for various stakeholders, including sports enthusiasts,
team management and the sports betting industry. For fans, predicting live games not only heightens their cognitive
connection with the game but also provides them with a sense of active participation. Simultaneously, accurate outcome
predictions before the game ends could help NBA coaches and their associates optimize team management decisions
and mitigate the impact of player fatigue. Finally, and most crucially, its importance for gamblers is evident: accurate prediction maximizes their gains while diminishing the likelihood of losing bets.
These reasons, together with the increasing popularity of basketball, have motivated researchers and sports analysts to attempt such predictions in recent years. [25] proposed that the team in the lead would win
if the absolute value of the point difference exceeds the number of minutes left in the game. While this method appears
straightforward and simple, it achieved a notable 93.2% accuracy rate when this condition was met. The catch, however,
is that this criterion may only be met late in the game, with reduced value for betting (low moneyline), or may not be met
at all in close games. On November 27, 1996, the Utah Jazz executed the greatest comeback of all time, overcoming a
34-point half-time deficit against the Denver Nuggets at the Delta Center. This remarkable achievement underscores the
importance of recognizing the home-court advantage in basketball games. [4] aimed to predict game outcomes from
intermediate scores for various sports and observed that basketball was the sport that provided the strongest evidence
of a home-court advantage. There were statistically significant differences between home and visiting team victories
and although late-game leaders win about 80% of the time, home teams were more than 3 times as likely to make fourth-quarter comebacks.
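
For illustration, the lead-versus-time rule of [25] can be written as a short R function (a sketch with hypothetical names; R is the software used for the analysis later in the paper):

```r
# Lead-versus-time rule of [25]: the leading team is predicted to win once the
# absolute score margin exceeds the minutes remaining; otherwise no call is made.
predict_leader_wins <- function(margin, minutes_left) {
  ifelse(abs(margin) > minutes_left,
         ifelse(margin > 0, "home", "away"),
         NA)  # NA: the rule has not fired yet (the game is still too close)
}

predict_leader_wins(margin = 12, minutes_left = 8)   # "home": the rule fires
predict_leader_wins(margin = -3, minutes_left = 10)  # NA: too close to call
```
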
A pivotal component of NBA star performance is the capacity to draw personal fouls. Franchise players such as Giannis Antetokounmpo, Joel Embiid and James Harden consistently lead in fouls drawn per game, surpassing the league’s mean points per game through successful free throws alone. Past research has shown that winning teams score a higher percentage of their points through free throws [32], highlighting the importance of foul drawing in determining game outcomes. Consequently, it is imperative for the NBA to examine how these fouls are drawn by reviewing officiating practices. This has been noted by individuals engaged in sports betting, who often attribute game outcomes to referee bias, positing that referees’ decisions influence results through incorrect calls. [3]
found evidence suggesting preferential treatment by referees toward NBA All-Stars, while [6] did not confirm this claim
when analyzing late-game foul calls. Additionally, [30] identified referee bias in the NBA, revealing a higher incidence
of personal fouls called against players by referee crews of a different race, however subsequent findings [29] indicated
that this discrimination disappeared after media coverage.
Interestingly, the concept of the hot hand is often linked with NBA superstars and although there is no universal
definition of a superstar, it is commonly used to denote All-NBA team caliber players. For instance, [17] identified a sig-
nificant pattern in LeBron James’ transitions between NBA teams, where his arrivals were associated with an increase in
his team’s win percentage compared to the preceding season. While this increase ranged from 20% to 110% during his
stints in Cleveland, Miami and his return to Cleveland, it diminished to a mere 5.71% upon his relocation to Los Angeles.
Notably, prior to this last transition (Los Angeles), James was averaging 76 games per season, starkly contrasting with his
Lakers debut season, during which he played only 55 games. This could suggest that teams with NBA stars on the court
may exhibit a greater likelihood of winning compared to periods when these stars are out. Even though this could be a factor affecting the game outcome, it lies outside the scope of this paper.
Another aspect is the prevailing belief within betting circles that teams on a winning streak will continue winning, prompting even experienced gamblers to place bets based on momentum from previous results or when watching a player having a spectacular game live. This tactic is common, and past research investigating the presence of such notions has yielded conflicting results. [24] provided evidence against momentum and the hot hand by estimating a dynamic state space model for team strength, while a data-driven Machine Learning (ML) study [15] suggested that the winning record of past games is crucial in predicting basketball outcomes, achieving an accuracy of 65.2% using Random Forests (RF).
In the same spirit, [12, 21] applied linear regression models to predict the game difference between home and away
teams, but they did not perform any prediction evaluation. [18] examined the 2015/16 regular season and, using ML techniques, estimated high levels of accuracy (at least 85%) in predicting the game outcome, while [36] employed ML techniques to predict the NBA Finals games.
Note, however, that the precision ML algorithms provide in sport outcome prediction depends on many param-
eters: the available data, the classification algorithm and of course the sport; the NBA is a league characterized by numer-
ous trades each season and the composition of strong teams varies annually, therefore prior seasons may not be relevant
to predict matches in future seasons and in this case an average model classification accuracy could be produced for each
season. Alternatively, a more efficient method could be order-preserving Leave-One-Out Cross-Validation (LOOCV), where the training data set is updated after every match has been played and all past matches up to the current match serve as training data [2].
In this paper, we also utilized ML algorithms with a dual objective, the first being accurate prediction of an NBA
game outcome, during the regular season. The use of (non-linear) ML algorithms aided in overcoming the disadvantages
of linear models, and achieved high levels of accuracy. We highlight though, that unlike other studies, in our case, the
game outcome prediction relies upon the on-court statistics of the two teams at the half-time (end of second quarter).
However, our contribution moves further than this step. We performed Variable Selection (VS) in order to select the most
important on-court statistics and we further analyzed the contribution of each of the selected statistics. The challenge
associated with this task was the non-linear contribution of the teams’ performance statistics to the probability of winning. We
addressed this via a non-linear VS algorithm.
The significance of our study is the detection of a smaller number of performance statistics that can be fed into a non-
linear model that will produce accurate predictions regarding the outcome of an NBA game. Since the predictions are
highly accurate, this could constitute a very useful tool for coaches to work on their strengths and weaknesses, and
perhaps decide on their strategy, during the half-time of a game.
In Section 2, a description of the performance statistics is presented, while Section 3 presents the methodology followed, the accuracy of the algorithms and a discussion of the most important statistics identified; the conclusions close the paper.

2 Description of the data

We start by compiling all the required information about the games; we collect the game outcomes and the half-time
differentials between the home and guest teams. The main source of data was basketball-reference.com, which is broadly
known for providing a great variety of reliable sports statistics. The data were collected from the 2020/21, 2021/22 and
2022/23 NBA seasons, consisting of 1080, 1230 and 1230 observations, respectively. We chose the last three seasons on
the grounds that the NBA is an evolving system and selecting earlier years could influence the model. Basketball has changed significantly in the way it is played, with each era reflecting different styles of play, rule modifications and player skill sets. Including data from earlier seasons in a predictive model can negatively impact its accuracy because the game has evolved, making older data less relevant for predicting modern performance; examples include rule changes, tactical changes and changes in player skill-set requirements.

Table 1  Welch’s t-test results for comparing means between home victories and losses

2020–2021 Regular Season
          FG        FGA       FG%       3P        3PA       3P%       FT        FTA       FT%
p-value   < 0.001   < 0.001   < 0.001   < 0.001   0.006     < 0.001   < 0.001   0.018     < 0.001
          ORB       DRB       TRB       AST       STL       BLK       TOV       PF        PTS
p-value   0.251     < 0.001   < 0.001   < 0.001   < 0.001   < 0.001   < 0.001   0.007     < 0.001

2021–2022 Regular Season
          FG        FGA       FG%       3P        3PA       3P%       FT        FTA       FT%
p-value   < 0.001   0.008     < 0.001   < 0.001   0.972     < 0.001   < 0.001   0.007     < 0.001
          ORB       DRB       TRB       AST       STL       BLK       TOV       PF        PTS
p-value   0.975     < 0.001   < 0.001   < 0.001   < 0.001   < 0.001   < 0.001   0.025     < 0.001

2022–2023 Regular Season
          FG        FGA       FG%       3P        3PA       3P%       FT        FTA       FT%
p-value   < 0.001   < 0.001   < 0.001   < 0.001   < 0.001   < 0.001   0.004     0.055     0.009
          ORB       DRB       TRB       AST       STL       BLK       TOV       PF        PTS
p-value   0.176     < 0.001   < 0.001   < 0.001   < 0.001   < 0.001   < 0.001   0.013     < 0.001


Secondly, three seasons were chosen in order to better assess the predictive performance of the final selected model, as will be
explained in the next section. Limiting the analysis to just the most recent three seasons ensures both the model’s
relevance and its predictive performance. By focusing on data from the current era, the model can concentrate on: a)
current playing style. Modern basketball, particularly in the last few years, is characterized by higher offensive efficiency
and the dominant role of the three-point shot. The offensive rating, which measures points scored per 100 possessions,
has increased from 105.6 in the 2014–2015 season to 122.2 today (information retrieved from Basketball Reference). By limiting the analysis to recent seasons, the model
captures these crucial trends and can make more accurate predictions based on how the game is currently played. b)
predictive relevance. Player skill sets, tactics and strategies have become more consistent across recent seasons, making
this data more predictive of future outcomes. Including data from too far back introduces noise from outdated styles
of play, which can confuse the model and weaken its accuracy. By focusing on a shorter, more relevant time span, the
model remains more reliable for predicting performance in today’s game.
We collected the final outcome of each game and a set of 18 half-time statistics that provide a wide range of infor-
mation about the games. The half-time statistics include the home minus the away team statistics in the Field Goals (FG),
Field Goals Attempted (FGA) and Field Goal Percentage (FG%). Additionally, we collected beyond the arc statistics, such
as Three Points Made (3P), Attempted (3PA) and the difference in percentage (3P%). In a similar manner, we have Free
Throws (FT), Free Throw Attempts (FTA) and Free Throw Percentage (FT%), Offensive Rebounds (ORB), Defensive Rebounds
(DRB) and Total Rebounds (TRB). Furthermore, Assists (AST), Steals (STL), Blocks (BLK), Turnovers (TOV), Personal Fouls
(PF) and Points (PTS) were also included. Lastly, we recorded the outcome of the game, i.e., whether the home team won or lost.
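
As a concrete illustration, the sketch below (in R, the language used for the analysis) builds such home-minus-away differentials from two hypothetical data frames of half-time box-score totals; the object and column names are assumptions for illustration, not the authors’ code.

```r
# Hypothetical half-time box-score totals, one row per game; only three of the
# 18 statistics are shown to keep the example short.
home <- data.frame(PTS = c(58, 49, 61, 55, 52), FG = c(22, 18, 24, 21, 20),
                   FTA = c(10, 7, 12, 9, 8))
away <- data.frame(PTS = c(51, 54, 60, 47, 50), FG = c(19, 21, 23, 17, 19),
                   FTA = c(8, 9, 11, 6, 7))
final_home <- c(110, 98, 121, 105, 99)   # full-game points, home team
final_away <- c(102, 104, 118, 95, 101)  # full-game points, away team

diffs <- home - away                     # home-minus-away half-time differentials
diffs$home_win <- factor(ifelse(final_home > final_away, "Win", "Loss"))
head(diffs)
```
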

2.1 Descriptive statistics of the data

Table 5 (see Appendix) presents the descriptive statistics of the data. Following the Orlando Bubble (2019/2020),
the 2020/2021 season started on December 22, just 72 days after the completion of the 2020 NBA Finals (shortest
off-season in NBA history), where the Los Angeles Lakers won their 17th championship. This led to a reduction of
the 2020/2021 regular season to 72 games for each team, instead of the normal 82-game format. For this reason,
the 2020–2021 season has fewer observations ((72 games × 30 teams) / 2 = 1080 games) than the 2021–2022 and 2022–2023 seasons
((82 games × 30 teams) / 2 = 1230 games).
In the first two seasons (2020/21 and 2021/22), the home team secured a victory in 54.69% and 54.22% of the
games, respectively, while in the 2022/23 season, the home win percentage slightly increased to 57.96%. Notably,
the mean Field Goals (FG) and Points (PTS) for all seasons are positive, indicating that, on average, the home team
leads at half-time. Additionally, Personal Fouls (PF) have a negative mean and Free Throws Attempted (FTA) a posi-
tive, suggesting that the home team is scoring more points from the free throw line by half-time. The results from
Welch’s t-tests, as shown in Table 1, confirm that these differences are statistically significant, further highlighting
that the home team gets more calls. Specifically, Free Throws Attempted (FTA) is statistically significant for both the
2020/21 and 2021/22 seasons, and almost reaches significance for the 2022/23 season. Table 1 also reveals that, for all three seasons, Offensive Rebounds (ORB) showed no statistically significant differences with respect to the game outcome. For the 2021/22 season, besides Offensive Rebounds (ORB), Three-Point Attempts (3PA) were also not statistically significant (p = 0.972 in Table 1). This univariate analysis shows that almost all half-time statistics, offensive and defensive, play a crucial role
in the game outcome.
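
A sketch of this univariate screening in R, reusing the hypothetical `diffs` frame from above (illustrative, not the authors’ code); note that `t.test` with a formula performs Welch’s unequal-variances test by default.

```r
# Welch's t-test per half-time statistic, comparing home wins against losses,
# mirroring the p-values reported in Table 1.
pvals <- sapply(c("PTS", "FG", "FTA"), function(s) {
  t.test(diffs[[s]] ~ diffs$home_win)$p.value  # var.equal = FALSE is the default
})
round(pvals, 3)
```
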
There are no remarkable comebacks in our data across these three seasons, as the games with the maximum and minimum half-time point differences ultimately ended with the leading team winning. One week after the 2020/2021
regular season began, the NBA world witnessed a historic game between the Dallas Mavericks and the Los Angeles
Clippers. The Mavericks, led by the then 22-year-old Luka Doncic, seized a remarkable half-time lead of 50 points (77-
27) against Paul George and the Clippers. This half-time margin surpassed a record that had stood for nearly three
decades, previously held by the Cleveland Cavaliers with a 44-point half-time lead in 1991 versus the Miami Heat.
We emphasize that, while the vast majority of the statistics across the three seasons provided evidence of a statistically significant difference between home teams winning and losing, this was a univariate analysis. It is not sufficient to tell the whole story and it does not imply that all statistics are important for predicting the outcome of the game. For this reason, a more advanced methodology is needed, one that takes all statistics into account and decides which are the most important in predicting the outcome of the game.


Fig. 1  Box-plots of the estimated AUC for each model and for each season before VS

3 Predictive performance estimation and identification of the key performance factors

Ten statistical and ML algorithms were utilised, via the statistical software R [31] and the R package caret [19], in order to predict whether the home team would win the game. Those
algorithms were Recursive Partitioning and Regression Trees (RPRT), k-Nearest Neighbours (k-NN), Support Vector
Machine (SVM) using the Radial Basis Function (RBF), Linear and Polynomial kernel functions, Neural Networks (NNET),
Naive Bayes (NB), Gradient Boosting Machine (GBM), Extreme GBM (ExGBM), and Logistic Regression (LR). On top of
those, ensemble learning was performed, using either a greedy approach, LR, GBM, ExGBM or RF to combine the predictions
of the aforementioned algorithms.
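
The paper does not show code for this stacking step; below is a hedged sketch of one plausible implementation using the caretEnsemble package (an assumption, as only caret [19] is cited), illustrating the RF-stacked variant, Ensemble(RF). The data frame `diffs` is the hypothetical one built in Section 2, assumed here to hold a full season of games.

```r
library(caret)
library(caretEnsemble)

ctrl <- trainControl(method = "cv", number = 10, classProbs = TRUE,
                     summaryFunction = twoClassSummary,
                     savePredictions = "final")  # keep out-of-fold predictions

# Fit a few base learners on the half-time differentials.
base <- caretList(home_win ~ ., data = diffs, trControl = ctrl, metric = "ROC",
                  methodList = c("glm", "gbm", "ranger"))

# Combine the base learners' out-of-fold predictions with a random forest,
# analogous to the Ensemble(RF) rows of Table 2.
stack_rf <- caretStack(base, method = "rf", metric = "ROC",
                       trControl = trainControl(method = "cv", number = 10,
                                                classProbs = TRUE,
                                                summaryFunction = twoClassSummary))
```
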


Fig. 2  Box-plots of the estimated AUC for each model and for each season after applying the Boruta VS algorithm

3.1 Methodology

As highlighted by [28], fitting a model to the available data and then using the same data to estimate its predictive capabili-
ties produces wrong conclusions. Similarly to that study, to assess the predictive performance of the algorithms we also
employed the 10-fold Cross-Validation (CV) protocol [13], repeated 20 times to account for possible sources of variations
among the splits. The Area Under the Curve (AUC) was utilised to measure the predictive performance of the algorithms
during the CV protocol.
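
In caret, this protocol can be expressed as a minimal sketch (names carried over from the earlier illustrations; the centering and scaling pre-processing is our assumption, and "ROC" is caret’s label for the AUC):

```r
library(caret)

# 10-fold cross-validation repeated 20 times, scored by the AUC.
ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 20,
                     classProbs = TRUE, summaryFunction = twoClassSummary)

fit <- train(home_win ~ ., data = diffs, method = "svmPoly",
             metric = "ROC", trControl = ctrl,
             preProcess = c("center", "scale"))  # assumed pre-processing

fit$resample$ROC  # the 200 per-fold AUC values behind box-plots such as Fig. 1
```
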
Table 2  Estimated AUC values before and after the Boruta VS

                   Before VS                     After VS
                   2020/21   2021/22   2022/23   2020/21   2021/22   2022/23
SVM(Polynomial)    0.82      0.80      0.79      0.82      0.80      0.79
SVM(Linear)        0.82      0.80      0.78      0.82      0.80      0.79
SVM(RBF)           0.81      0.79      0.77      0.81      0.79      0.77
RF                 0.81      0.79      0.77      0.81      0.79      0.77
GBM                0.82      0.80      0.78      0.82      0.80      0.78
LR                 0.82      0.80      0.78      0.82      0.80      0.79
NNET               0.81      0.79      0.77      0.82      0.80      0.78
NB                 0.81      0.79      0.78      0.81      0.79      0.78
RPRT               0.78      0.73      0.74      0.78      0.73      0.73
ExGBM              0.82      0.80      0.78      0.82      0.80      0.78
k-NN               0.79      0.75      0.74      0.79      0.75      0.73
Ensemble(RF)       0.95      0.94      0.93      0.96      0.94      0.94
Ensemble(ExGBM)    0.90      0.87      0.86      0.90      0.88      0.87
Ensemble(Greedy)   0.80      0.77      0.77      0.80      0.79      0.77
Ensemble(GBM)      0.86      0.83      0.82      0.86      0.84      0.82
Ensemble(LR)       0.83      0.80      0.79      0.83      0.81      0.79

VS was also performed as an extra step of the analysis, using the Boruta non-linear VS algorithm [20]. The Boruta algorithm iteratively utilizes the RF algorithm to fulfill its purpose, and this allows for the computation of the variable importance at each step. The VS process and the predictive performance (AUC) of each algorithm were cross-validated, again using the 10-fold CV protocol repeated 20 times. The benefit of the VS is that it returns the most important statistics related to the task of prediction, thus reducing the complexity of the algorithms and facilitating an easier interpretation of the identified important statistics. Since the VS process coupled with the CV protocol was repeated 20 times, this allowed us to estimate the stability of the selected statistics, that is, the number of times each of the statistics was selected by the VS in each fold.
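
A minimal sketch of the Boruta step with the Boruta R package [20], run here once on a full season for brevity (in the paper the selection is nested inside the repeated CV):

```r
library(Boruta)

set.seed(2023)
bor <- Boruta(home_win ~ ., data = diffs, doTrace = 0)

getSelectedAttributes(bor, withTentative = FALSE)  # the confirmed statistics
attStats(bor)  # per-statistic importance summaries, cf. Fig. 3
```
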
Hyperparameter tuning optimization was performed during CV. The polynomial degree (1,2,3), scale (0.001, 0.010,
0.100) and regularization parameter (0.25, 0.50, 1.00), are tuned for SVM(Polynomial). The cost for SVM(Linear) was set
equal to 1, while for SVM(RBF) the values for cost were (0.25, 0.50, 1.00) and 𝜎 = (0.0710, 0.0522, 0.0584). For the RF,
the number of randomly selected predictors for each season were (2,7,13) for the 2020/2021 season, (2,8,15) for the
2021/2022 season and (2,7,13) for the 2022/2023 season, the splitting rule was based on the extra trees or the Gini coef-
ficient and the minimal node size was set equal to 1. For GBM, Ensemble(GBM) and Ensemble(Greedy), the number of
boosting iterations was set equal to (50,100,150), the max tree depth was equal to (1,2,3), the shrinkage level equal to 0.1,
and the minimal terminal node size was equal to 10. For NNET, the number of hidden units were (1,3,5), and the weight
decay size was (0, 0.0001, 0.1000). For NB, no Laplace correction was applied, the distribution type was set to (TRUE,FALSE)
and the bandwidth adjustment was equal to 1. The complexity parameter in RPRT was set to (0.0067,0.0105,0.4639),
(0.0070,0.0079,0.4098), and (0.0058,0.0071,0.3482) for the 2020/2021, 2021/2022 and 2022/2023 seasons, respectively.
In both ExGBM and Ensemble(ExGBM), the number of boosting iterations were (50,100,150), the maximal tree depth was
(1,2,3), the shrinkage was (0.3,0.4), the minimum loss reduction was 0, the subsample percentage was (0.5,0.75,1), the
subsample ratio of columns was (0.6,0.8), the fraction of trees dropped was (0.01,0.5), the probability of skipping drop-
out was (0.05, 0.95) and the minimum sum of instance weight was set to 1. For k-NN, the candidate numbers of neighbors were (5, 7, 9). For the Ensemble(RF), the number of randomly selected predictors was (2, 6, 11), the splitting rule was either the extra trees or the Gini coefficient, and the minimal node size was set to 1. For LR and Ensemble(LR), no hyper-parameter was
used. The optimal values of the aforementioned hyper-parameters were selected based on the repeated CV protocol.
For each algorithm the selected hyper-parameter values were the ones that yielded the highest cross-validated AUC.
Statistical significance of the differences among the compared algorithms was assessed as proposed in [5]. In particular, the non-parametric Friedman test [8, 9, 14] was applied in the experimental setup. For each data set, the algorithms are ranked separately by the Friedman test and average ranks are used in the case of ties [5]. This test is considered a non-parametric alternative to the repeated-measures ANOVA test [5], and its null hypothesis is the equivalence of the methods’ performances. If the null hypothesis of this test is rejected, the post hoc Nemenyi test can be employed to compare each pair of algorithms [5, 27]. According to the Nemenyi test, the difference in performance between two algorithms is statistically significant if the difference of their average ranks is at least the Critical Difference (CD) [5].
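
A sketch of these tests in R, assuming `auc_mat` is a matrix of cross-validated AUC values with one column per algorithm and one row per resample (block); the post hoc test uses the PMCMRplus package, one of several available implementations:

```r
# Global Friedman test: are the algorithms' AUC distributions equivalent?
friedman.test(auc_mat)

# If rejected, run pairwise post hoc Nemenyi comparisons, as in Figs. 6-7.
library(PMCMRplus)
frdAllPairsNemenyiTest(auc_mat)
```
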


Fig. 3  Box-plots of the estimated importance of each feature and for each season after applying the Boruta VS algorithm

Over the last years, multiple methods have been proposed for interpreting the predictions made by ML models. The most prominent interpretation method is to utilize the Shapley additive explanation (SHAP) values [22, 34]. SHAP values constitute a game-theoretical way of detecting the effect of the predictors on the target: the prediction is treated as the total pay-off of a cooperative game, and this pay-off is equitably distributed among the predictors, assigning each feature its share of the credit for the target. In this way, the participation of each feature in the game is determined by examining subsets of predictors at a time. The final output of the SHAP algorithm is a matrix with the same dimensions as the input data being explained. The fast approximate method of [35] for interpreting and recognizing the contribution of each predictor to the output is incorporated in the current study, following the implementation of [10].
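
A sketch of the SHAP computation with the fastshap package [10]; the prediction wrapper must return the probability of a home win, and the model object (`fit_poly`) and data (`diffs`) are the hypothetical ones from the earlier illustrations.

```r
library(fastshap)

# Wrapper returning P(home win) for a fitted caret model.
p_win <- function(object, newdata) {
  predict(object, newdata = newdata, type = "prob")[, "Win"]
}

X <- subset(diffs, select = -home_win)
shap <- explain(fit_poly, X = X, pred_wrapper = p_win, nsim = 50)

sort(colMeans(abs(shap)), decreasing = TRUE)  # mean |SHAP| per statistic, cf. Fig. 4
```
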
Fig. 4  Barplot of mean absolute SHAP values of each feature for each year after Boruta VS for the SVM(Polynomial) algorithm

Accumulated Local Effects (ALE) are an alternative way of explaining black-box ML models. This method estimates the mean effect of each feature on the predictions by estimating prediction differences rather than means, where the features’ local effects are averaged and accumulated over their conditional distributions [1]. It is proposed as an alternative to the Partial Dependence (PD) [7] method. In comparison to PD, ALE does not require extrapolation of the target outside the observed sample, can produce unbiased results even under feature correlation, and is computationally more efficient [1]. However, the feature effects from this method might diverge from the traditional interpretation tools of the examined model, such as the coefficients of a linear regression [11].
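
A sketch of one ALE curve using the iml package (an assumption; the paper does not name its ALE implementation), again reusing the hypothetical objects from the SHAP sketch:

```r
library(iml)

# Wrap the fitted model so that iml can query P(home win).
pred <- Predictor$new(fit_poly, data = X, y = diffs$home_win,
                      type = "prob", class = "Win")

# Accumulated local effect of the half-time point differential, cf. Fig. 9.
ale_pts <- FeatureEffect$new(pred, feature = "PTS", method = "ale")
plot(ale_pts)
```
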

3.2 Results

Both approaches were applied to each of the three regular seasons and the results are presented in Figs. 1 and 2. Figure 1 shows that all algorithms produced high AUC values, above 0.75, but the ensemble learning with RF outperformed all of them, resulting in very high AUC values (above 0.90) for each regular season. The picture was the same after applying VS, as observed in Fig. 2. However, in both cases the results are "too good to be true" and these suspiciously high predictive performances could be attributed to over-fitting.

Table 3  AUC of predictions for next seasons using the models fitted on the previous seasons’ data

                   Prior to VS                               After VS
Model              2021 → 2022   2021 → 2023   2022 → 2023   2021 → 2022   2021 → 2023   2022 → 2023
SVM(Polynomial)    0.802         0.786         0.780         0.803         0.787         0.781
SVM(Linear)        0.799         0.783         0.777         0.800         0.785         0.776
SVM(RBF)           0.787         0.773         0.767         0.784         0.765         0.768
RF                 0.796         0.779         0.775         0.795         0.771         0.770
GBM                0.802         0.778         0.779         0.802         0.776         0.779
LR                 0.799         0.786         0.778         0.799         0.786         0.778
NNET               0.801         0.786         0.780         0.757         0.734         0.780
NB                 0.795         0.780         0.778         0.793         0.776         0.775
RPRT               0.758         0.724         0.726         0.758         0.724         0.726
ExGBM              0.799         0.773         0.776         0.803         0.775         0.775
k-NN               0.772         0.755         0.749         0.766         0.760         0.745
Ensemble(RF)       0.775         0.757         0.749         0.772         0.761         0.744
Ensemble(ExGBM)    0.775         0.761         0.734         0.771         0.758         0.738
Ensemble(Greedy)   0.802         0.780         0.776         0.802         0.785         0.774
Ensemble(LR)       0.802         0.780         0.776         0.802         0.785         0.774
Table 2 presents the estimated predictive performance of each algorithm. Excluding the ensemble learning results, all other models performed nearly the same, with the exception of the SVM algorithm, which produced slightly better results throughout the three seasons under study; in general, its performance lies around 0.80.
Figure 5 shows the stability of the game statistics, that is, the number of times each statistic was selected by the Boruta VS. Personal Fouls (PF) and Blocks (BLK) were the two statistics not frequently selected (less than 50% of the times) by the Boruta VS in all three seasons. For the 2020/2021 season, Offensive Rebounds (ORB) and 3-Point Attempts (3PA) were also not frequently selected; for the 2021/22 season, Offensive Rebounds (ORB) were also rarely selected; and finally, for the 2022/2023 season, the Free Throw percentage (FT%) was also seldom selected.
The statistics that were fed into the algorithms after the VS were the ones with positive importance, as depicted in Fig. 3. Notably, the three most important statistics, common in all three seasons, were the points (PTS), the field goal percentage (FG%) and the field goals made (FG). As an extra attempt to shed more light onto the important statistics, the Shapley (SHAP) values [34], an important concept in ML [26], were computed.
The feature importance results and the global feature association with the probability of winning from the SHAP value analyses, for the SVM(Polynomial) model and the three examined years, are illustrated in Figs. 4 and 8, respectively. According to these, the most important feature for predicting the win of a game in all seasons is the point difference (PTS), and a clear positive relationship between points and win probability is observed. This feature has the largest contribution relative to all the other features. The field goals and free throws (FG and FT, respectively) are the next most important features for 2020/21, and in Fig. 8 both have a positive association with the probability of winning. Taking into consideration the 2021/22 data, the second and third most important features for prediction (in descending importance order) are FT and FG% (Fig. 4), and a positive association with the probability of winning is again revealed (Fig. 8). Regarding 2022/23, the second and third features with the highest contribution to the winning prediction are FG and 3-point attempts (3PA), and a positive relationship is shown for all of them. Therefore, in all three years, PTS together with field goal and free throw related statistics occupy the top three most important features, and they all have a positive association with a predicted win for the home team.
Friedman’s test [8, 9], applied to all algorithms, showed statistically significant differences among their performances for all three seasons (see Fig. 6). The Nemenyi test [5] (Figs. 6-7) revealed interesting groupings. For all three seasons, the ensembles of the algorithms using RF and ExGBM showed no statistically significant differences between them. The ensemble using GBM comes in third place and differs statistically significantly from all other methods, and then there is a grouping of most of the other algorithms.


The final task was to assess the future predictive performance of the algorithms. To this end, the algorithms were trained on the half-time statistics of all games in a season and used to predict the outcomes of the next season’s games. The results appear in Table 3 and, as expected, the AUC values are much lower. When the algorithms are trained on one season and used to predict the game outcomes of the next season, the optimal AUC values float around 0.80, but if they are used to predict the outcomes two seasons ahead, the predictions worsen and the AUC values drop to about 0.78. This phenomenon provides evidence that over-fitting has occurred. The predictive performance of the SVM algorithm, though, remains at the same levels (around 0.80) as those observed in Table 2.
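
Schematically, each cell of Table 3 corresponds to a train-on-one-season, score-on-the-next computation such as the sketch below (illustrative objects, with `diffs_2021` and `diffs_2022` as hypothetical per-season frames; AUC via the pROC package):

```r
library(caret)
library(pROC)

# Fit on the 2020/21 differentials, predict the 2021/22 games.
fit_2021 <- train(home_win ~ ., data = diffs_2021, method = "svmPoly",
                  metric = "ROC", trControl = ctrl, tuneGrid = grid)

prob_2022 <- predict(fit_2021, newdata = diffs_2022, type = "prob")[, "Win"]
auc(roc(diffs_2022$home_win, prob_2022))  # e.g. the 2021 -> 2022 cell
```
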
Figures 9, 10 and 11 visualize the effect of each feature on the estimated probability of the home team winning. For the 2020/2021 season, turnovers (TOV) behave unexpectedly, as they seem to positively affect the probability of winning; however, the range of the effect is rather narrow. Total rebounds (TRB) and the 3-pointer percentage (3P%) seem to be negatively correlated with the probability of winning. The most influential features seem to be the points (PTS) and its related features, the field goals attempted (FGA) and made (FG), the free throws made (FT) and the 3-pointers made (3P). A broadly similar picture is seen for the 2021/2022 and 2022/2023 seasons, except that the turnovers (TOV) have a negative effect, as expected, while the free throws attempted (FTA) also have a negative effect. Overall, the direction of the effect of the features, with some exceptions, is as expected.
When examining the boxplots of the features (see Fig. 12 for instance), one can see the expected differences between home-team wins and home-team losses. The fact that the ALE plots of Figs. 9, 10 and 11 produce opposite effects for some features could be an indication of multicollinearity among the features, a phenomenon which is not automatically handled by ML algorithms.
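
A quick way to flag the suspected collinearity is to scan the pairwise correlations among the half-time differentials (a sketch with the assumed `diffs` frame from Section 2):

```r
# Pairwise correlations among the half-time differentials; pairs such as
# FG/PTS or DRB/TRB are natural candidates for strong dependence.
cmat <- cor(subset(diffs, select = -home_win))
which(abs(cmat) > 0.8 & abs(cmat) < 1, arr.ind = TRUE)
```
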

4 Conclusions

This paper estimated the probability of a home team winning an NBA game based on half-time statistics, specifically the differences between that team and the away team, by employing ML algorithms. The results showed that highly accurate prediction is possible and, further, that not all the available statistics are necessary to fulfill this purpose. In general, as expected, the differences in points, field goal percentage and free throw percentage were shown to be the three most important statistics.
We found that the points a team scores and the point differential at halftime are important components in pre-
dicting the result of a game. In addition to determining which team is likely to win, points are also crucial for players.
Winning games is not the only reason that points per game are significant; athletes are professionals whose goal is
to maximize their earnings. [23] found that points per game are a major contributor to a player’s salary. Furthermore, three-pointers are also important, and [33] notes that 3-pointers and the 3P% are the most influential factor for
the number of wins a team achieves during a season. Additionally, we show that free throws and their differential
at halftime are crucial determinants. [16] finds that the difference in the total number of free throws attempted is
significant in predicting the outcomes of NBA basketball games.
Half-time point differentials give us a snapshot of which team is controlling the game and a huge point differential
at half-time not only reflects the physical effort needed for a comeback but also affects the psychology of the play-
ers. Psychological factors can greatly influence a player’s confidence, which can impact their performance on the
court. Free throws, for example, are a critical component of the game where player confidence plays a crucial role.
During a timeout back in 2010, coach Gregg Popovich humorously motivated his players by saying, ”Next guy that
misses a free throw is gonna buy me a new car”. This underscores the necessity of maintaining focus and confidence
in high-pressure situations.
The main difference between this study and previous studies lies in the use of the ML algorithms and the computation of the SHAP/ALE values. For instance, [25] gave a simple criterion that relies solely upon the difference in points. [32] highlighted the importance of free throws stemming from fouls. In contrast to those studies, our study included many more variables and examined their contribution in a non-linear way. [4] provided evidence of the existence of the home-court advantage. Our approach already incorporates this, as we have utilised the differences in the on-court performance statistics between the home and the guest team, while the response variable is the win or loss of the home team.
On the limitations of this study, we could perhaps add personal fouls, mentioned earlier, but specifically fouls that could be attributed to referee bias. This is a factor that we did not account for, simply because it is rather difficult to measure online, i.e. while the game is still being played. Another limitation of our study is that it focuses only on regular-season data and does not consider time as an axis, whose effects can be irregular and sometimes difficult to predict due to various factors. For instance, after the All-Star game break in the NBA, several key factors
change and tend to influence team performance. Players rest and recover from the first half of the season, which
can lead to improved performances in the second half of the season. Additionally, the trade deadline often aligns
with the All-Star break, changing team dynamics and chemistry. Lastly, teams competing for playoff spots increase
their intensity, aiming for stronger finishes and spots that secure a home-court advantage. By understanding these
changes, individuals can stay up to date on player health, trades and team standings, and make more informed decisions, resulting in better bets. However, these factors are not easy to quantify or account for in predictive
models. As a result, while our model provides valuable insights based on halftime statistics, it may not fully account
for the external complexities that influence team behavior throughout the season, particularly before and after the
All-Star game break.
Lastly, regarding future work, we plan to consider neural networks, currently a very popular class of models that has proved successful in various disciplines. However, those models typically involve numerous hyper-parameters that must be tuned and multiple strategies that need to be tried in order to yield optimal results.
Closing this paper, we note that the cold hand theory, the opposite of the hot hand fallacy, is not valid: the only way for a team with a 20% field goal percentage to improve is by taking more shots. Teams that are trailing at half-time should focus on playing with more physicality, grabbing more rebounds, and then shooting more, ideally with better accuracy.

Author contributions MT and PP conceived the idea. PP collected the data and CA performed the analysis. All three participated in the writing of the paper.

Funding The authors did not receive support from any organization for the submitted work.

Data availability All the data used in this study are available upon request.

Declarations
Ethics approval and consent to participate The authors have not used any data that require ethics approval.

Competing interests The authors declare that they have no conflict of interest and no relevant financial or non-financial interests to disclose.

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which
permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to
the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You
do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party
material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If
material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds
the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Appendix

See Tables 4, 5 and Figs. 5, 6, 7, 8, 9, 10, 11 and 12.


Table 4  NBA Teams

Western Conference          Eastern Conference
Dallas Mavericks            Boston Celtics
Denver Nuggets              Atlanta Hawks
Golden State Warriors       Brooklyn Nets
Houston Rockets             Charlotte Hornets
Los Angeles Clippers        Chicago Bulls
Los Angeles Lakers          Cleveland Cavaliers
Memphis Grizzlies           Detroit Pistons
Minnesota Timberwolves      Indiana Pacers
New Orleans Pelicans        Miami Heat
Oklahoma City Thunder       Milwaukee Bucks
Phoenix Suns                New York Knicks
Portland Trail Blazers      Orlando Magic
Sacramento Kings            Philadelphia 76ers
San Antonio Spurs           Toronto Raptors
Utah Jazz                   Washington Wizards


Table 5  Descriptive statistics

2020/21 Regular Season. Home Victories = 54.69%
FG FGA FG% 3P 3PA 3P% FT FTA FT%

Mean 0.093 – 0.380 0.006 0.236 0.126 0.009 0.376 0.443 0.003
Std. Dev. 4.981 5.456 0.106 3.817 5.750 0.174 4.770 5.498 0.228
Min – 20.000 – 18.000 – 0.366 – 12.000 – 21.000 – 0.538 – 17.000 – 17.000 – 0.875
1st Qrtl. – 3.000 – 4.000 – 0.069 – 2.000 – 4.000 – 0.114 – 3.000 – 3.000 – 0.139
Median 0.000 0.000 0.007 0.000 0.000 0.015 0.000 0.000 0.000
3rd Qrtl. 3.000 3.000 0.077 3.000 4.000 0.134 3.000 4.000 0.153
Max 16.000 16.000 0.287 16.000 19.000 0.501 18.000 18.000 1.000
ORB DRB TRB AST STL BLK TOV PF PTS
Mean – 0.050 0.314 0.263 0.180 – 0.070 0.067 0.135 – 0.074 0.800
Std. Dev. 3.375 5.008 6.232 4.609 2.727 2.195 3.427 3.098 11.709
Min – 9.000 – 17.000 – 17.000 – 15.000 – 9.000 – 7.000 – 11.000 – 11.000 – 50.000
1st Qrtl. – 2.000 – 3.000 – 4.000 – 3.000 – 2.000 – 1.000 – 2.000 – 2.000 – 7.000
Median 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
3rd Qrtl. 2.000 4.000 5.000 3.000 2.000 2.000 2.000 2.000 9.000
Max 11.000 16.000 20.000 15.000 10.000 8.000 12.000 9.000 38.000
2021/22 Regular Season. Home Victories = 54.22%
FG FGA FG% 3P 3PA 3P% FT FTA FT%
Mean 0.339 – 0.052 0.008 0.082 0.188 0.002 0.169 0.191 0.004
Std. Dev. 5.036 5.972 0.105 3.507 5.248 0.165 5.002 5.961 0.210
Min – 16.000 – 19.000 – 0.373 – 11.000 – 15.000 – 0.530 – 18.000 – 21.000 – 1.000
1st Qrtl. – 3.000 – 4.000 – 0.058 – 2.000 – 3.000 – 0.111 – 3.000 – 4.000 – 0.138
Median 1.000 0.000 0.012 0.000 0.000 0.000 0.000 0.000 0.000
3rd Qrtl. 4.000 4.000 0.077 3.000 4.000 0.117 3.000 4.000 0.139
Max 19.000 22.000 0.373 13.000 16.000 0.579 19.000 19.000 0.846
ORB DRB TRB AST STL BLK TOV PF PTS
Mean – 0.002 0.293 0.292 0.365 – 0.012 0.190 – 0.156 – 0.116 0.933
Std. Dev. 3.595 4.969 6.261 4.869 2.821 2.239 3.460 3.256 11.615
Min – 12.000 – 14.000 – 19.000 – 17.000 – 9.000 – 9.000 – 12.000 – 12.000 – 31.000
1st Qrtl. – 2.000 – 3.000 – 4.000 – 3.000 – 2.000 – 1.000 – 2.000 – 2.000 – 8.000
Median 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 2.000
3rd Qrtl. 2.000 4.000 5.000 4.000 2.000 2.000 2.000 2.000 9.000
Max 11.000 15.000 21.000 16.000 10.000 8.000 11.000 12.000 41.000
2022/23 Regular Season. Home Victories = 57.96%
FG FGA FG% 3P 3PA 3P% FT FTA FT%
Mean 0.433 – 0.020 0.010 0.098 – 0.069 0.007 0.423 0.483 0.009
Std. Dev. 4.947 6.016 0.104 3.888 6.094 0.169 4.810 5.616 0.210
Min – 14.000 – 20.000 – 0.279 – 11.000 – 23.000 – 0.701 – 13.000 – 20.000 – 0.846
1st Qrtl. – 3.000 – 4.000 – 0.057 – 2.000 – 4.000 – 0.107 – 3.000 – 3.000 – 0.126
Median 0.000 0.000 0.010 0.000 0.000 0.004 1.000 1.000 0.000
3rd Qrtl. 4.000 4.000 0.080 3.000 4.000 0.114 4.000 4.000 0.139
Max 15.000 17.000 0.310 14.000 22.000 0.500 15.000 18.000 0.857
ORB DRB TRB AST STL BLK TOV PF PTS
Mean 0.012 0.504 0.516 0.295 0.024 0.059 – 0.090 – 0.256 1.388
Std. Dev. 3.613 4.874 6.158 4.849 2.732 2.270 3.557 3.259 11.348
Min – 12.000 – 14.000 – 19.000 – 17.000 – 10.000 – 8.000 – 12.000 – 11.000 – 35.000
1st Qrtl. – 2.000 – 3.000 – 4.000 – 3.000 – 2.000 – 1.000 – 2.000 – 2.000 – 6.000
Median 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 2.000
3rd Qrtl. 2.000 4.000 4.000 4.000 2.000 2.000 2.000 2.000 9.000
Max 11.000 19.000 18.000 15.000 8.000 10.000 11.000 11.000 40.000


Fig. 5  Bar-plots of the stability of each feature during the Boruta VS, for each season


Fig. 6  Friedman test and post hoc Nemenyi test of the estimated AUC for each model pair and for each season after applying the Boruta VS algorithm

Fig. 7  Post hoc Nemenyi test of the estimated AUC for each model pair and for each season after applying the Boruta VS algorithm


Fig. 8  Bee plot of Shapley values of each feature for each year after Boruta VS for SVM(Polynomial)


Fig. 9  ALE of the estimated probability of the home team to win for the 2020/2021 season after Boruta VS for SVM(Polynomial)

Fig. 10  ALE of the estimated probability of the home team to win for the 2021/2022 season after Boruta VS for SVM(Polynomial)

Fig. 11  ALE of the estimated probability of the home team to win for the 2022/2023 season after Boruta VS for SVM(Polynomial)

Fig. 12  Boxplot of the features during the 2020/2021 season


References
1. Apley DW, Zhu J. Visualizing the effects of predictor variables in black box supervised learning models. J R Stat Soc Ser B Stat Methodol.
2020;82(4):1059–86.
2. Bunker RP, Thabtah F. A machine learning framework for sport result prediction. Appl Comput Info. 2019;15(1):27–33.
3. Caudill SB, Mixon FG Jr, Wallace S. Life on the red carpet: star players and referee bias in the National Basketball Association. Int J Econ
Bus. 2014;21(2):245–53.
4. Cooper H, DeNeve KM, Mosteller F. Predicting professional sports game outcomes from intermediate game scores. Chance.
1992;5(3–4):18–22.
5. Demšar J. Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res. 2006;7:1–30.
6. Deutscher C. No referee bias in the NBA: new evidence with leagues’ assessment data. J Sports Anal. 2015;1(2):91–6.
7. Friedman JH. Greedy function approximation: a gradient boosting machine. Ann Stat 2001; 1189–1232.
8. Friedman M. The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J Am Stat Assoc.
1937;32(200):675–701.
9. Friedman M. A comparison of alternative tests of significance for the problem of m rankings. Ann Math Stat. 1940;11(1):86–92.
10. Greenwell B. fastshap: fast approximate Shapley values. R package version 0.1.1. 2024
11. Grömping U. Model-Agnostic Effects Plots for Interpreting Machine Learning Models. Reports in Mathematics, Physics and Chemistry,
Department II, Beuth University of Applied Sciences Berlin Report. 2020;1:2020.
12. Han R, Shi S, Hu T, Tao S. Prediction of future NBA games’ point difference: a statistical modeling approach. In: Journal of Physics: Conference Series. IOP Publishing; 2022. p. 012003.
13. Hastie T, Tibshirani R, Friedman JH, Friedman JH. The elements of statistical learning: data mining, inference, and prediction. Springer;
2009.
14. Hollander M, Wolfe DA, Chicken E. Nonparametric statistical methods. John Wiley & Sons; 2013.
15. Horvat T, Job J, Logozar R, Livada Č. A data-driven machine learning algorithm for predicting the outcomes of NBA games. Symmetry.
2023;15(4):798.
16. Jones ES. Predicting outcomes of NBA basketball games. Master’s thesis, North Dakota State University. 2016
17. Josselyn C. The LeBron effect: is a superstar worth the money? 2019
18. Kaur H, Jain S. Machine Learning Approaches to Predict Basketball Game Outcome. In 2017 3rd International Conference on Advances
in Computing, Communication & Automation (ICACCA)(Fall), IEEE. 1–7, 2017
19. Kuhn M, Wing J, Weston S, Williams A, Keefer C, Engelhardt A, Cooper T, Mayer Z, Kenkel B, Team RC, et al. Package ’caret’. R J 2020,223(7).
20. Kursa MB, Jankowski A, Rudnicki WR. Boruta–a system for feature selection. Fund Inform. 2010;101(4):271–85.
21. Lu J, Chen Y, Zhu Y. Prediction of Future NBA Games’ Point Difference: A Statistical Modeling Approach. In: 2019 International Conference
on Machine Learning, Big Data and Business Intelligence (MLBDBI). IEEE. 2019, 252–256.
22. Lundberg SM, Lee S-I. A unified approach to interpreting model predictions. In: Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R,
Vishwanathan S, Garnett R, editors. Advances in neural information processing systems 30. Curran Associates Inc; 2017. p. 4765–74.
23. Lyons RJ, Jackson ENJ, Livingston A. Determinants of NBA player salaries. Sport J 2015;18.
24. Manner H. Modeling and forecasting the outcomes of NBA basketball games. J Quant Anal Sports. 2016;12(1):31–41.
25. McGivney K, McGivney R, Zegarelli R. Light it up: predicting the winner of an NBA game before the end. Chance. 2008;21(4):45–50.
26. Merrick L, Taly A. The Explanation Game: Explaining Machine Learning Models Using Shapley Values. In Machine Learning and Knowledge
Extraction: 4th IFIP TC 5, TC 12, WG 8.4, WG 8.9, WG 12.9 International Cross-Domain Conference, CD-MAKE 2020, Dublin, Ireland, August 25–28,
2020, Proceedings 4, Springer, 2020;17–38.
27. Nemenyi PB. Distribution-free multiple comparisons. Princeton University. 1963
28. Papadaki I, Tsagris M. Are NBA players’ salaries in accordance with their performance on court? In: Advances in econometrics, operational
research, data science and actuarial studies: techniques and theories, Springer, 2022;405–428.
29. Pope DG, Price J, Wolfers J. Awareness reduces racial bias. Manage Sci. 2018;64(11):4988–95.
30. Price J, Wolfers J. Racial discrimination among NBA referees. Q J Econ. 2010;125(4):1859–87.
31. R Core Team. R: a language and environment for statistical computing. Vienna: R Foundation for Statistical Computing; 2023.
32. Sampaio J, Janeira M. Importance of free-throw performance in game outcome during the final series of basketball play-offs. Int J Appl
Sports Sci 2003;15(2).
33. Shang X. The effect of points per game on the number of wins in NBA. Int J Intell Info Manag Sci 2019,8(1).
34. Shapley LS. Notes on the N-Person Game-II: the value of an N-person game. RAND Corporation. 1951
35. Štrumbelj E, Kononenko I. Explaining prediction models and individual predictions with feature contributions. Knowl Inf Syst.
2014;41:647–65.
36. Thabtah F, Zhang L, Abdelhamid N. NBA game result prediction using feature analysis and machine learning. Ann Data Sci.
2019;6(1):103–16.

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
