
IPL Prediction Using Data Science Tools and Techniques

IPL Prediction Using Data Science Tools and Techniques Using Support Vector (SVR), Random Forest and Linear Regression.


IPL Win Prediction Using Data Analytics

Win Probability Prediction in the Indian Premier League (IPL) using Machine Learning
and Big Data Analytics

By

Team Member 1: Saikishan Bharadwaj P

Team Member 2: Aarthi Jhabak

Team Member 3: Ria Singh

Team Member 4: K A Sampada

Team Member 5: Sukanta Dutta

Report

Submitted in partial fulfillment of the requirements for the

Business Analytics Program

Centre for Continuing Education

Indian Institute of Science

Bangalore – 560 012 India



Abstract

Sports analytics is a rapidly growing field that analyses large datasets to predict outcomes
and improve strategies. Cricket, especially the Indian Premier League (IPL), presents vast
opportunities for such analysis. This project focuses on IPL data from 2008 to 2017, following
its establishment by the Board of Control for Cricket in India (BCCI) in 2007. The dataset
includes ball-by-ball match details, player statistics, and records from 13 teams representing
different cities. To derive meaningful insights, the project applies statistical analysis,
probability, machine learning (Support Vector Regression, Random Forest, Linear Regression),
and big data tools. These are used to identify key performance indicators, evaluate player
strengths and weaknesses, and predict match outcomes. Data visualization is implemented in
Power BI as interactive dashboards that offer visual insights into team and player
performance. Combining analytical methods with visualization tools enables a comprehensive,
data-driven understanding of IPL trends and strategies.

Keywords: Prediction, IPL, Machine Learning, Power BI.



Table of Contents

Abstract

Chapter I: Introduction
Problem Statement
Purpose of the Study
Research Questions
Definition of Terms
Assumptions and Limitations
Overview

Chapter II: Related Work
Introduction
Review of Related Work
Summary

Chapter III: Method/Experiment
Introduction
Research Questions
Data Preprocessing, Feature Engineering and Visualization
Choice of Model
Training the Model, Performance of Model and Metrics
Overall Project, Improvements and Application of Results
Summary

Chapter IV: Results
Introduction
Summary

Chapter V: Summary, Conclusions, and Recommendations
Introduction
Summary
Conclusions
Recommendations

References

Chapter I: Introduction

Problem Statement:
The Indian Premier League (IPL) generates vast amounts of complex data, yet strategic insights
from it remain underutilized. This study addresses the challenge of analyzing large-scale IPL
data to support decision-making for teams, analysts, and fans using data science techniques.

Purpose of the Study:


This study aims to analyse IPL data from 2008 to 2017 using machine learning models like
Random Forest and Linear Regression, alongside Power BI visualizations, to uncover trends,
predict outcomes, and evaluate performance strategies through interactive tools.

Research Questions:

1. What KPIs influence IPL match outcomes?
2. How accurately can machine learning models predict results?
3. How do player statistics vary across seasons and teams?
4. What patterns can be visualized to support strategic analysis?

Definition of Terms:

• IPL: India’s professional T20 cricket league.
• Machine Learning: AI-based pattern recognition and prediction.
• Random Forest: A tree-based ensemble prediction model.
• Linear Regression: A statistical method that models a continuous target as a linear function of the input features.
• Power BI: A tool for interactive data visualization.
• Big Data: Large-scale datasets requiring advanced analytics.

Assumptions and Limitations:

The study assumes accurate and consistent IPL data from 2008–2017 and does not include data
beyond 2017. External factors such as pitch conditions, weather, player form, and in-game
strategies are difficult to quantify and are often excluded from available datasets, which may
limit prediction accuracy. Real-time information such as player injuries, captaincy decisions,
and psychological pressure is likewise not captured, limiting the model's awareness of match
context.

Overview:
This chapter outlined the motivation, goals, and scope of the study. The next chapter will review
related literature on machine learning, sports analytics, and visualization methods.

Chapter II: Related Work


Introduction

Over the years, the Indian Premier League (IPL) has evolved into one of the most data-rich
sporting events, providing researchers and analysts with an expansive dataset for exploring
patterns, trends, and performance indicators over by over. This chapter reviews past studies and
data analytics models focused on IPL data, including player performance prediction, match
outcome analysis, and visualization techniques using tools like Power BI and machine learning
models. By examining these works, we aim to understand the current state of IPL analytics and
identify the gaps our study seeks to address.

Review of Related Work

Previous research has leveraged IPL datasets to build predictive models for match outcomes,
identify key performance indicators, and analyse team or player dynamics. Jaipurkar and Ragit
(2020) used Microsoft Power BI to visualize batting and bowling statistics, uncovering
performance trends across seasons. Other studies have applied machine learning models such as
linear regression and random forests to predict the outcome of matches based on variables like
total runs scored, team composition, and player roles. Several works have focused on extracting
insights from structured data similar to the current dataset, which includes match ID, player
statistics, team performance, and roles (captain, player). However, many prior analyses are either
descriptive in nature or limited to top-level team statistics. Few have integrated detailed
player-level data, including attributes like batting hand, bowling skill, and man of the match
awards, to predict performance or understand victory contributions.

Sankaranarayanan et al. (2014) developed a predictive model for IPL matches using machine
learning algorithms, including SVR. Their analysis showed that SVR could effectively capture
the non-linear relationships between match features such as player form, team composition, and
toss decisions, contributing to more accurate match outcome predictions compared to linear
models. Kumar and Sinha (2020) conducted a comparative study of machine learning models
including SVR, Random Forest, and XGBoost on IPL match data. Their findings showed that SVR,
although slightly more computationally intensive, provided more stable and smoother output
trends across overs, which is especially useful for real-time win probability prediction during
live matches.

Summary

In summary, related literature highlights the growing relevance of IPL data analytics in sports
research and decision-making. Most studies either focused on team-level statistics or used basic
player data without integrating all available dimensions. Our study builds on this foundation by
incorporating a more granular dataset, including player demographics, skills, and match-specific
performance, to provide a deeper, data-driven understanding of what contributes to winning a
match. This fills a clear gap and supports more nuanced predictive modelling and visualization in
IPL analytics.

Chapter III: Method/Experiment

Introduction

This chapter describes the methodological approach used to analyze the IPL (Indian Premier
League) dataset and predict match outcomes based on player and team performance metrics. The
aim is to extract actionable insights and build predictive models. The chapter outlines the data
preprocessing techniques, feature engineering methods, visualization steps, model selection
strategies, training procedures, and performance evaluation metrics. The following sections
cover:

• Research Questions
• Data Pre-processing
• Feature Engineering and Visualization
• Choice of Model
• Training the Model
• Performance Evaluation and Metrics

Research Questions

1. What are the critical performance indicators (e.g., runs scored, wickets, venue, over
progression) that significantly influence the outcome of IPL T20 matches?
2. Can machine learning algorithms effectively predict a team’s winning probability at
different phases of an IPL match using real-time ball-by-ball data?
3. How does the inclusion of contextual features (such as venue-based batting
performance or toss results) improve prediction accuracy in IPL match outcome
modeling?
4. Which machine learning model offers the best trade-off between accuracy,
interpretability, and computational efficiency for IPL match outcome prediction?

Data Preprocessing, Feature Engineering and Visualization


A. Data Pre-Processing:
Data pre-processing formed the foundation of the IPL win probability model by converting raw
datasets into a clean, structured format suitable for machine learning. Initially, match-level and
ball-by-ball delivery data were loaded into DataFrames. The datasets were inspected using
functions like .info() and .head() to understand their structure and value distributions. During
cleaning, matches with missing winner data were removed to retain only completed games, and
duplicate or irrelevant records were eliminated. To ensure consistency, column names and
formats were standardized, particularly for critical identifiers such as team names, dates, and
venue fields, enabling seamless data merging and analysis.

[Screenshot: processed data]
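
The cleaning steps described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the tiny inline frames, the column names (`Match ID`, `Team 1`, `Winner`, and so on), and the helper names are all assumptions made for the example.

```python
import pandas as pd

# Tiny stand-in frames; the column names are assumptions, not the report's schema.
matches = pd.DataFrame({
    "Match ID": [1, 2, 2, 3],
    "Team 1": ["Mumbai Indians ", "Chennai Super Kings", "Chennai Super Kings", "Gujarat Lions"],
    "Venue": ["Wankhede Stadium", "Chepauk", "Chepauk", "Rajkot"],
    "Winner": ["Mumbai Indians", "Chennai Super Kings", "Chennai Super Kings", None],
})
deliveries = pd.DataFrame({
    "Match ID": [1, 1, 2, 3],
    "Over": [1, 1, 1, 1],
    "Ball": [1, 2, 1, 1],
    "Total Runs": [4, 0, 1, 6],
})


def standardize_columns(df):
    """Lower-case column names and replace spaces with underscores."""
    out = df.copy()
    out.columns = out.columns.str.strip().str.lower().str.replace(" ", "_")
    return out


def clean_matches(df, text_cols=("team_1", "venue", "winner")):
    """Standardize names, trim text fields, drop unfinished and duplicate matches."""
    out = standardize_columns(df)
    for col in text_cols:
        out[col] = out[col].str.strip()      # "Mumbai Indians " -> "Mumbai Indians"
    out = out.dropna(subset=["winner"])      # keep only matches with a recorded winner
    out = out.drop_duplicates()              # remove repeated match records
    return out.reset_index(drop=True)


matches_clean = clean_matches(matches)
deliveries_clean = standardize_columns(deliveries)

# Inspect structure, as the report does with .info() and .head().
matches_clean.info()
print(matches_clean.head())

# Consistent keys make the merge straightforward; deliveries from dropped matches fall away.
merged = deliveries_clean.merge(matches_clean, on="match_id", how="inner")
```

Trimming text fields before removing duplicates matters here, since two records that differ only by stray whitespace would otherwise both survive.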

B. Exploratory Data Analysis

EDA played a vital role in the IPL win probability modeling by uncovering patterns, detecting
anomalies, and guiding effective feature engineering. Key insights revealed that powerplay overs
generally have steady run rates with fewer wickets, while end overs show rapid scoring and
frequent dismissals. The required run rate trends upward when scoring lags and drops sharply after
high-scoring overs, especially in chases. Features like current runs, wickets lost, and recent over
momentum emerged as strong predictors of win probability. Team and venue-specific
performance patterns were evident, with certain teams excelling at home grounds. Outliers, such
as unusually high or low scoring overs, highlighted rare match events. Overall, match
progression followed a clear pattern: slow starts, stable mid-innings, and aggressive finishes, with
win probabilities shifting significantly after pivotal moments like big overs or crucial wickets.
The EDA plots are listed below.
1. Top 10 Winning Teams and Top 10 Batsman

2. Number of Wins for Most Winning Team at Each Venue



3. Distribution of Runs Scored Per Over (1st and 2nd Innings)

4. Total Wickets Lost Per Over (1st and 2nd Innings)

5. Average Runs Per Ball by Over and Wickets Lost (2nd Innings)

C. Feature Engineering

Feature engineering is essential in any predictive modeling project because it transforms raw
data into meaningful variables that better capture the patterns, relationships, and context relevant
to the prediction target. The table below lists the engineered features, followed by the
screenshots showing the corresponding output.

Feature Name | Description
runs_so_far | Total runs scored by batting team up to the current over
wickets | Wickets lost so far
runs_remaining | Runs left to win (in the second innings only)
balls_remaining | Balls left in the innings (second innings only)
current_run_rate | Runs per over scored till this point
required_run_rate | Remaining runs per over needed to win
momentum_runs | Runs scored in the last 18 balls (3 overs) for recent momentum
momentum_wickets | Wickets lost in the last 18 balls, measure of recent pressure
venue_win_rate | Historical win rate for the batting team at the venue
team_batting_name | Name of the current batting team (categorical)
team_bowling_name | Name of the bowling team (categorical)
venue | Match venue (categorical)
toss_winner | Team that won the toss (categorical)
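
As a rough illustration of how the match-state features in the table might be derived, the sketch below computes them for a single chase. It works at ball level for simplicity, whereas the report aggregates per over; the input columns `runs` and `is_wicket`, the function name, and the 120-ball innings length are assumptions for the example.

```python
import pandas as pd


def add_match_state_features(balls, target, total_balls=120, window=18):
    """Add cumulative and rolling features to one chase.

    `balls` holds one row per legal delivery, in order, with hypothetical
    columns `runs` (runs off the ball) and `is_wicket` (0 or 1).
    """
    out = balls.copy()
    out["balls_bowled"] = range(1, len(out) + 1)
    out["runs_so_far"] = out["runs"].cumsum()
    out["wickets"] = out["is_wicket"].cumsum()
    out["runs_remaining"] = target - out["runs_so_far"]
    out["balls_remaining"] = total_balls - out["balls_bowled"]
    # Rates are expressed in runs per over (6 balls).
    out["current_run_rate"] = out["runs_so_far"] * 6 / out["balls_bowled"]
    rrr = out["runs_remaining"] * 6 / out["balls_remaining"]
    # The required rate is undefined once no balls remain, so leave it as NaN there.
    out["required_run_rate"] = rrr.where(out["balls_remaining"] > 0)
    # Momentum: totals over the last `window` balls (18 balls = 3 overs).
    out["momentum_runs"] = out["runs"].rolling(window, min_periods=1).sum()
    out["momentum_wickets"] = out["is_wicket"].rolling(window, min_periods=1).sum()
    return out


balls = pd.DataFrame({"runs": [1, 4, 0, 6, 0, 1], "is_wicket": [0, 0, 1, 0, 0, 0]})
features = add_match_state_features(balls, target=150)
print(features.tail(1).T)
```

Using `min_periods=1` lets the momentum features take a value from the first ball, at the cost of covering fewer than 18 balls early in the innings.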

1. Ball-by-ball delivery information for T20 cricket matches

2. Win proportion and preprocessing

3. Venue-wise batting win rate feature
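
A venue-wise batting win rate can be computed with a simple group-by, as in the sketch below. The frame layout and the column names `batting_team` and `batting_team_won` are assumptions; in practice the rate would presumably be computed from training matches only, so that outcomes from held-out matches do not leak into the feature.

```python
import pandas as pd

# Hypothetical match-level history: one row per match from the batting side's view.
matches = pd.DataFrame({
    "venue": ["Wankhede", "Wankhede", "Wankhede", "Chepauk", "Chepauk"],
    "batting_team": ["MI", "MI", "CSK", "CSK", "MI"],
    "batting_team_won": [1, 0, 1, 1, 0],
})


def venue_win_rate(history):
    """Share of matches won by each batting team at each venue."""
    return (
        history.groupby(["venue", "batting_team"])["batting_team_won"]
        .mean()
        .rename("venue_win_rate")
        .reset_index()
    )


rates = venue_win_rate(matches)

# Attach the rate to each match (or each ball-level row) as a contextual feature.
with_rate = matches.merge(rates, on=["venue", "batting_team"], how="left")
print(with_rate)
```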



D. Visualization: Over-by-Over Predictions

The left plot shows average win probability over each over for IPL teams when batting. Stronger
teams like Mumbai Indians and Chennai Super Kings maintain higher win probabilities across
the innings, while teams like Gujarat Lions show consistently lower values. The right plot
highlights the average prediction error, which is highest during the initial (1–3) and final overs
(18–20) due to greater game volatility. The model performs most accurately in the middle overs
(4–17), making this period the most reliable for win predictions.
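
The per-over error curve described above could be produced along these lines. The numbers are made up for illustration, and the column names `over`, `actual`, and `predicted` are assumptions rather than the project's schema.

```python
import pandas as pd

# Illustrative predictions on a 0-100 win-probability scale (made-up values).
preds = pd.DataFrame({
    "over": [1, 1, 10, 10, 20, 20],
    "actual": [50.0, 60.0, 55.0, 45.0, 90.0, 10.0],
    "predicted": [70.0, 40.0, 50.0, 50.0, 70.0, 30.0],
})

# Absolute error per prediction, then the average for each over.
preds["abs_error"] = (preds["predicted"] - preds["actual"]).abs()
error_by_over = preds.groupby("over")["abs_error"].mean()
print(error_by_over)
```

Plotting `error_by_over` against the over number would give the kind of error profile shown in the right-hand plot.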

Power-BI Dashboard Insights

The Power BI dashboard offers a detailed visual analysis of IPL match data, highlighting key
patterns and predictive insights. It compares actual and predicted match outcomes, and presents
team and player statistics, including runs, wickets, and strike rates, alongside batting and
bowling trends such as scoring patterns and boundary frequencies. It also explores win patterns
influenced by toss decisions and match venues. Interactive filters allow users to analyse data by
season, team, or player, while charts and graphs illustrate player contributions and over-wise
performance.

The Power BI analysis reveals high prediction accuracy in forecasting IPL match outcomes, with
teams like Mumbai Indians and Chennai Super Kings showing consistent performance.
Over-by-over predictions provide dynamic insights, while boundary patterns and win margins
offer a deeper understanding of team strategies. Overall, the dashboard combines predictive
analytics and visual tools to support strategic decision-making in the IPL. The dashboard pages
are listed below.

IPL Dashboard-Match_Summary

IPL Dashboard-Player_Level_Summary

IPL Dashboard-Team_Level_Summary

IPL Dashboard-Model_Results

IPL Dashboard-Comparative_Analysis

Exploratory Data Analysis (EDA)

Exploratory Data Analysis (EDA) is a key step in preparing data for modeling. It starts with
loading the dataset and understanding its structure using tools like .shape and .info(). Next, data
cleaning is performed by handling missing values, removing duplicates, and correcting data
types. Univariate analysis explores individual variables through statistics and visualizations like
histograms or bar plots. Bivariate analysis examines relationships between variables using scatter
plots and correlation matrices. Feature engineering follows, involving the creation or
transformation of variables for better model performance. Finally, key insights, patterns, and
anomalies are summarized to guide further analysis.
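
A compact version of these EDA steps might look like the sketch below. The over-level frame and its values are invented for illustration; only the pandas calls themselves are standard.

```python
import pandas as pd

# Made-up over-level records: runs and wickets for a few overs of two innings.
overs = pd.DataFrame({
    "over": [1, 2, 19, 20, 1, 2, 19, 20],
    "runs": [5, 7, 14, 16, 6, 4, 12, 18],
    "wickets": [0, 0, 1, 2, 0, 1, 1, 1],
})

# Structure: number of rows and columns, dtypes, non-null counts.
print(overs.shape)
overs.info()

# Univariate view: distribution of runs within each over.
summary = overs.groupby("over")["runs"].describe()
print(summary[["mean", "min", "max"]])

# Bivariate view: pairwise correlation between the numeric columns.
corr = overs[["over", "runs", "wickets"]].corr()
print(corr.round(2))
```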

Choice of Model

Support Vector Regression (SVR):

Support Vector Regression (SVR) with a radial basis function (RBF) kernel was chosen as the
primary and final model for predicting IPL win probabilities due to its strong performance and
suitability for the problem. SVR provided smooth and realistic probability
curves throughout the progression of a match, effectively modeling the nonlinear relationships
between key features such as momentum, run rates, and wicket falls. Compared to other
regression models, SVR demonstrated superior generalization on test data, as indicated by higher
R² scores and lower mean squared error, making it the most effective choice for this predictive
task. The table below lists the other models that were tried, with notes and their respective
limitations.
Model Training

Model | Notes | Limitations Noted
Random Forest Regressor | Good baseline; handled nonlinearities, but produced less smooth predictions and was prone to overfitting. | Inconsistent probability curves.
Ridge/Lasso (Linear Models) | Linear baseline models for comparison; underfit the data. | Could not capture complex trends; very poor accuracy (negative R², high RMSE).
XGBoost Regressor | Strong performance on many metrics; complex and required careful tuning to prevent overfitting. | Model interpretability and tuning complexity.

Why SVR Outperformed


• Provided the most stable and interpretable over-by-over win probability progression.
• Balanced predictive power against over/underfitting, with manageable complexity.
• Produced robust results without excessive parameter tuning.

Limitations of the Chosen Model


• Computationally Intensive: SVR can be slower to train, especially on large datasets, due to
kernel computations.
• Requires Feature Scaling: Sensitive to unscaled input features, necessitating careful
preprocessing.
• Limited Interpretability: While less opaque than some ensemble models, the influence of
specific features on predictions is less directly interpretable than in linear models.

Conclusion

SVR was chosen for its reliable, smooth predictions and adaptability to the dynamic, nonlinear
progressions of a T20 cricket match. Although other models were considered and tested, SVR
consistently produced the most credible forecasts for over-by-over win probability, making it
best suited for the requirements of this project.

Training the Model, Performance of Model and Metrics

To ensure robust model performance and unbiased evaluation, the dataset was divided into
training, validation, and test sets. The training set was used to fit the model, the validation set
guided hyperparameter tuning and model selection, while the test set provided a final, unbiased
performance assessment. A preprocessing pipeline was implemented, combining StandardScaler
for numeric features and OneHotEncoder for categorical variables, ensuring consistent
transformations across all data splits. Support Vector Regression (SVR) with an RBF kernel was
selected for its ability to capture nonlinear match dynamics and generate smooth, realistic win
probability estimates. Hyperparameter tuning was conducted using randomized search with
cross-validation on the validation set. The key parameters optimized included C (regularization
strength), epsilon (insensitivity margin), and gamma (RBF kernel coefficient). The best
combination found was {'kernel': 'rbf', 'gamma': 'scale', 'epsilon': 0.01, 'C': 100}. This structured
train-validation-test approach and targeted tuning ensured strong generalization, reduced
overfitting, and reliable win probability predictions throughout the course of IPL matches.
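
The pipeline and tuning procedure can be sketched with scikit-learn as below. This is an illustration under stated assumptions, not the project's code: the data is synthetic, the feature subset and search grid are invented for the example, and cross-validation on the training split stands in for the report's separate validation set.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data; the real features come from the ball-by-ball dataset.
rng = np.random.default_rng(0)
n = 240
X = pd.DataFrame({
    "runs_so_far": rng.integers(0, 200, n),
    "wickets": rng.integers(0, 10, n),
    "required_run_rate": rng.uniform(4, 14, n),
    "venue": rng.choice(["A", "B", "C"], n),
})
# Made-up win percentage (0-100) that falls with wickets lost and required rate.
y = (60 - 4 * X["wickets"] - 3 * (X["required_run_rate"] - 8)
     + rng.normal(0, 5, n)).clip(0, 100)

numeric = ["runs_so_far", "wickets", "required_run_rate"]
categorical = ["venue"]

# Scale numeric columns and one-hot encode categorical ones in a single step,
# so every data split goes through identical transformations.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])
model = Pipeline([("prep", preprocess), ("svr", SVR(kernel="rbf"))])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Randomized search over C, epsilon and gamma (grid values are assumptions).
search = RandomizedSearchCV(
    model,
    param_distributions={
        "svr__C": [1, 10, 100],
        "svr__epsilon": [0.01, 0.1, 1.0],
        "svr__gamma": ["scale", "auto"],
    },
    n_iter=5,
    cv=3,
    scoring="r2",
    random_state=0,
)
search.fit(X_train, y_train)

print(search.best_params_)
test_r2 = search.score(X_test, y_test)  # R² on the held-out test split
print(round(test_r2, 3))
```

Placing the scaler inside the pipeline means it is refit on each cross-validation fold, which avoids leaking statistics from the held-out fold into training.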

Performance of Model and Metrics



The final SVR model demonstrated strong performance on the test set following hyperparameter
tuning. It achieved an R² score of approximately 0.945, indicating that the model was able to
explain a high proportion of variance in the win probability predictions. The Mean Squared Error
(MSE) was around 1.38, reflecting a very low average squared difference between predicted and
actual values. Additionally, the Root Mean Squared Error (RMSE) was about 11.7%, suggesting
a low average error magnitude and confirming the model’s reliability in estimating win
probabilities throughout IPL match progressions.
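
For reference, the metrics quoted above can be computed with scikit-learn as in the sketch below, using made-up values on a 0-100 scale. Because RMSE is the square root of MSE, the two are only comparable on the same scale: an RMSE of 11.7 percentage points corresponds to an MSE of about 137 (11.7² ≈ 136.9), so the MSE figure of 1.38 may have been reported on a different scale. That reading is an assumption, not something the report states.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up actual and predicted win percentages; every prediction is off by 5 points.
y_true = np.array([20.0, 35.0, 50.0, 65.0, 80.0])
y_pred = np.array([25.0, 30.0, 55.0, 60.0, 85.0])

mse = mean_squared_error(y_true, y_pred)   # mean of squared errors
rmse = np.sqrt(mse)                        # same units as the target
mae = mean_absolute_error(y_true, y_pred)  # mean of absolute errors
r2 = r2_score(y_true, y_pred)              # share of variance explained

print(f"MSE={mse:.2f} RMSE={rmse:.2f} MAE={mae:.2f} R2={r2:.4f}")
```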
Instruments Used: These are the key tools/libraries typically used for model training and
evaluation:

Instrument/Tool | Purpose
Python | Programming language
scikit-learn | ML model training, evaluation, metrics
Pandas / NumPy | Data preprocessing & handling
Matplotlib / Seaborn | Visualizing metrics like confusion matrix
Jupyter Notebook, Colab | Interactive code development
XGBoost | Gradient boosting models

Overall Project, Improvements and Application of Results

This project presents a complete analytical and predictive framework for IPL T20 matches using
historical ball-by-ball data. It combines thorough EDA, intelligent feature engineering, and robust
machine learning techniques to forecast match outcomes. Key improvements include the addition
of contextual features like venue-based batting win rate, standardization of inputs, and fine-tuned
SVR modelling. This chapter outlines the methodological framework used to analyse IPL (Indian
Premier League) match data and develop predictive models based on player and team
performance. The key steps include data pre-processing, feature engineering, and exploratory
data analysis (EDA), supported by effective visualization techniques. Initially, the raw data is
processed to calculate essential metrics such as runs scored and wickets lost per over.

Results

The EDA section highlights several insights: the top 10 most successful teams include
franchises like Mumbai Indians and Chennai Super Kings, while top batsmen such as Virat Kohli
and David Warner consistently perform well. Further analysis shows how team performance
varies across venues, indicating that some teams have a strong home advantage. Box plots of run
distributions per over reveal that the later overs, particularly the death overs, see higher scoring
rates. Conversely, the number of wickets lost tends to increase in the middle and final overs,
reflecting the pressure of acceleration phases. A heatmap illustrating average runs per ball based
on overs and wickets lost shows that teams score more effectively when they preserve wickets,
especially during chases. Finally, feature engineering focuses on extracting insights from
ball-by-ball delivery data, which provides a granular and dynamic foundation for building accurate
match outcome prediction models. Overall, the approach combines statistical analysis with
machine learning readiness, enabling deeper understanding and predictive capability in T20
cricket.

The feature engineering analysis presents a complete machine learning pipeline designed to
predict win probabilities in T20 cricket matches using ball-by-ball delivery data. The study
begins by utilizing granular match data, capturing key elements such as overs, runs, wickets,
teams, and venue information. Initial preprocessing includes calculating the proportion of
matches won (55.2%) and lost (44.8%) and applying appropriate data transformations. Numerical
features are standardized using StandardScaler, while categorical variables are encoded
through OneHotEncoder within a ColumnTransformer.

An important engineered feature added to the dataset is the venue-wise batting win rate, which
accounts for pitch or ground-specific advantages. This significantly aids in enhancing model
accuracy. Several machine learning models were then trained, including Linear Regression,
Random Forest, XGBoost, Ridge, Lasso, and Support Vector Regression (SVR). Among these, SVR
delivered the highest performance, with an R² score of 0.9447 and the lowest RMSE (11.74).

To further improve the model, hyperparameter tuning was performed using RandomizedSearchCV.
Again, SVR emerged as the top performer, with optimal parameters C=100, epsilon=0.01, and
kernel='rbf'. On the test dataset, SVR maintained its superiority, achieving an R² score of
0.9478, RMSE of 13.0, and MAE of 11.4. The model's average predicted win percentage stood at
52.98%, with sample predictions closely aligning with real outcomes. Overall, the project
demonstrates a robust application of machine learning in sports analytics, particularly in
forecasting match outcomes with impressive accuracy.

Methodological limitations or procedural weaknesses.



While the model delivered strong predictive performance, several limitations were noted. The
feature set was limited, lacking player-specific data, pitch conditions, weather influences, and
real-time contextual factors that could enhance accuracy. The chosen SVR model, although
effective, required careful hyperparameter tuning and sometimes produced overconfident
predictions at the extremes. Data quality posed challenges, including risks of missing or
inaccurate entries and potential data leakage. Additionally, the model could be prone to
overfitting historical trends, leading to performance degradation as IPL dynamics and team
strategies evolve. Lastly, the use of over-by-over resolution smoothed out ball-level variability,
potentially missing critical in-game fluctuations that could affect win probabilities.

Summary

The methodology involves collecting and preprocessing historical IPL data, engineering key
match features, and applying machine learning models like Support Vector Regression.
Hyperparameter tuning on validation sets optimizes model parameters for best performance,
yielding smooth, accurate over-by-over win probability predictions. The model’s strength lies in
capturing nonlinear match dynamics with good generalization verified on test data. Applications
include live match analytics, strategic coaching decisions, and fantasy sports
insights. Main limitations are missing player-level and contextual features, potential
overconfidence in probabilities, and evolving IPL dynamics that require ongoing model updates.
Summary of the Methodology

The study followed a structured and systematic methodology. Data came from secondary sources,
namely historical IPL match and ball-by-ball records, and was analysed using statistical methods
and machine learning models. The approach was aligned with the research objectives to ensure
reliable and valid results.

Chapter IV: Results

Introduction

This study aims to develop a predictive model that estimates the win probability of the batting
team at each over in IPL T20 matches. Using historical match data, the project involves thorough
data preprocessing, feature engineering, and model selection. Key match dynamics, such as runs,
wickets, required run rate, momentum, and venue effects, were captured. Support Vector
Regression (SVR) was chosen for its ability to handle nonlinear relationships and deliver
smooth, realistic predictions. The model, optimized through scaling, encoding, and
hyperparameter tuning, showed high accuracy on test data, offering valuable insights into match
progression.

Summary

This project presents a comprehensive analytical and predictive framework for IPL T20 matches
using historical ball-by-ball data. It involves systematic steps such as data preprocessing,
exploratory data analysis (EDA), and intelligent feature engineering, supported by effective
visualizations. Key insights from EDA highlight the top teams (Mumbai Indians, Chennai
Super Kings), consistent batsmen (Virat Kohli, David Warner), scoring patterns in death overs,
and venue-based performance trends. Feature engineering includes contextual metrics like
venue-wise batting win rate and dynamic match features, improving model accuracy. Various
models were tested, with Support Vector Regression (SVR) outperforming the others and achieving
a test R² of 0.9478 and RMSE of 13.0 after hyperparameter tuning. Support Vector Regression
(SVR) achieved an R² score of approximately 0.945 on the test set, indicating that the model
explains 94.5% of the variance in match outcomes. The model also attained a low mean squared
error (MSE) of around 1.38 and a root mean squared error (RMSE) of about 11.7%, reflecting
precise probabilistic predictions. These results indicate that the model reliably captures complex
match dynamics such as runs scored, wickets lost, and momentum shifts. The over-by-over win
probabilities generated are smooth and realistic, making the model suitable for live match
analysis, strategic decision support, and fan engagement applications. Overall, the model’s high
accuracy and interpretability validate its effectiveness as a tool for predicting IPL match
outcomes in real time.

Chapter V: Summary, Conclusions, and Recommendations

Introduction:

This paper presents a machine learning approach to predict the win probability of the batting
team at each over in Indian Premier League (IPL) cricket matches. Beginning with
comprehensive data collection and preprocessing of ball-by-ball IPL datasets, the study
engineered key features capturing match state, momentum, and venue effects. Various models
were evaluated, with Support Vector Regression (SVR) chosen for its ability to model the
nonlinear dynamics of cricket. Through rigorous hyperparameter tuning and validation, the
model was optimized to provide accurate and interpretable over-by-over win probabilities.
Summary

This study presents a complete analytical and predictive framework for IPL T20 cricket using
detailed historical ball-by-ball data. The approach involved data preprocessing, exploratory data
analysis (EDA), feature engineering, and machine learning model development. The optimized
SVR model achieved strong predictive performance, with an R² score around 0.945, mean
squared error near 1.38, and root mean squared error of approximately 11.7%. When applied as a
binary classifier using a 0.5 probability threshold, it delivered about 83.5% accuracy in
predicting match winners. The model’s probabilities aligned well with critical match events such
as wickets lost and scoring bursts, producing smooth and realistic win probability curves. Feature
engineering introduced impactful variables, including venue-wise batting win rate, enhancing
the model’s contextual accuracy. Multiple regression models were trained, and Support Vector
Regression (SVR) outperformed all others with a test R² of 0.9478 and RMSE of 13.0. The
model showed strong alignment between predicted and actual win probabilities, making it
suitable for real-time match forecasting.
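
The binary-classifier reading mentioned above amounts to thresholding the predicted probability, as in this minimal sketch. The values are made up, and treating a probability of exactly 0.5 as a predicted win is an assumption.

```python
import numpy as np

# Made-up predicted win probabilities (0-1) and actual results for the batting team.
win_prob = np.array([0.82, 0.35, 0.55, 0.10, 0.48, 0.91])
batting_team_won = np.array([1, 0, 0, 0, 1, 1])

# Predict a win whenever the probability reaches the 0.5 threshold.
predicted_win = (win_prob >= 0.5).astype(int)

# Accuracy is the share of matches where the thresholded prediction matches the result.
accuracy = (predicted_win == batting_team_won).mean()
print(predicted_win, round(accuracy, 3))
```
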
Conclusions

The project successfully demonstrated the use of advanced data analytics and machine learning
in sports prediction. The integration of detailed match-level data with engineered features
significantly improved the model’s predictive power. SVR proved to be the most effective
model, accurately capturing match momentum and outcome probabilities. The analysis supports
that preserving wickets and exploiting favourable venues are key to success in T20 cricket. This
work highlights the importance of contextual features and robust tuning in enhancing model
performance.

Recommendations

For future improvements, it is recommended to include additional real-time variables such as
toss results, player injuries, weather conditions, and pitch behaviour to further enhance
prediction accuracy. Incorporating unstructured data sources like commentary or player
sentiment may provide deeper insights. Moreover, deploying the model into a real-time
dashboard or app can expand its application for coaches, analysts, broadcasters, and fans.
Continuous model updates with the latest match data will ensure sustained relevance and
accuracy in dynamic cricket environments.

References

[Link]
[Link]
[Link]
367479261_Artificial_Intelligence_and_Data_Analytics_in_Cricket
[Link]
[Link]
[Link]
[Link]
IPL,Data,Analysis,and,Visualization,Using,Microsoft,Jaipurkar,Ragit/
795cba7ef0772ba9088d1f4744ba3f36adf6b10c#paper,topic
[Link]
fin_irjmets1724734843.pdf
[Link]
[Link]
machine-learning-approach-f4641670c5bb
[Link]
[Link]
LEARNING-3-2
[Link]
learning-approach-f4641670c5bb
[Link]
LEARNING-3-2
[Link]
[Link]
[Link]
play_A_Data_Mining_Approach_to_ODI_Cricket_Simulation_and_Prediction
