Name: Sachin Datta Shinde
Project Report On
Predicting Total Runs in a Cricket Match Using Simple Linear Regression
Objectives:
• To build a machine learning model that predicts the total runs scored by a cricket team
using the average runs per batter from the past 5 matches.
• To apply Simple Linear Regression for understanding the linear relationship between
average batter performance and team total score.
Methodology:
1. Dataset Used:
The dataset consists of past match records between India and Australia, including
features like average runs per batter and the total runs scored in those matches.
2. Data Preprocessing:
• Data is loaded from a CSV file using Pandas.
• The input feature (avg_runs_per_batter) is normalized to ensure efficient gradient
descent convergence.
• A bias term (intercept) is added to the feature matrix.
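The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the column name total_runs, the z-score form of normalization, and the inline sample rows (used in place of the real CSV file) are all assumptions made so the example runs on its own.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV file. These rows are illustrative only
# and are NOT the India vs Australia data used in the report.
csv_text = """avg_runs_per_batter,total_runs
22.0,221
25.5,256
27.0,268
30.0,301
33.5,334
"""
df = pd.read_csv(io.StringIO(csv_text))

# Feature and target as float arrays ("total_runs" is an assumed column name).
x = df["avg_runs_per_batter"].to_numpy(dtype=float)
y = df["total_runs"].to_numpy(dtype=float)

# Assumed z-score normalization: zero mean, unit standard deviation.
x_mean, x_std = x.mean(), x.std()
x_norm = (x - x_mean) / x_std

# Prepend a column of ones so theta[0] acts as the bias (intercept).
X = np.column_stack([np.ones_like(x_norm), x_norm])
print(X.shape)
```

Keeping x_mean and x_std is worthwhile because any new input has to be scaled with the same values before prediction.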
3. Model Building:
• A Simple Linear Regression model is implemented using NumPy.
• The model is trained using Gradient Descent for 1000 epochs with a learning rate of
0.01.
• The cost function used is Mean Squared Error (MSE).
• The model parameters (theta) are updated iteratively to minimize the cost function.
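A training loop of the kind described above might look like the sketch below. It assumes batch gradient descent and the common 1/(2m) scaling of the squared-error cost; the function name gradient_descent, the log_every parameter, and the small synthetic dataset are hypothetical and chosen only for illustration.

```python
import numpy as np


def gradient_descent(X, y, learning_rate=0.01, epochs=1000, log_every=100):
    """Fit theta by batch gradient descent on a squared-error cost."""
    m = len(y)
    theta = np.zeros(X.shape[1])
    history = []
    for epoch in range(epochs):
        errors = X @ theta - y                # prediction error per sample
        cost = (errors @ errors) / (2 * m)    # assumed 1/(2m) MSE convention
        history.append(cost)
        if epoch % log_every == 0:
            print(f"Epoch {epoch}: Cost = {cost:.2f}")
        # Gradient of the cost with respect to theta is (1/m) * X^T * errors.
        theta -= (learning_rate / m) * (X.T @ errors)
    return theta, history


# Synthetic, exactly linear data (illustrative only, not the report's dataset).
x = np.array([20.0, 24.0, 28.0, 32.0, 36.0])
y = 10.0 * x
x_norm = (x - x.mean()) / x.std()
X = np.column_stack([np.ones_like(x_norm), x_norm])

theta, history = gradient_descent(X, y)
```

With a standardized feature, each parameter closes about 1% of its remaining gap per step at a learning rate of 0.01, which is why the cost falls smoothly rather than oscillating.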
4. Evaluation:
• The cost is printed every 100 epochs to show how the model converges.
• The final model is used to predict the total team score based on a new input of average
batter performance.
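As a rough sketch of the prediction step, the helper below scales a new input with the training mean and standard deviation before applying the learned parameters. The function name predict_total and the numeric values of theta, x_mean, and x_std are hypothetical placeholders, not the values learned in this project.

```python
import numpy as np


def predict_total(avg_runs, theta, x_mean, x_std):
    """Predict total team runs from a raw (unnormalized) average per batter."""
    # The new input must be scaled exactly as the training feature was.
    x_norm = (avg_runs - x_mean) / x_std
    return theta[0] + theta[1] * x_norm


# Placeholder parameters and statistics, for illustration only.
theta = np.array([280.0, 50.0])
x_mean, x_std = 28.0, 5.0

predicted = predict_total(29.5, theta, x_mean, x_std)
print(f"Predicted total team score: {predicted:.2f}")
```

Passing the raw average and normalizing inside the function avoids the common mistake of feeding an unscaled value to a model trained on scaled data.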
Key Features:
• Implementation of a custom linear regression model from scratch using NumPy.
• Normalization of features to improve training efficiency.
• Cost reporting during training, printed every 100 epochs.
• Function to predict total score based on any given average batter input.
Results:
• The model successfully reduced the cost function from 36950.75 to 0.20 over 1000
epochs.
• A sample prediction for an input of avg_runs_per_batter = 29.5 resulted in: Predicted
total team score: 294.25
• Console output from training and the sample prediction:
Epoch 0: Cost = 36950.75
Epoch 100: Cost = 4950.82
Epoch 200: Cost = 663.48
Epoch 300: Cost = 89.07
Epoch 400: Cost = 12.11
Epoch 500: Cost = 1.80
Epoch 600: Cost = 0.41
Epoch 700: Cost = 0.23
Epoch 800: Cost = 0.20
Epoch 900: Cost = 0.20
Predicted total team score: 294.25
Conclusion:
This project successfully demonstrates the application of Simple Linear Regression in
predicting cricket scores based on past batter performance. It highlights key machine learning
concepts such as gradient descent, cost function minimization, and feature scaling. With
further improvements like incorporating more features (e.g., pitch type, opposition bowling
average), the model's accuracy and applicability could be enhanced.
Submitted by: Sachin Datta Shinde (257)
Guide: Rucha A. Gurav