ST. XAVIER’S SENIOR SEC.
SCHOOL,
JAIPUR
INFORMATICS PRACTICES PROJECT
IPL Player Performance Analysis
Submitted in partial fulfillment of the requirements for the
Senior Secondary Examination (AISSCE)
By:
Name: Anant Acharya
Class: 12 E
Roll No:
Session: 2024-25
CERTIFICATE
This is to certify that Anant Acharya of Class XII E has successfully completed
the Informatics Practices project entitled IPL Player Performance Analysis
for the academic year 2024-25 under my supervision.
Place: Jaipur Signature of the Internal Supervisor
Date: Name: Nimmi Sam
Place: Jaipur Signature of the External Supervisor
Date: Name:
ID-SHEET
ROLL NO.
NAME OF STUDENT Anant Acharya
CONTACT NO. 8290375558
INTERNAL SUPERVISOR Mrs. Nimmi Sam
PROJECT TITLE IPL Player Performance Analysis
FRONT END PyCharm
PROGRAMMING LANGUAGE Python
DATA SOURCE CSV file
ACKNOWLEDGMENT
I take this opportunity to thank Rev. Fr. Principal M. Arockiam for providing
all the facilities required to carry out my project.
I would like to express my sincere gratitude to my supervisor Mrs. Nimmi Sam
for helping me develop the project and also for her constant encouragement
towards becoming more professionally qualified.
LANGUAGE SPECIFICATION
This project has been developed using the Python programming language.
Data science is an essential part of most industries in this era of big data. It is
a field that works with large volumes of data, using modern tools and techniques
to derive meaningful information, support business decisions and enable
predictive analysis. The data used for analysis can come from multiple sources
and in various formats.
Python is one of the most widely used programming languages among data
professionals for data analysis.
Python provides the tools needed to analyse large datasets. It has powerful
statistical, numerical and visualisation libraries such as pandas, NumPy and
Matplotlib, along with many more advanced libraries.
INTRODUCTION
In today’s fast-growing world, information has a vital and essential role to play. The
IT revolution has not only affected business, education, science and technology but
also the way people think. Speedy changes in the economy and globalization are
putting more and more stress on cutting-edge technology and processing information
swiftly, accurately and reliably. The conventional manual system could not deliver
the required accuracy and speed.
Thus it has been replaced by a computer-based system that is reliable, accurate,
secure, versatile and efficient enough to process information speedily. The
computer-based system has proved revolutionary in meeting the basic needs of today’s
modern business world – quick availability, processing and analysis of information.
Problems with the conventional (Manual) system:
1. Lack of immediate information retrieval.
2. Lack of immediate information storage.
3. Lack of prompt updating of transactions.
4. Lack of sorting of information.
5. Redundancy of information.
6. High time and effort required to generate accurate and prompt reports.
Need and benefits of computerisation
1. To make the information available accurately and speedily.
2. To minimise the burden of paper documents.
SCOPE OF THE PROJECT
This project is designed as a comprehensive analytical tool for exploring and visualizing
statistics from the Indian Premier League (IPL). The goal is to provide cricket enthusiasts,
analysts, and fans with a structured, user-friendly interface for accessing a variety of insights
about player performances and match data. By utilizing three datasets—player statistics, match
details, and ball-by-ball deliveries—the project integrates a vast amount of information to offer
both detailed statistical summaries and insightful visualizations.
The tool supports several key functionalities. Users can retrieve detailed individual player
statistics, including batting and bowling averages, strike rates, and fielding contributions like
catches and stumpings. It also allows for the identification of top players in various categories
such as runs scored, wickets taken, strike rates, and economy rates. For in-depth analysis, users
can compare multiple players' performances side-by-side or examine trends in a specific
batsman's runs or a bowler's dismissals over matches. Additionally, users can analyze top run-
scorers across different IPL seasons.
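The rates named above follow the standard cricket definitions. The project itself reads them as ready-made columns from the player statistics file, so the sketch below is only an illustration of how such figures are usually derived; the function names are hypothetical, and returning 0.0 for a zero denominator is an assumption made for this example.

```python
def batting_average(runs: int, dismissals: int) -> float:
    """Runs scored per dismissal."""
    return runs / dismissals if dismissals else 0.0


def batting_strike_rate(runs: int, balls_faced: int) -> float:
    """Runs scored per 100 balls faced."""
    return 100 * runs / balls_faced if balls_faced else 0.0


def bowling_economy(runs_conceded: int, balls_bowled: int) -> float:
    """Runs conceded per over, where one over is 6 legal balls."""
    return 6 * runs_conceded / balls_bowled if balls_bowled else 0.0


def bowling_average(runs_conceded: int, wickets: int) -> float:
    """Runs conceded per wicket taken."""
    return runs_conceded / wickets if wickets else 0.0


# Made-up figures, not taken from the IPL datasets
print(batting_average(450, 15))        # 30.0
print(batting_strike_rate(450, 300))   # 150.0
print(bowling_economy(240, 180))       # 8.0
print(bowling_average(240, 12))        # 20.0
```

A higher batting average or strike rate is better for a batsman, while a lower economy or bowling average is better for a bowler, which is why the tool ranks economy rates in ascending order.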
The implementation relies on Python's data analysis libraries. The pandas library is
used for data processing and manipulation, while matplotlib is used to draw the bar
charts. On start-up the tool also checks that each dataset contains the columns it
needs and raises an error naming any that are missing.
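The dataset check can be sketched with a set difference between the required column names and the columns actually present. This is a minimal sketch on a made-up table; the helper name missing_columns is hypothetical and is not part of the project code.

```python
import pandas as pd


def missing_columns(df: pd.DataFrame, required: set) -> set:
    """Return the required column names that df does not contain."""
    return set(required) - set(df.columns)


# Made-up matches table that lacks the two team columns
matches = pd.DataFrame({
    "id": [1, 2],
    "season": [2008, 2008],
    "winner": ["Team A", "Team B"],
})
required = {"id", "season", "winner", "team1", "team2"}

missing = missing_columns(matches, required)
if missing:
    print(f"Missing columns in matches dataset: {sorted(missing)}")
```

Checking once when the files are loaded means a wrongly named column is reported immediately instead of causing an error later, in the middle of a menu option.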
By automating the statistical operations and presenting the results as charts, this
project simplifies the exploration of IPL data for anyone interested in cricket
analytics. Whether analysing past performances or comparing players, users can obtain
the figures they need through a simple menu.
DATA SOURCE
CSV file names: deliveries.csv, IPL Player Stat.csv, matches.csv
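The deliveries file and the matches file are linked through the match identifier: match_id in deliveries.csv corresponds to id in matches.csv. The sketch below shows that link on a few made-up rows held in memory, so it runs without the real files; the player names and values are invented for illustration.

```python
import io

import pandas as pd

# Tiny stand-ins for matches.csv and deliveries.csv (made-up rows)
matches_csv = io.StringIO("id,season\n1,2008\n2,2009\n")
deliveries_csv = io.StringIO(
    "match_id,batter,batsman_runs\n"
    "1,Player A,4\n"
    "1,Player A,1\n"
    "2,Player B,6\n"
)

matches = pd.read_csv(matches_csv)
deliveries = pd.read_csv(deliveries_csv)

# Attach the season of each match to every ball bowled in it
merged = deliveries.merge(matches, left_on="match_id", right_on="id", how="left")

# Total runs scored off the bat in each season
runs_by_season = merged.groupby("season")["batsman_runs"].sum()
print(runs_by_season)
```

The player statistics file is not joined to the other two here; in the project it is used on its own for the per-player summaries.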
IMPLEMENTATION
SOURCE CODE
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from typing import Dict, List


class IPLStatsAnalyzer:
    def __init__(self, stats_file: str, matches_file: str, deliveries_file: str):
        """Initialize the IPL Stats Analyzer with the datasets"""
        self.stats = pd.read_csv(stats_file)
        self.matches = pd.read_csv(matches_file)
        self.deliveries = pd.read_csv(deliveries_file)
        self._verify_columns()
    def _verify_columns(self):
        """Verify that all required columns are present"""
        required_columns_stats = {
            'player', 'runs', 'boundaries', 'balls_faced', 'wickets',
            'balls_bowled', 'runs_conceded', 'matches', 'batting_avg',
            'batting_strike_rate', 'bowling_economy', 'bowling_avg',
            'bowling_strike_rate', 'catches', 'stumpings',
            'boundaries_percent'  # read by get_top_players('batting_strike_rate')
        }
        required_columns_matches = {'id', 'season', 'winner', 'team1', 'team2'}
        required_columns_deliveries = {'match_id', 'batter', 'bowler',
                                       'batsman_runs', 'dismissal_kind'}
        # Set difference leaves only the column names that are absent
        missing_stats = required_columns_stats - set(self.stats.columns)
        missing_matches = required_columns_matches - set(self.matches.columns)
        missing_deliveries = required_columns_deliveries - set(self.deliveries.columns)
        if missing_stats:
            raise ValueError(f"Missing columns in stats dataset: {missing_stats}")
        if missing_matches:
            raise ValueError(f"Missing columns in matches dataset: {missing_matches}")
        if missing_deliveries:
            raise ValueError(f"Missing columns in deliveries dataset: {missing_deliveries}")
    def get_player_stats(self, player_name: str) -> Dict:
        """Get comprehensive stats for a player"""
        # Case-insensitive match on the player name
        player_data = self.stats[self.stats['player'].str.lower() == player_name.lower()]
        if len(player_data) == 0:
            return {"error": f"No data found for player: {player_name}"}
        stats_dict = player_data.iloc[0].to_dict()
        # Bowling and fielding figures may be blank for some players; treat blanks as 0
        return {
            'name': stats_dict['player'],
            'matches': int(stats_dict['matches']),
            'runs': int(stats_dict['runs']),
            'batting_avg': round(stats_dict['batting_avg'], 2),
            'batting_strike_rate': round(stats_dict['batting_strike_rate'], 2),
            'boundaries': int(stats_dict['boundaries']),
            'wickets': int(stats_dict['wickets']) if not pd.isna(stats_dict['wickets']) else 0,
            'bowling_economy': round(stats_dict['bowling_economy'], 2)
            if not pd.isna(stats_dict['bowling_economy']) else 0,
            'catches': int(stats_dict['catches']) if not pd.isna(stats_dict['catches']) else 0,
            'stumpings': int(stats_dict['stumpings']) if not pd.isna(stats_dict['stumpings']) else 0
        }
    def get_top_players(self, category: str, limit: int = 10) -> pd.DataFrame:
        """Get top players in various categories"""
        if category == 'runs':
            return self.stats.nlargest(limit, 'runs')[
                ['player', 'runs', 'batting_avg', 'batting_strike_rate', 'matches']]
        elif category == 'wickets':
            return self.stats.nlargest(limit, 'wickets')[
                ['player', 'wickets', 'bowling_economy', 'bowling_avg', 'matches']]
        elif category == 'batting_strike_rate':
            # Only consider batsmen who have faced at least 100 balls
            return self.stats[self.stats['balls_faced'] >= 100].nlargest(limit, 'batting_strike_rate')[
                ['player', 'batting_strike_rate', 'runs', 'boundaries_percent', 'matches']]
        elif category == 'bowling_economy':
            # Only consider bowlers who have bowled at least 100 balls; lower economy is better
            return self.stats[self.stats['balls_bowled'] >= 100].nsmallest(limit, 'bowling_economy')[
                ['player', 'bowling_economy', 'wickets', 'bowling_avg', 'matches']]
        else:
            raise ValueError(f"Invalid category: {category}")
    def plot_player_comparison(self, player_names: List[str]):
        """Compare multiple players' statistics"""
        # Match names case-insensitively, as get_player_stats does
        wanted = [name.lower() for name in player_names]
        players_data = self.stats[self.stats['player'].str.lower().isin(wanted)]
        if len(players_data) == 0:
            print("No data found for the specified players.")
            return
        fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(15, 12))
        # Batting Stats
        players_data.plot(kind='bar', x='player', y='batting_avg', ax=ax1,
                          color='skyblue')
        ax1.set_title('Batting Average Comparison')
        ax1.tick_params(axis='x', rotation=45)
        players_data.plot(kind='bar', x='player', y='batting_strike_rate',
                          ax=ax2, color='lightgreen')
        ax2.set_title('Batting Strike Rate Comparison')
        ax2.tick_params(axis='x', rotation=45)
        # Bowling Stats (if applicable)
        bowling_data = players_data[players_data['wickets'] > 0]
        if len(bowling_data) > 0:
            bowling_data.plot(kind='bar', x='player', y='bowling_economy',
                              ax=ax3, color='salmon')
            ax3.set_title('Bowling Economy Comparison')
            ax3.tick_params(axis='x', rotation=45)
            bowling_data.plot(kind='bar', x='player', y='bowling_avg', ax=ax4,
                              color='orange')
            ax4.set_title('Bowling Average Comparison')
            ax4.tick_params(axis='x', rotation=45)
        plt.tight_layout()
        plt.show()
    def plot_batsman_performance(self, batsman_name: str):
        """Analyze the performance of a specific batsman"""
        batsman_data = self.deliveries[self.deliveries['batter'] == batsman_name]
        if batsman_data.empty:
            print(f"No data found for batsman: {batsman_name}")
            return
        # Total runs off the bat in each match
        batsman_grouped = batsman_data.groupby('match_id')['batsman_runs'].sum()
        plt.figure(figsize=(12, 6))
        batsman_grouped.plot(kind='bar', color='blue', alpha=0.7)
        plt.title(f"{batsman_name}'s Runs per Match")
        plt.xlabel("Match ID")
        plt.ylabel("Runs")
        plt.show()
    def plot_bowler_performance(self, bowler_name: str):
        """Analyze the performance of a specific bowler"""
        bowler_data = self.deliveries[self.deliveries['bowler'] == bowler_name]
        if bowler_data.empty:
            print(f"No data found for bowler: {bowler_name}")
            return
        # Count the deliveries in each match on which a dismissal was recorded
        dismissals = bowler_data['dismissal_kind'].notna().groupby(bowler_data['match_id']).sum()
        plt.figure(figsize=(12, 6))
        dismissals.plot(kind='bar', color='green', alpha=0.7)
        plt.title(f"{bowler_name}'s Dismissals per Match")
        plt.xlabel("Match ID")
        plt.ylabel("Dismissals")
        plt.show()
    def plot_top_run_scorers_by_season(self):
        """Analyze top run-scorers for each season"""
        # Add the season of each match to every delivery
        merged_data = pd.merge(self.deliveries, self.matches,
                               left_on='match_id', right_on='id')
        season_runs = merged_data.groupby(['season', 'batter'])['batsman_runs'].sum().reset_index()
        # idxmax gives the row label of the highest total in each season, so the
        # 'season' column is kept regardless of how groupby.apply treats group keys
        top_scorers = season_runs.loc[
            season_runs.groupby('season')['batsman_runs'].idxmax()
        ].reset_index(drop=True)
        plt.figure(figsize=(12, 6))
        for season in top_scorers['season'].unique():
            season_data = top_scorers[top_scorers['season'] == season]
            plt.bar(season_data['season'], season_data['batsman_runs'],
                    label=season_data['batter'].values[0])
        plt.legend(title="Top Scorers")
        plt.title("Top Run Scorers by Season")
        plt.xlabel("Season")
        plt.ylabel("Runs")
        plt.show()
def main():
    try:
        analyzer = IPLStatsAnalyzer('IPL Player Stat.csv', 'matches.csv',
                                    'deliveries.csv')
        while True:
            print("\n=== IPL Stats Analysis Tool ===")
            print("1. Player Statistics")
            print("2. Top Players by Category")
            print("3. Compare Players")
            print("4. Batsman Performance")
            print("5. Bowler Performance")
            print("6. Top Run Scorers by Season")
            print("7. Exit")
            choice = input("\nEnter your choice (1-7): ")
            if choice == '1':
                player = input("Enter player name: ")
                stats = analyzer.get_player_stats(player)
                if "error" in stats:
                    print(f"\n{stats['error']}")
                else:
                    print(f"\nStatistics for {stats['name']}:")
                    print(f"Matches Played: {stats['matches']}")
                    print(f"Runs Scored: {stats['runs']}")
                    print(f"Batting Average: {stats['batting_avg']}")
                    print(f"Strike Rate: {stats['batting_strike_rate']}")
                    print(f"Boundaries: {stats['boundaries']}")
                    # Bowling figures are shown only for players who have taken a wicket
                    if stats['wickets'] > 0:
                        print(f"Wickets: {stats['wickets']}")
                        print(f"Bowling Economy: {stats['bowling_economy']}")
                    print(f"Catches: {stats['catches']}")
                    print(f"Stumpings: {stats['stumpings']}")
            elif choice == '2':
                print("\nCategories:")
                print("1. Most Runs")
                print("2. Most Wickets")
                print("3. Best Strike Rate")
                print("4. Best Economy Rate")
                category_choice = input("Choose category (1-4): ")
                if category_choice == '1':
                    print("\nTop Run Scorers:")
                    print(analyzer.get_top_players('runs'))
                elif category_choice == '2':
                    print("\nTop Wicket Takers:")
                    print(analyzer.get_top_players('wickets'))
                elif category_choice == '3':
                    print("\nBest Strike Rates (min 100 balls):")
                    print(analyzer.get_top_players('batting_strike_rate'))
                elif category_choice == '4':
                    print("\nBest Economy Rates (min 100 balls):")
                    print(analyzer.get_top_players('bowling_economy'))
                else:
                    print("Invalid category. Please try again.")
            elif choice == '3':
                players = input("Enter player names (comma-separated): ").split(',')
                players = [p.strip() for p in players]
                analyzer.plot_player_comparison(players)
            elif choice == '4':
                batsman = input("Enter batsman name: ")
                analyzer.plot_batsman_performance(batsman)
            elif choice == '5':
                bowler = input("Enter bowler name: ")
                analyzer.plot_bowler_performance(bowler)
            elif choice == '6':
                analyzer.plot_top_run_scorers_by_season()
            elif choice == '7':
                print("Exiting the tool. Goodbye!")
                break
            else:
                print("Invalid choice. Please try again.")
    except Exception as e:
        print(f"An error occurred: {e}")


if __name__ == "__main__":
    main()
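The bowler chart in the listing above counts every delivery on which a dismissal was recorded. In cricket scoring, some dismissals, such as run-outs, are not credited to the bowler, so a wickets-only count may be preferred. The sketch below shows one way to filter them out. It is a possible variant and not part of the project code: the function name wickets_per_match is hypothetical, the rows are made up, and the dismissal labels are an assumption about how the deliveries file spells them.

```python
import pandas as pd

# Assumed labels for dismissals that are not credited to the bowler
NOT_BOWLER_WICKETS = {"run out", "retired hurt", "retired out", "obstructing the field"}


def wickets_per_match(deliveries: pd.DataFrame, bowler_name: str) -> pd.Series:
    """Count, for each match, the dismissals credited to the given bowler."""
    rows = deliveries[deliveries["bowler"] == bowler_name]
    credited = rows["dismissal_kind"].notna() & ~rows["dismissal_kind"].isin(NOT_BOWLER_WICKETS)
    return credited.groupby(rows["match_id"]).sum()


# Made-up deliveries: one run-out in match 1 should not count as a wicket
deliveries = pd.DataFrame({
    "match_id": [1, 1, 1, 2, 2],
    "bowler": ["Bowler X"] * 5,
    "dismissal_kind": ["caught", None, "run out", "bowled", "lbw"],
})

wickets = wickets_per_match(deliveries, "Bowler X")
print(wickets)
```

With these rows the count is 1 for match 1 and 2 for match 2, whereas counting every non-empty dismissal_kind would give 2 for match 1.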
SAMPLE OUTPUTS
Main Menu
Player Statistics
Top Players By Category
Batsman Performance
Bowler Performance
Top Scorers by season:
BIBLIOGRAPHY
Informatics Practices Text Book (NCERT)
Informatics Practices by Sumita Arora
docs.python.org