Report 730

This project develops an R script to predict soccer league standings using various statistical models based on key performance metrics. Four predictive approaches were employed: Expected Goals, Linear Regression, Random Forest, and XGBoost, with Linear Regression showing the best balance of accuracy and reliability. The study concludes that while predictive modeling offers insights, the unpredictable nature of soccer limits prediction accuracy.

Uploaded by

rochan peechara


We created an R script that automatically collects data from a website, focusing on specific soccer statistics we believe are good indicators of a team's league standing. We aim to predict which teams will finish at the top of the league table and which will struggle. Automating the collection saves time and lets us gather and analyze large amounts of the data we consider most relevant to team performance.
To cover every aspect of the game, we collected key stats for every team in three areas: goalkeeping, defending, and attacking. From over 100 available features we narrowed the set down to 27 by removing correlated or redundant stats. We used data from the 2019, 2022, and 2023 seasons only: 2019 and 2022 for training and 2023 for prediction.

Prediction by Linear Regression

We started with a Linear Regression model, fitted with the formula model <- lm(Pts ~ . - Squad, data = train_data), which gave an RMSE of 9.05. We then obtained the predicted points and their prediction intervals. For 15 of the 20 teams, the actual points fell between the lower and upper bounds of the prediction interval.
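The fit, the prediction intervals, and the coverage count described above can be sketched in base R as follows. This is a minimal illustration, not the report's script: the data are synthetic, and the feature names (xG, SavePct), the helper functions make_season and add_points, and the 95% interval level are all assumptions made for the example. Dropping the Squad column before fitting has the same effect on the fit as the formula Pts ~ . - Squad.

```r
# Synthetic stand-in data: the report's scraped features are not reproduced here.
set.seed(730)

make_season <- function(n_teams = 20) {
  data.frame(
    Squad   = paste0("Team", seq_len(n_teams)),
    xG      = rnorm(n_teams, mean = 50, sd = 10),  # hypothetical attacking feature
    SavePct = rnorm(n_teams, mean = 70, sd = 5),   # hypothetical goalkeeping feature
    stringsAsFactors = FALSE
  )
}

add_points <- function(d) {
  # Hypothetical relationship between features and points, plus noise.
  d$Pts <- round(0.9 * d$xG + 0.4 * d$SavePct - 20 + rnorm(nrow(d), sd = 6))
  d
}

train_data <- rbind(add_points(make_season()), add_points(make_season()))  # two seasons
test_data  <- add_points(make_season())                                   # held-out season

# Fit points on every column except the team name.
train_x <- train_data[, setdiff(names(train_data), "Squad")]
model   <- lm(Pts ~ ., data = train_x)

# Point predictions plus prediction intervals for the held-out season.
pred <- predict(model, newdata = test_data, interval = "prediction", level = 0.95)

# Root mean squared error and number of teams whose actual points fall in the interval.
rmse    <- sqrt(mean((test_data$Pts - pred[, "fit"])^2))
covered <- sum(test_data$Pts >= pred[, "lwr"] & test_data$Pts <= pred[, "upr"])
```

The matrix returned by predict has the columns fit, lwr, and upr, so the "15 of 20" figure in the report corresponds to the value of covered on the real data.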

Results:
Our final goal is to predict the standings, which can be derived by predicting each team's points and ordering the teams by them.
Highlights:
Positions Guessed Correctly : 6 (highest)
Mean Position Difference : 1.6 (lowest)
Mean Points Difference : 7.4
Points predicted between the upper and lower bound: 15
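The three standings metrics can be computed from actual and predicted points along the following lines. This is a sketch under assumptions: the function name standings_metrics is hypothetical, the "mean difference" metrics are taken to be mean absolute differences, and ties in points are broken by input order rather than by goal difference as a real league table would.

```r
standings_metrics <- function(squad, actual_pts, predicted_pts) {
  # Position 1 is the team with the most points; ties are broken by input order.
  actual_pos    <- rank(-actual_pts, ties.method = "first")
  predicted_pos <- rank(-predicted_pts, ties.method = "first")
  tab <- data.frame(Squad = squad, actual_pos, predicted_pos)
  list(
    table              = tab[order(predicted_pos), ],
    positions_correct  = sum(actual_pos == predicted_pos),
    mean_position_diff = mean(abs(actual_pos - predicted_pos)),
    mean_points_diff   = mean(abs(actual_pts - predicted_pts))
  )
}

# Hypothetical four-team league, for illustration only.
m <- standings_metrics(
  squad         = c("A", "B", "C", "D"),
  actual_pts    = c(80, 70, 60, 50),
  predicted_pts = c(75, 78, 55, 52)
)
```

In this toy league the model swaps teams A and B, so two positions are correct, the mean position difference is 0.5, and the mean points difference is 5.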

Prediction by Random Forest

We then used Random Forest as our second model, with 500 trees and 8 variables tried at each split.
For points prediction it did really well: the actual points of 19 of the 20 squads fell between the predicted lower and upper bounds.
Results:
After ordering the predicted values, we compared the resulting standings against the actual results.
Highlights:
Positions Guessed Correctly : 5
Mean Position Difference : 2.3
Mean Points Difference : 7.3
Teams that scored points within the Prediction Interval : 19 (Highest)

Prediction by Extreme Gradient Boosting (XGBoost)

Extreme Gradient Boosting also performed really well at predicting the points scored in a league season.
After adding prediction intervals, 19 of the 20 actual values fell between its lower and upper bounds.
Results:
XGBoost was the last of our machine learning models; we ordered its predicted points and compared them with the actual standings.
Highlights:
Positions Guessed Correctly : 5
Mean Position Difference : 2.1
Mean Points Difference : 6.9 (lowest)
Teams that scored points within the Prediction Interval : 19

Expected Goals Prediction

We also built a model based on expected goals (xG). It does not involve any machine learning; instead it relies on the Law of Large Numbers.
For this model we did not use all the stats collected by scraping: it uses only expected goals and shots taken, home and away. It shows how teams would perform if xG were the only metric available for analysis.
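One common way to turn xG into points through the Law of Large Numbers is to simulate each fixture many times and average the points won. The report does not spell out its exact procedure, so the sketch below is an assumption throughout: goals are drawn as independent Poisson counts with mean equal to each side's xG, and the function name expected_points and the xG values are hypothetical.

```r
set.seed(1)

# Expected points for one fixture, estimated by simulating it many times.
# Assumption: goals are independent Poisson draws with mean equal to each side's xG.
expected_points <- function(home_xg, away_xg, n_sim = 10000) {
  home_goals <- rpois(n_sim, home_xg)
  away_goals <- rpois(n_sim, away_xg)
  # 3 points for a win, 1 for a draw, 0 for a loss.
  home_pts <- ifelse(home_goals > away_goals, 3, ifelse(home_goals == away_goals, 1, 0))
  away_pts <- ifelse(away_goals > home_goals, 3, ifelse(home_goals == away_goals, 1, 0))
  c(home = mean(home_pts), away = mean(away_pts))
}

# Hypothetical fixture: the home side creates 1.8 xG, the away side 1.0 xG.
ep <- expected_points(home_xg = 1.8, away_xg = 1.0)
```

Summing these per-fixture expected points over a season would give each team's xG-based points total, which can then be ordered into a table like any other prediction.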
Results
This is the resulting table for the 2023 season using Expected Goals. The points difference (+/-) and the position changes are relative to the actual 2023 points table.
Highlights:
Positions Guessed Correctly : 2
Mean Position Difference : 2.5
Mean Points Difference : 12.05
Conclusion
After looking at the results of the four models, we draw the following conclusions.
● Points can never be predicted very accurately from stats, because in soccer each team influences the other's stats. We have more success predicting points in relation to each other (i.e., the standings).
● Expected Goals is not a reliable metric for deciding games and standings. It helps, but it is not the be-all and end-all stat of football that it is claimed to be.
● The linear model, with variable selection, is our best model for predicting the standings. Random Forest was very consistent in points prediction.

Executive Summary
This project aims to forecast the standings of soccer teams in a league using various
predictive models and key match statistics. The project explores the application of
different statistical methods and machine learning models to predict the league
position of soccer teams based on performance indicators. The methods compared
include a basic Educated-Guessing approach based on expected goals (xG) and
three machine learning approaches: Linear Regression, Random Forest, and
Extreme Gradient Boosting (XGBoost). The effectiveness of these models is
evaluated to determine which is most reliable for predicting soccer league
standings.
Introduction
Predictive modeling in sports analytics focuses on using statistical data to forecast
outcomes in sports events. This project leverages data from the Premier League to
predict team standings at the end of the season using different statistical metrics
related to team performance, including goalkeeping, defense, and attacking
statistics.

Project Objectives
● To utilize key soccer match statistics to predict league standings.
● To compare different predictive models in terms of accuracy and reliability.
● To enhance understanding of soccer analytics through effective data utilization.
Data was automatically gathered via an R script, pulling information from a
dedicated sports statistics website. The dataset consists of team performance
metrics from the 2019, 2022, and 2023 seasons, with the first two seasons used for
training the models and the latest season for prediction. The collected data focus on
three main aspects of the game: goalkeeping, defending, and attacking, totaling 27
specific features.

Methodology
Data Preparation
Data from over 100 potential statistics were narrowed down to 27 relevant
indicators, ensuring the removal of correlated or redundant metrics. The selected
variables cover a broad spectrum of team performance, including specific metrics
like Save Percentage, Clean Sheet Percentage, and Progressive Passes.
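The report does not state how correlated metrics were identified. One simple possibility is a pairwise correlation filter, sketched below in base R. The function name drop_correlated, the 0.9 cutoff, and the synthetic columns are assumptions chosen for illustration.

```r
# Keep features greedily, skipping any that is highly correlated with one already kept.
drop_correlated <- function(features, cutoff = 0.9) {
  cors <- abs(cor(features))
  keep <- character(0)
  for (name in colnames(features)) {
    if (length(keep) == 0 || all(cors[name, keep] < cutoff)) {
      keep <- c(keep, name)
    }
  }
  features[, keep, drop = FALSE]
}

# Synthetic example: ShotsOnTarget is built to be nearly redundant with Shots.
set.seed(42)
shots <- rnorm(40, mean = 12, sd = 3)
stats <- data.frame(
  Shots         = shots,
  ShotsOnTarget = shots * 0.35 + rnorm(40, sd = 0.1),
  SavePct       = rnorm(40, mean = 70, sd = 5)
)

reduced <- drop_correlated(stats)
```

Here reduced keeps Shots and SavePct and drops ShotsOnTarget. Because the filter is greedy, the column order decides which member of a correlated pair survives.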

Predictive Modeling
Four primary predictive approaches were employed:

● Educated-Guessing Using Expected Goals (xG): This non-machine learning approach uses xG to simulate matches and predict outcomes based solely on goal probabilities.
● Linear Regression: This model uses team statistics to predict the number of points a team will earn, focusing on minimizing the root mean squared error (RMSE).
● Random Forest: This ensemble model uses 500 decision trees to predict points, considering 8 variables at each split to capture complex patterns in the data.
● XGBoost: An implementation of gradient boosted trees designed for speed and performance, this model also aims to predict team points accurately.

Evaluation Metrics
Models were evaluated based on several metrics:

● Positions Guessed Correctly
● Mean Position Difference
● Mean Points Difference
● Points Predicted Within the Upper and Lower Bounds

Results
Model Comparisons
● Expected Goals Prediction: This model performed the weakest, with the highest mean points difference (12.05) and only 2 positions guessed correctly.
● Linear Regression: Showed strong performance in predicting the standings, with the most positions guessed correctly (6), the lowest mean position difference (1.6), and a good balance between accuracy and reliability in point predictions.
● Random Forest: Excelled in the accuracy of point predictions, with 19 out of 20 teams' points falling within the predicted intervals.
● XGBoost: Similar to Random Forest in point prediction accuracy, with the same number of correct positions (5) but a slightly lower mean position difference (2.1 vs. 2.3) and the lowest mean points difference (6.9).
Key Findings
● The effectiveness of predictive models varies, with machine learning approaches generally outperforming simpler statistical methods.
● No single model perfectly predicts outcomes due to the unpredictable nature of soccer.
● Expected goals, while a popular metric, may not be sufficient alone to predict overall team performance accurately.
Conclusion
The study concludes that while predictive modeling can provide valuable insights
into soccer analytics, the inherent unpredictability of sports means that predictions
will always have limitations. The Linear Regression model, despite its simplicity,
proved effective for this specific application, balancing between the granularity of
data and prediction accuracy. However, for more consistent point prediction,
ensemble methods like Random Forest and XGBoost are recommended.
Final Project Report: Predictive Analysis of Soccer League Standings Based on Key
Match Statistics
