MySQL Group Work

The document outlines a final project on SQL databases focused on analyzing Formula 1 data, including circuit characteristics and driver performance. It presents research questions, SQL queries, and potential data analysis methods aimed at understanding competitive trends in Formula 1. The conclusion emphasizes the importance of machine learning for gaining insights and making informed decisions within the sport.

Uploaded by

Bli Wilson

SQL Databases – Final Project (Database Design and Implementation)

SQL Databases (DAMO-500-2)

Team

Mirza Yahya Baig

Sangjin Park

Wilson Kwesi Bli

Instructor: Dr. Hany Osman

6/12/2024

Table of Contents

Chapter 1: Background
1.1 Background
1.2 Understanding the Database Entities & Relations
1.3 Research questions with Hypothesis
Chapter 2: Data Collection and SQL Queries
2.1.1 Research Question One
2.1.2 Research Question Two
2.1.3 Research Question Three
2.2.1 Non-Research Queries
Chapter 3: Potential Data and Machine Learning Analysis
Chapter 4: Conclusion

Chapter 1: Background

1.1 Background

Formula 1 is often considered the pinnacle of motorsport: leading-edge technology combined with driver skill delivers high-octane racing entertainment. Circuits vary in length, layout, and design, so the performance of drivers and teams tends to vary considerably with the characteristics of each circuit. Understanding these variations can give insight into competitive advantages, driver consistency, and team strategy.

The Formula 1 database contains a comprehensive dataset covering circuit characteristics, race results, penalties, team standings, and driver performance, including lap records. Using this dataset, the study analyzes the relationship between circuit characteristics and performance metrics and seeks to answer some fundamental questions about competitive trends in Formula 1.

1.2 Understanding the Database Entities & Relations

The diagram below shows the cardinalities between the entities and relations of the database:

Figure 1: Formula 1 Database Entity Relationship Diagram

1.3 Research questions with Hypothesis

1. How does the length of a circuit influence lap record times?

Null Hypothesis (H₀): The length of the circuit does not influence lap record time.

Alternate Hypothesis (H₁): The length of the circuit influences lap record time.

2. Which drivers consistently perform best across circuits of varying lengths and layouts?

Null Hypothesis (H₀): Driver performance does not significantly vary across circuits of different lengths and layouts.

Alternate Hypothesis (H₁): Driver performance significantly varies across circuits of different lengths and layouts.

3. Do specific circuit characteristics favor certain drivers or teams?

Null Hypothesis (H₀): Circuit characteristics do not significantly influence which drivers or teams achieve lap records.

Alternate Hypothesis (H₁): Circuit characteristics significantly influence which drivers or teams achieve lap records.

Chapter 2: Data Collection and SQL Queries

To analyze the research questions further, the next step is to query the database and extract the relevant data. Below are screenshots of the MySQL queries and their output for each research question.

2.1.1 Research Question One - How does the length of a circuit influence lap record times?
SQL Query:

Figure 2: Research Question One SQL Query

Output:

Figure 3: Research Question One - Output
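
The query and its output survive here only as figure captions. As a rough illustration of the kind of query this research question calls for, the sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL. The table `circuits`, its columns (`name`, `length_km`, `lap_record_seconds`), and the three sample rows are hypothetical and are not taken from the project database.

```python
import sqlite3

# Hypothetical schema and made-up rows; the project's real tables are not shown in the text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE circuits (name TEXT, length_km REAL, lap_record_seconds REAL)"
)
conn.executemany(
    "INSERT INTO circuits VALUES (?, ?, ?)",
    [
        ("Circuit A", 3.5, 72.0),
        ("Circuit B", 5.0, 90.0),
        ("Circuit C", 7.0, 105.0),
    ],
)

# List circuits from shortest to longest together with the lap record and the
# average speed that record implies (km/h), so length and time can be compared.
rows = conn.execute(
    """
    SELECT name,
           length_km,
           lap_record_seconds,
           ROUND(length_km / lap_record_seconds * 3600, 1) AS avg_speed_kmh
    FROM circuits
    ORDER BY length_km
    """
).fetchall()

for row in rows:
    print(row)
```

Ordering by length makes it easy to see whether record times rise roughly in proportion to circuit length, and the derived speed column hints at whether longer circuits are also faster ones.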

2.1.2 Research Question Two - Which drivers consistently perform best across circuits of varying lengths and layouts?

SQL Query:

Figure 4: Research Question Two SQL Query

Output:

Figure 5: Research Question Two - Output
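
Since the original query is only available as a screenshot, the following is one possible way to approach this question, again with sqlite3 standing in for MySQL. The tables `circuits` and `results`, their columns, and all rows are hypothetical. The length thresholds (short below 4.5 km, medium 4.5 to 5.5 km, long above 5.5 km) are the categories suggested in Chapter 3.

```python
import sqlite3

# Hypothetical tables and made-up rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE circuits (circuit_id INTEGER, length_km REAL)")
conn.execute("CREATE TABLE results (driver TEXT, circuit_id INTEGER, position INTEGER)")
conn.executemany("INSERT INTO circuits VALUES (?, ?)", [(1, 3.5), (2, 5.0), (3, 7.0)])
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?)",
    [
        ("Driver X", 1, 1), ("Driver X", 2, 2), ("Driver X", 3, 1),
        ("Driver Y", 1, 2), ("Driver Y", 2, 1), ("Driver Y", 3, 5),
    ],
)

# Average finishing position per driver within each circuit length category.
# A driver whose averages stay low in every category is consistent across lengths.
rows = conn.execute(
    """
    SELECT r.driver,
           CASE WHEN c.length_km < 4.5 THEN 'short'
                WHEN c.length_km <= 5.5 THEN 'medium'
                ELSE 'long'
           END AS length_category,
           AVG(r.position) AS avg_position
    FROM results AS r
    JOIN circuits AS c ON c.circuit_id = r.circuit_id
    GROUP BY r.driver, length_category
    ORDER BY r.driver, length_category
    """
).fetchall()

for row in rows:
    print(row)
```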

2.1.3 Research Question Three - Do specific circuit characteristics favor certain drivers or teams?

SQL Query:

Figure 6: Research Question Three Query
Output:

Figure 7: Research Question Three - Output
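
A minimal sketch of how lap records could be counted per team and circuit length category is given below. It is not the query from the figure: the table `lap_records`, its columns, and the sample rows are assumptions made for illustration, and sqlite3 is used in place of MySQL.

```python
import sqlite3

# Hypothetical table and made-up rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lap_records (circuit TEXT, length_km REAL, driver TEXT, team TEXT)"
)
conn.executemany(
    "INSERT INTO lap_records VALUES (?, ?, ?, ?)",
    [
        ("Circuit A", 3.5, "Driver X", "Team Red"),
        ("Circuit B", 4.0, "Driver X", "Team Red"),
        ("Circuit C", 5.0, "Driver Y", "Team Blue"),
        ("Circuit D", 7.0, "Driver Y", "Team Blue"),
    ],
)

# Count lap records held by each team in each length category. A team whose
# records cluster in one category may be favored by that kind of circuit.
rows = conn.execute(
    """
    SELECT team,
           CASE WHEN length_km < 4.5 THEN 'short'
                WHEN length_km <= 5.5 THEN 'medium'
                ELSE 'long'
           END AS length_category,
           COUNT(*) AS record_count
    FROM lap_records
    GROUP BY team, length_category
    ORDER BY record_count DESC, team, length_category
    """
).fetchall()

for row in rows:
    print(row)
```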

2.2.1 Non-Research Queries

Question 1: Which driver and their team collectively scored the highest total points in races where the driver achieved a podium finish (Position 1, 2, or 3)?

SQL Code:

Figure 8: Non-Research Question One SQL Query

Output:

Figure 9: Non-Research Question One - Output
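
As an illustration of the filtering and aggregation this question involves, the sketch below sums points over podium finishes only. The `results` table, its columns, and the rows are hypothetical, and sqlite3 stands in for MySQL.

```python
import sqlite3

# Hypothetical table and made-up rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results "
    "(driver TEXT, team TEXT, race_id INTEGER, position INTEGER, points INTEGER)"
)
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?, ?, ?)",
    [
        ("Driver X", "Team Red", 1, 1, 25),
        ("Driver X", "Team Red", 2, 4, 12),   # not a podium, excluded below
        ("Driver X", "Team Red", 3, 2, 18),
        ("Driver Y", "Team Blue", 1, 2, 18),
        ("Driver Y", "Team Blue", 2, 1, 25),
        ("Driver Y", "Team Blue", 3, 3, 15),
    ],
)

# Keep only podium finishes, total the points per driver/team pair,
# and return the pair with the highest total.
top = conn.execute(
    """
    SELECT driver, team, SUM(points) AS podium_points
    FROM results
    WHERE position IN (1, 2, 3)
    GROUP BY driver, team
    ORDER BY podium_points DESC
    LIMIT 1
    """
).fetchone()

print(top)
```

Note that `LIMIT 1` returns a single row even when two pairs are tied on points; a real analysis might need to handle ties explicitly.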

Question 2: Which team had the best collective driver performance in terms of average race position across all drivers in the 2020 season?

SQL Code:

Figure 10: Non-Research Question Two SQL Query

Output:

Figure 11: Non-Research Question Two - Output
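
One way to express this question, assuming a hypothetical `results` table with a `season` column, is sketched below with sqlite3 in place of MySQL. All names and rows are invented for illustration. A lower average position is better, so the result is sorted in ascending order.

```python
import sqlite3

# Hypothetical table and made-up rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results (team TEXT, driver TEXT, season INTEGER, position INTEGER)"
)
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?, ?)",
    [
        ("Team Red", "Driver X", 2020, 1),
        ("Team Red", "Driver Z", 2020, 3),
        ("Team Red", "Driver X", 2019, 10),   # other season, filtered out
        ("Team Blue", "Driver Y", 2020, 2),
        ("Team Blue", "Driver W", 2020, 6),
    ],
)

# Average finishing position per team across all of its drivers in 2020;
# the smallest average is the best collective performance.
best = conn.execute(
    """
    SELECT team, AVG(position) AS avg_position
    FROM results
    WHERE season = 2020
    GROUP BY team
    ORDER BY avg_position ASC
    LIMIT 1
    """
).fetchone()

print(best)
```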

Question 3: How do weather conditions affect the frequency of penalties in races, and which weather condition is associated with the highest average number of penalties per race?

SQL Code:

Figure 12: Non-Research Question Three SQL Query

Output:

Figure 13: Non-Research Question Three - Output
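
An average "per race" needs two aggregation steps: first count penalties in each race, then average those counts per weather condition. The sketch below shows that pattern under assumed table names (`races`, `penalties`) and made-up rows, using sqlite3 as a stand-in for MySQL. The `LEFT JOIN` is a deliberate choice so that races with no penalties still count as zero.

```python
import sqlite3

# Hypothetical tables and made-up rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE races (race_id INTEGER, weather TEXT)")
conn.execute("CREATE TABLE penalties (penalty_id INTEGER, race_id INTEGER)")
conn.executemany(
    "INSERT INTO races VALUES (?, ?)",
    [(1, "Dry"), (2, "Dry"), (3, "Wet"), (4, "Wet")],
)
conn.executemany(
    "INSERT INTO penalties VALUES (?, ?)",
    [(1, 1), (2, 3), (3, 3), (4, 3), (5, 4), (6, 4)],  # race 2 has no penalties
)

# Inner query: penalties per race (0 for races without any).
# Outer query: average of those counts for each weather condition.
rows = conn.execute(
    """
    SELECT weather, AVG(penalty_count) AS avg_penalties_per_race
    FROM (
        SELECT r.race_id, r.weather, COUNT(p.penalty_id) AS penalty_count
        FROM races AS r
        LEFT JOIN penalties AS p ON p.race_id = r.race_id
        GROUP BY r.race_id, r.weather
    ) AS per_race
    GROUP BY weather
    ORDER BY avg_penalties_per_race DESC
    """
).fetchall()

for row in rows:
    print(row)
```

With an inner join instead, the penalty-free race would drop out and the dry-weather average would be overstated.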

Question 4: How does driver experience (career points and championship) correlate with their performance on penalty-heavy circuits?

SQL Code:

Figure 14: Non-Research Question Four SQL Query

Output:

Figure 15: Non-Research Question Four - Output

Question 5: Which team has the highest average points per race in the 2020 season?

SQL Code:

Figure 16: Non-Research Question Five SQL Query

Output:

Figure 17: Non-Research Question Five - Output

Question 6: Which country produced the youngest championship winning drivers?

SQL Code:

Figure 18: Non-Research Question Six SQL Query

Output:

Figure 19: Non-Research Question Six - Output

Question 7: Which team and driver combination received the most penalties?

SQL Code:

Figure 20: Non-Research Question Seven SQL Query

Output:

Figure 21: Non-Research Question Seven - Output
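
A simple grouping by team and driver is enough to sketch this question. The `penalties` table, its columns, and the rows below are hypothetical, and sqlite3 is used in place of MySQL.

```python
import sqlite3

# Hypothetical table and made-up rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE penalties (penalty_id INTEGER, driver TEXT, team TEXT)")
conn.executemany(
    "INSERT INTO penalties VALUES (?, ?, ?)",
    [
        (1, "Driver X", "Team Red"),
        (2, "Driver X", "Team Red"),
        (3, "Driver Y", "Team Blue"),
    ],
)

# Count penalties for each team/driver combination and keep the largest count.
top = conn.execute(
    """
    SELECT team, driver, COUNT(*) AS penalty_count
    FROM penalties
    GROUP BY team, driver
    ORDER BY penalty_count DESC
    LIMIT 1
    """
).fetchone()

print(top)
```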

Chapter 3: Potential Data and Machine Learning Analysis

The database is a valuable asset for Formula 1. It represents the structure of the league, its teams, their sponsorship, and the competitive intensity of the sport. Further statistical and machine learning analysis can be performed for each research question to gain additional insight. For example, based on the hypotheses and the data extracted for research question one, the potential data analysis methods are described below:

▪ Descriptive Statistics:
✓ Calculate the mean, median, standard deviation, and range for both circuit length and lap record times.
✓ Examine the spread and variation of lap record times relative to circuit lengths.

▪ Visualization:
✓ Create a scatter plot with circuit length on the x-axis and lap record times on the y-axis. This will help visualize the relationship between the two variables.
✓ Add a trend line (linear or nonlinear) to observe the nature of the relationship.
✓ Include boxplots for lap record times grouped by circuit length categories (e.g., short, medium, long circuits).

▪ Correlation Analysis:
✓ Compute the Pearson or Spearman correlation coefficient to assess the strength and direction of the relationship between circuit length and lap record times.

▪ Regression Analysis:
✓ Perform a linear regression analysis to determine if circuit length significantly predicts lap record times.
✓ Check the p-value for the slope to test if the relationship is statistically significant.

▪ Hypothesis Testing:
✓ Use ANOVA or t-tests to compare lap record times across predefined circuit length categories (e.g., short circuits < 4.5 km, medium circuits 4.5–5.5 km, long circuits > 5.5 km).
✓ Test whether the variance in lap record times differs significantly between groups.

▪ Advanced Models:
✓ Incorporate circuit layout and other track-specific variables if available, using multiple regression or machine learning models to evaluate their collective impact on lap record times.
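
Several of the steps listed above can be sketched with the Python standard library alone. The example below computes descriptive statistics, the Pearson correlation, a least-squares line, and a one-way ANOVA F statistic for the three length categories. The nine (length, lap record) pairs are made up for illustration and are not values from the project database, and the helper names (`describe`, `pearson`, `linear_fit`, `length_category`, `anova_f`) are our own. P-values are left out because they need a t or F distribution, which a statistics package such as SciPy provides.

```python
import math
import statistics

# Made-up (circuit length in km, lap record in seconds) pairs, illustration only.
data = [
    (3.5, 72.0), (4.0, 78.0), (4.4, 80.0),
    (4.6, 84.0), (5.0, 88.0), (5.4, 93.0),
    (5.6, 95.0), (6.0, 101.0), (7.0, 110.0),
]
lengths = [x for x, _ in data]
times = [y for _, y in data]


def describe(values):
    """Mean, median, sample standard deviation, and range of a list."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "range": max(values) - min(values),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def length_category(km):
    """Categories from the text: short < 4.5 km, medium 4.5-5.5 km, long > 5.5 km."""
    if km < 4.5:
        return "short"
    if km <= 5.5:
        return "medium"
    return "long"


def anova_f(groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Group lap record times by circuit length category.
grouped = {}
for km, seconds in data:
    grouped.setdefault(length_category(km), []).append(seconds)

r = pearson(lengths, times)
slope, intercept = linear_fit(lengths, times)
f_stat = anova_f(list(grouped.values()))

print("length stats:", describe(lengths))
print("time stats:", describe(times))
print("Pearson r:", round(r, 3))
print("fit: time =", round(slope, 2), "* length +", round(intercept, 2))
print("ANOVA F:", round(f_stat, 2))
```

A correlation close to 1 and a large F statistic would count against the null hypothesis for research question one, but significance can only be judged once the corresponding p-values are computed on the real data.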

Chapter 4: Conclusion

In conclusion, we consider what the project has achieved and its future potential. The project demonstrates that a well-designed database, combined with well-constructed queries, can be put into practice effectively. As the extracted data sets show, the Formula 1 database provides a strong foundation for trend analysis and pattern recognition and supports a better understanding of the data's characteristics.

Further improvements, insights, and predictions can be drawn from the data through machine learning, predictive analysis, clustering, segmentation, and other methods.

These will benefit the Formula 1 organization, which can rely on such models and insights to make informed decisions and to make the league more attractive for investment, sponsorship, and revenue growth.
