0% found this document useful (0 votes)
37 views24 pages

MySQL Group Work

The document outlines a final project on SQL databases focused on analyzing Formula 1 data, including circuit characteristics and driver performance. It presents research questions, SQL queries, and potential data analysis methods aimed at understanding competitive trends in Formula 1. The conclusion emphasizes the importance of machine learning for gaining insights and making informed decisions within the sport.

Uploaded by

Bli Wilson
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
37 views24 pages

MySQL Group Work

The document outlines a final project on SQL databases focused on analyzing Formula 1 data, including circuit characteristics and driver performance. It presents research questions, SQL queries, and potential data analysis methods aimed at understanding competitive trends in Formula 1. The conclusion emphasizes the importance of machine learning for gaining insights and making informed decisions within the sport.

Uploaded by

Bli Wilson
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

SQL Databases – Final Project (Database Design and Implementation)

SQL Databases (DAMO-500-2)

Team

Mirza Yahya Baig

Sangjin Park

Wilson Kwesi Bli

Instructor: Dr. Hany Osman

6/12/2024

1
Table of Contents

Chapter 1: Background ................................................................................................................ 3


1.1 Background .......................................................................................................................... 3
1.2 Understanding the Database Entities & Relations ........................................................... 3
1.3 Research questions with Hypothesis .................................................................................. 4
Chapter 2: Data Collection and SQL Queries ............................................................................ 5
2.1.1 Research Question One .................................................................................................... 6
2.1.2 Research Question Two .................................................................................................... 7
2.1.3 Research Question Three ................................................................................................. 9
2.2.1 Non-Research Queries ....................................................................................................11
Chapter 3. Potential Data and Machine Learning Analysis ................................................... 22
Chapter 4: Conclusion ................................................................................................................ 24

2
Chapter 1: Background

1.1 Background

Formula 1 is often considered the pinnacle of motorsport: harmonious leading-edge technology

combined with driver skill delivers high-octane racing entertainment. Circuit length, layout, and

design do vary; hence, the performances of drivers and teams tend to vary quite a lot depending

on those characteristics of a circuit. Such understanding of the variations would give insights into

competitive advantages, driver consistency of skill, and team strategy.

The Formula 1 database contains data in a full dataset, including circuit characteristics, race

results, penalties, team standings, and driver performance with lap records. Using the dataset,

this study strives to analyze the interrelation of circuit characteristics and performance metrics

while trying to answer some fundamental questions about competitive trends in Formula 1.

1.2 Understanding the Database Entities & Relations

Below shows the cardinalities between the entities and relations of the database:

3
Figure 1: Formula 1 Database Entity Relationship Diagram

1.3 Research questions with Hypothesis

1. How does the length of a circuit influence lap record times?

Null Hypothesis (H₀): The length of the circuit does not influence lap record time.

Alternate Hypothesis (H₁): The length of the circuit does not influence lap record time.

4
2. Which drivers consistently perform best across circuits of varying lengths and

layouts?

Null Hypothesis (H₀): Driver performance does not significantly vary across circuits of

different lengths and layouts.

Alternate Hypothesis (H₁): Driver performance significantly varies across circuits of

different lengths and layouts.

3. Do specific circuit characteristics favor certain drivers or teams?

Null Hypothesis (H₀): Circuit characteristics do not significantly influence which drivers

or teams achieve lap records.

Alternate Hypothesis (H₁): Circuit characteristics significantly influence which drivers

or teams achieve lap records.

Chapter 2: Data Collection and SQL Queries

To do further analysis of the research questions, the next important step is to query the
database to extract the significant data and queries required. Below is screenshots of database
MySQL queries and output based on the researched questions.

5
2.1.1 Research Question One - How does the length of a circuit influence lap record times?
SQL Query:

Figure 2: Research Question One SQL Query

Output:

6
Figure 3: Research Question One - Output

2.1.2 Research Question Two - Which drivers consistently perform best across circuits of
varying lengths and layouts?

SQL Query:

7
Figure 4: Research Question Two SQL Query

8
Output:

Figure 5: Research Question Two - Output

2.1.3 Research Question Three - Do specific circuit characteristics favor certain drivers or
teams?

SQL Query:

9
Figure 6: Research Question Three Query
Output:

10
Figure 7: Research Question Three - Output

2.2.1 Non-Research Queries

Question 1: Which driver and their team collectively scored the highest total points in races

where the driver achieved a podium finish (Position 1, 2, or 3)?

SQL Code:

11
Figure 8: Non-Research Question One SQL Query

Output:

Figure 9: Non-Research Question One - Output

Question 2: Which team had the best collective driver performance in terms of average race

position across all drivers in the 2020 season?

12
SQL Code:

13
Figure 10: Non-Research Question Two SQL Query

Output:

Figure 11: Non-Research Question Two - Ouput

14
Question 3: How do weather conditions affect the frequency of penalties in races, and which

weather condition is associated with the highest average number of penalties per race?

SQL Code:

Figure 12: Non-Research Question Three SQL Query

15
Output:

Figure 13: Non-Research Question Three - Output

Question 4: How does driver experience (career points and championship) correlate with their

performance on penalty-heavy circuits?

SQL Code:

Figure 14: Non-Research Question Four SQL Query

16
Output:

Figure 15: Non-Research Question Four - Output

Question 5: Which team has the highest average points per race in the 2020 season?

SQL Code:

Figure 16: Non-Research Question Five SQL Query

17
Output:

Figure 17: Non-Research Question Five - Output

18
Question 6: Which country produced the youngest championship winning drivers?

SQL Code:

Figure 18: Non-Research Question Six SQL Query

19
Output:

Figure 19: Non-Research Question Six - Output

Question 7: Which team and driver combination received the most penalties?

SQL Code:

Figure 20: Non-Research Question Seven SQL Query

20
Output:

Figure 21: Non-Research Question Seven - Output

21
Chapter 3. Potential Data and Machine Learning Analysis

The database is crucial to the success of Formula 1. It represents the structure of the league, the

teams, sponsorship and the intensity of the league. Further machine learning analysis can be

performed based on the research question to gain further insight. For example, based on the

hypothesis, and extracted data from research question one, the potential data analysis

methods to be performed are described below:

▪ Descriptive Statistics:

✓ Calculate the mean, median, standard deviation, and range for both circuit length and lap

record times.

✓ Examine the spread and variation of lap record times relative to circuit lengths.

▪ Visualization:

✓ Create a scatter plot with circuit length on the x-axis and lap record times on the y-axis.

This will help visualize the relationship between the two variables.

✓ Add a trend line (linear or nonlinear) to observe the nature of the relationship.

✓ Include boxplots for lap record times grouped by circuit length categories (e.g., short,

medium, long circuits).

▪ Correlation Analysis:

✓ Compute the Pearson or Spearman correlation coefficient to assess the strength and

direction of the relationship between circuit length and lap record times.

▪ Regression Analysis:

22
✓ Perform a linear regression analysis to determine if circuit length significantly predicts

lap record times.

✓ Check the p-value for the slope to test if the relationship is statistically significant.

▪ Hypothesis Testing:

✓ Use ANOVA or t-tests to compare lap record times across predefined circuit length

categories (e.g., short circuits < 4.5 km, medium circuits 4.5–5.5 km, long circuits > 5.5

km).

✓ Test whether the variance in lap record times differs significantly between groups.

▪ Advanced Models:

✓ Incorporate circuit layout and other track-specific variables if available, using multiple

regression or machine learning models to evaluate their collective impact on lap record

times.

23
Chapter 4: Conclusion

Finally, we will conclude about the present futuristic possibility for the project. The performance

to bring into practice a strong database integrated with constructs queries is proved through the

project glare. The formula 1 database provides a strong foundation for trend analysis and pattern

recognition proven from the extracted data set and builds a better understanding of data

characteristics.

It is evident that further improvement, insights, and predictions can be learned from the data

through machine learning analysis, predictively analysis, clustering and segmentation and many

others.

These will be beneficial to the Formula 1 organization, as they can rely on these models and

insights to make informed decisions and further make the league lucrative for investments,

sponsorship and revenue transformations.

24

You might also like