
Final Report

Uploaded by

Atif
Copyright
© © All Rights Reserved

Assignment Cover Sheet

Subject Code: BALA201

Subject Name: Introduction to Business Analytics

Submission Type: Research report

Assignment Title: Driving Sustainability: Unraveling the Relationship Between Vehicle Features and Carbon Footprint

Student Name: Mohamed Aathif Mohamed Musthafa

Student Number: 7739047

Student Phone/Mobile No. 0551575839

Student E-mail: mamm350@[Link]

Lecturer Name: Dr. Yiyang Bian

Due Date: 25/03/24

Date Submitted: 22/03/24

PLAGIARISM:
The penalty for deliberate plagiarism is FAILURE in the subject. Plagiarism is cheating by using the written ideas or
submitted work of someone else. UOWD has a strong policy against plagiarism.
The University of Wollongong in Dubai also endorses a policy of non-discriminatory language practice and presentation.

DECLARATION:
I/We certify that this is entirely my/our own work, except where I/we have given fully-documented references to the
work of others, and that the material contained in this document has not previously been submitted for assessment
in any formal course of study. I/we understand the definition and consequences of plagiarism.

PLEASE NOTE: STUDENTS MUST RETAIN A COPY OF ANY WORK SUBMITTED

Signature of Student: Aatif

Optional Marks:
Comments:

Lecturer Assignment Receipt (To be filled in by student and retained by Lecturer upon return of assignment)
Subject: Assignment Title:
Student Name: Student Number:
Due Date: Date Submitted:
Signature of Student:

Student Assignment Receipt (To be filled in and retained by Student upon submission of assignment)
Subject: Assignment Title:
Student Name: Student Number:
Due Date: Date Submitted:
Signature of Lecturer
Driving Sustainability: Unraveling the Relationship
Between Vehicle Features and Carbon Footprint
Abstract

This study delves into the nuanced relationship between vehicle characteristics and
carbon emissions in the automotive industry of the last decade, presenting crucial
insights for manufacturers, consumers, and policymakers. Employing
methodologies like random forest regression (RFR) and Multi-factor linear
regression (MLR), the research identifies strategic opportunities for enhancing
sustainability without compromising performance, as exemplified by models like
the Porsche Cayenne GTS. Informed consumer choices, guided by accessible
metrics, act as a catalyst for a more environmentally conscious and cost-effective
future, resonating notably with the middle-class segment. The random forest model
achieved a robust R-squared of 0.97 and a relatively low RMSE of 10.29, surpassing
the multi-factor linear regression (MLR) model, which lags behind
with an R-squared of 0.89 and an RMSE of 19.2. Carbon emissions in the dataset
average 250.58 g/km, while recent policies in the European Union have brought
the average below 100 g/km in countries like Sweden, Finland and Denmark. As the
automotive sector progresses towards sustainability, collaborative efforts among
manufacturers, consumers, and policymakers become paramount, emphasizing
their shared responsibility in shaping a greener and environmentally conscious
automotive landscape.

Keywords: CO2 Emissions, Random Forest Regression, Multi-factor Linear
Regression, Fuel consumption, sustainability, Automotive industry

Table of contents

1. Introduction
1.1 Motivation

2. Literature review

3. Research question
3.1 Proposed hypothesis

4. Methodology
4.1 Review method
4.2 Data
4.3 Model specifications
4.31 Linear regression model
4.32 Random forest regression model

5. Data analysis
5.1 Descriptive statistics
5.2 Exploratory analysis

6. Statistical results
6.1 Regression model result
6.2 Random forest regression result
6.3 Comparison

7. Business insights

8. References

1. Introduction

The ever-expanding automotive industry has brought unprecedented mobility and
convenience to our lives. However, this progress has not been without its ecological
consequences. As the number of vehicles on the road continues to rise globally, the
associated carbon dioxide emissions increase and pose a substantial threat to
environmental sustainability (Zhang et al., 2023).

This paper examines which features of a car affect its carbon footprint.

1.1 Motivation
The motivation for this study stems from the pressing need to understand and
mitigate the environmental impact of CO2 emissions from vehicles. Understanding
how cars produce CO2 is important for devising effective
strategies to combat climate change, as transportation accounts for a large share of
total emissions. This research also aligns closely with the United Nations
Sustainable Development Goals (SDGs), particularly SDG 13: Climate Action, as
highlighted by Junior et al. (2021) in their study of town cars.

The study also recognizes the profound implications for policy formulation and
implementation. Policymakers and regulators can leverage insights from this
research to shape effective policies aimed at reducing vehicle emissions.

This paper also aims to raise awareness about the factors influencing CO2
emissions and to empower consumers to make informed choices, encouraging the
adoption of eco-friendly vehicles and driving market demand for sustainable
transportation solutions.

In essence, this study examines the relationship between vehicular
characteristics and environmental impact using linear regression and random
forest models, and analyses the results to help pave the way for a sustainable future.

2. Literature review

Chapman (2007) underscores the significant environmental impact
of cars. The study acknowledges a prevailing trend in the developing world,
where a growing desire for car ownership is increasing emissions. The
proposed solution is to promote greater use of public
transportation, especially trams. This recommendation advocates a shift to
more sustainable transportation methods, aligning with broader sustainability goals
and offering a practical approach to reducing the environmental impact associated
with the global trend of car ownership.

A recent study by Ajayi et al. (2024) examines the impact of engine characteristics
on air pollutant emissions from different vehicle fleets in Lagos, Nigeria.
It reports an alarming 65% increase in emission levels, with over 65% of the
vehicles inspected being older than ten years. The majority of gasoline and diesel
vehicles have emissions above permitted levels, which puts the public's health
at serious risk. Notably, gasoline-powered personal vehicles and minibuses have a
major impact on CO emissions. The study emphasizes how important it is for
local households to switch to greener forms of transportation.

A similar study by Paravantis and Georgakellos (2007) performs a thorough
investigation of Greece's automobile-related CO2 emissions over a 30-year
period ending at the close of the last century. The research covers the
market boom during this time frame, which resulted in an increase in the number of
automobiles on the road. For forecasting, a time-series model built with the
Box-Jenkins method is utilized. The
main finding indicates that autos are expected to account for almost 85% of CO2
emissions. This study offers insights into the environmental effects of the growing
automobile industry, highlighting the substantial contribution of cars to CO2
emissions, especially in areas with socioeconomic conditions similar to those of
Greece.

Mitra and Roy (2017) conducted a comparative analysis of CO2 emissions in four
countries (USA, India, Bangladesh, and China) using both linear regression and
random forest models. The study finds that, across all countries, the random forest model
consistently outperforms linear regression, as indicated by higher R-squared values.
This underscores the superior predictive capabilities of the random forest model in
capturing the complexities of CO2 emissions in diverse national contexts. The
research contributes valuable insights into the effectiveness of different modeling
techniques, emphasizing the utility of random forest models for more accurate
predictions in the context of global CO2 emissions analysis.

3. Research question

The research question this paper addresses is:

“How can regression models, such as linear regression and random forest
regression, contribute to understanding and predicting the influence of vehicular
features on CO2 emissions?”

The expectation is that implementing regression models, particularly random forest
regression, will reveal intricate relationships between these features and CO2
emissions. We also anticipate that the random forest model will capture non-linear
trends and display higher predictive power than linear regression. This study aims to
assess the effectiveness of regression models in uncovering these relationships. It
also aims to provide insights on how the automotive industry can become greener.

3.1 Proposed hypothesis


● Do vehicular features, such as engine size and number of cylinders, have a
significant impact on carbon emissions in a car?
● Does higher mileage (Miles per Gallon) reduce carbon emissions?

4. Methodology

4.1 Review method


The paper employed a meticulous approach to its literature review, utilizing specific
search strategies to identify relevant references. The search was conducted using
the database "Google Scholar". Key search terms such as "CO2 emissions", "SDG 13",
"random forest" and "climate change" were employed to target studies pertinent to
the research focus.

The inclusion criteria considered the publication date range, ensuring relevance to
CO2 emissions modeling. Additionally, studies were selected based on their specific
use of regression models, including linear regression and random forest models.

The combination of strategic search terms and precise inclusion criteria allowed the
paper to sift through a vast pool of literature, ensuring that most references
selected were not only recent but also directly applicable to the study's exploration
of CO2 emissions modeling.

4.2 Data
This dataset, sourced from Kaggle's open datasets and derived from the Canadian
Government Open Database, comprises 7385 rows and provides information on
various car attributes. The columns include details about the car's make, model,
vehicle type, engine specifications, fuel type, fuel efficiency (MPG), and CO2
emissions.

The dataset offers a comprehensive view of the automotive landscape, allowing for
an in-depth analysis of the relationship between these features and CO2 emissions.

Time Span: 7 years (2013-2019)

Rows: 7385

Source: Kaggle Open Datasets (Government of Canada Open Database)

Columns:

● Make: Manufacturer of the car.
● Model: Specific model of the car.
● Vehicle Type: Type of the vehicle (e.g., sedan, SUV).
● Engine Size: Engine capacity in liters.
● Cylinders: Number of cylinders in the engine.
● Fuel Type:
❖ X: Regular gas
❖ Z: Premium gas
❖ D: Diesel
❖ E: Ethanol
❖ N: Natural gas
● MPG (Miles Per Gallon): Fuel efficiency, indicating the distance a car can travel per
unit of fuel.
● Emissions: CO2 emissions in grams per kilometer.

4.3 Model specifications


4.31 Linear regression model

CO2 Emissions = β₀ + β₁ * Engine Size + β₂ * Cylinders + β₃ * Fuel Consumption + ε

This is a simple yet powerful model, where β₀ is the constant (intercept), β₁, β₂ and β₃ are the
coefficients of the independent variables, and ε is the error term.

The model will be evaluated based on R-Squared and RMSE.
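The two evaluation metrics can be sketched in a few lines of plain Python. This is a minimal illustration, not the code used in the study; the CO2 values below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the target (g/km here)."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(actual, predicted):
    """Share of the target's variance that the predictions account for."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical CO2 emissions (g/km), for illustration only.
actual = [200.0, 250.0, 300.0, 350.0]
predicted = [210.0, 240.0, 310.0, 340.0]

print(round(rmse(actual, predicted), 2))
print(round(r_squared(actual, predicted), 3))
```

A lower RMSE means predictions sit closer to the observed values on average, while an R-squared closer to 1 means more of the variation in emissions is explained.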

4.32 Random forest regression model

This is a popular machine learning model. It is implemented here with scikit-learn in
Python and can be broken down as follows:

1. Model Overview:

The algorithm is a random forest regression ensemble. An ensemble is a
combination of many decision trees that improves accuracy. The objective is to predict
CO2 Emissions based on features (Engine Size, Cylinders, Fuel Consumption).

2. Feature Randomization:

At each split in a decision tree, Random Forest considers a random subset of
features. This helps in decorrelating the trees and allows the model to generalize
well to non-linear patterns.

3. Training Procedure:

Each tree in the Random Forest is trained on its own bootstrapped sample of the data,
so every tree sees a different subset. Trees are grown to the maximum depth specified by
max_depth, or until a node contains fewer samples than min_samples_split or a
further split would leave a leaf with fewer than min_samples_leaf samples.

4. Hyperparameter Tuning:

The model is optimized with a grid search, conducted using GridSearchCV
from scikit-learn, over the following hyperparameters:

● n_estimators: Represents the total number of decision trees present in the
ensemble.
● max_depth: Defines the maximum depth that an individual decision tree can
reach. It controls the complexity of each tree.

● min_samples_split: Signifies the minimum number of data points required to
split an internal node during the tree-building process.
● min_samples_leaf: Specifies the minimum number of data points that a leaf
node must have. It influences the termination criteria for growing a tree.
● Cross-Validation: Involves employing a cross-validated grid search strategy to
identify the best combination of hyperparameters. This ensures robust
model performance across various subsets of the training data. (Scikit-learn)

5. Model Prediction:

For regression, the Random Forest predicts the average of the individual tree
predictions and reports the importance of each feature affecting CO2 emissions
(Grömping, 2009).

The model is then evaluated using the RMSE (root mean squared error) and
R-squared.
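The workflow described in this section can be sketched with scikit-learn as follows. This is an illustrative sketch under stated assumptions, not the study's actual code: the data are synthetic (the generating formula, sample size, grid values and 3-fold setting are all hypothetical), and only the hyperparameter names and the three features come from the text above.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the dataset (assumption, not the Kaggle data).
rng = np.random.default_rng(0)
n_rows = 300
engine_size = rng.uniform(0.9, 8.4, n_rows)   # litres
cylinders = rng.integers(3, 17, n_rows)       # 3 to 16 cylinders
mpg = rng.uniform(11, 69, n_rows)             # miles per gallon
co2 = 6500.0 / mpg + 2.0 * engine_size + rng.normal(0, 5, n_rows)

X = np.column_stack([engine_size, cylinders, mpg])
X_train, X_test, y_train, y_test = train_test_split(
    X, co2, test_size=0.2, random_state=0
)

# Hypothetical search grid over the four hyperparameters named above.
param_grid = {
    "n_estimators": [50, 150],
    "max_depth": [None, 10],
    "min_samples_split": [2, 5],
    "min_samples_leaf": [1, 2],
}

# Cross-validated grid search: every combination is scored on 3 folds.
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=3,
    scoring="neg_root_mean_squared_error",
)
search.fit(X_train, y_train)

best_forest = search.best_estimator_
predictions = best_forest.predict(X_test)

# The forest's prediction is the mean of its individual trees' predictions.
tree_average = np.mean(
    [tree.predict(X_test) for tree in best_forest.estimators_], axis=0
)

rmse = np.sqrt(mean_squared_error(y_test, predictions))
r2 = r2_score(y_test, predictions)

print("Best hyperparameters:", search.best_params_)
print("Feature importances:", best_forest.feature_importances_)
print("RMSE:", round(float(rmse), 2), "R-squared:", round(float(r2), 3))
```

The feature importances returned by scikit-learn sum to 1, so each score can be read as a share of the total importance.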

5. Data analysis

5.1 Descriptive statistics

Regarding CO2 emissions, the dataset exhibits a mean of 250.58 g/km, with a
median of 246.00 g/km. The distribution is slightly right-skewed, indicating a longer
right tail, and displays leptokurtosis, suggesting heavier tails and a more peaked
distribution. Notably, CO2 emissions range from a minimum of 96.00 g/km to a
maximum of 522.00 g/km, showcasing the span of emissions within the dataset.

Turning to the number of cylinders, the dataset's mean and median are 5.62 and
6.00, respectively. The distribution is positively skewed, with a long tail toward
high cylinder counts, and shows pronounced leptokurtosis. The variable ranges from
a minimum of 3.00 cylinders to a maximum of 16.00 cylinders, highlighting the
diversity in cylinder counts.

Engine size, with a mean of 3.16 L and a median of 3.00 L, displays a slightly
right-skewed distribution. Its leptokurtic shape indicates heavier tails than a
normal distribution. Engine sizes span from a minimum of 0.90 L to a maximum of
8.40 L, emphasizing the broad spectrum of engine capacities.

Lastly, fuel consumption averages 27.48 mpg, with a slightly right-skewed
distribution and leptokurtic shape. Fuel consumption ranges from 11.00 mpg to
69.00 mpg, underscoring the wide range in fuel efficiency.
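The shape measures quoted in this section (skewness and kurtosis) can be computed as sketched below. The sample values are hypothetical and are not taken from the dataset; the formulas are the population versions of the third and fourth standardized moments, which may differ slightly from the bias-corrected versions some statistical packages report.

```python
import statistics


def skewness(values):
    """Third standardized moment; positive means a longer right tail."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def excess_kurtosis(values):
    """Fourth standardized moment minus 3; positive means leptokurtic."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 4 for v in values) / len(values) - 3.0


# Hypothetical CO2 emissions (g/km) with a long right tail.
sample = [96, 180, 210, 230, 246, 250, 270, 300, 360, 522]

print("mean:", statistics.fmean(sample))
print("median:", statistics.median(sample))
print("skewness:", round(skewness(sample), 3))
print("excess kurtosis:", round(excess_kurtosis(sample), 3))
```

For this sample both measures come out positive, which matches the pattern described for CO2 emissions: a mean above the median, a longer right tail, and heavier tails than a normal distribution.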

5.2 Exploratory analysis

Fig 1

Figure 1 graphs the average emissions by vehicle type. Heavier vehicles emit more
than lighter ones: the top 5 emitters include vans,
SUVs and pickup trucks, while smaller cars rank better. This is consistent with the
results found in previous studies (Zervas & Lazarou, 2008).

Next, the car brands are graphed by fuel economy:

Fig 2

This chart depicts the top 15 manufacturers by mileage, with
prominent Asian (Japanese and Korean) brands at the top. These cars are usually
budget friendly and cater to a large market segment.

Next, we visualize the brands with the highest fuel consumption:

Fig 3

Figure 3 shows the 5 worst-performing car brands by fuel economy. Supercars and
luxury cars perform poorly here, reflecting a tradeoff between
performance and running cost: these cars are equipped with
powerful engines that consume more fuel, resulting in lower miles per gallon.

Next, the fuel types are assessed in terms of sustainability:

Figure 4 shows that ethanol and premium fuel (E and Z) emit the most
CO2, diesel (D) and regular fuel (X) emit slightly lower and similar amounts,
and natural gas (N) emits the least. CNG, or compressed natural gas, is one of the
cleanest fuels and is widely used in public transport, such as taxis, globally. Its use can
reduce emissions by more than 22% (Bi Yu Chen et al., 2024).

Fig 4

6. Statistical results

6.1 Regression model result


OLS Estimation (Python)

Variable            Coefficient   Std. error   t-stat     P-value
Constant (C)        319.628       2.084        153.409    0.000
Engine size (L)     7.355         0.468        15.719     0.000
Cylinders           7.614         0.325        23.398     0.000
Fuel cons. (MPG)    -4.913        0.047        -104.34    0.000

R2: 0.89    RMSE: 19.2

Table 1

1. Constant (Intercept)***: 319.63

It suggests that, with Engine Size, Cylinders, and Fuel Consumption fixed at zero,
the estimated CO2 emissions are 319.63 g/km.

2. Engine Size (L)***:

Coefficient: 7.3535

A one liter increase in Engine Size is associated with an estimated increase of 7.35
g/km in CO2 emissions.

3. Cylinders***:

Coefficient: 7.6144

With each additional cylinder, an estimated increase of 7.61 g/km in CO2 emissions
is observed.

4. Fuel Consumption (mpg)***:

Coefficient : -4.9137

An extra mile per gallon is associated with an estimated decrease of 4.91 g/km in
CO2 emissions. The negative sign indicates an inverse relationship, suggesting that
higher fuel efficiency is associated with lower CO2 emissions.

All variables are highly significant (at the 1% level), as the reported P-values are
0.000. The R2 of 0.89 means that 89% of the variance in CO2 emissions is explained
by the independent variables. The RMSE is 19.2 g/km of CO2.

This regression model suggests that Engine Size, Cylinders, and Fuel Consumption
are significant predictors of CO2 emissions, with each variable contributing to the
overall estimate. The negative coefficient for MPG indicates that higher fuel
efficiency is associated with lower CO2 emissions, consistent with expectations.
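As a worked example, the fitted equation can be applied to a single car. The coefficients below are those reported in Table 1; the car itself (2.0 L, 4 cylinders, 30 mpg) is hypothetical and chosen only for illustration.

```python
# Coefficients as reported in Table 1 (OLS estimation).
INTERCEPT = 319.628
COEF_ENGINE_SIZE = 7.355   # g/km per additional litre of engine size
COEF_CYLINDERS = 7.614     # g/km per additional cylinder
COEF_MPG = -4.913          # g/km per additional mile per gallon


def predict_co2(engine_size_l, cylinders, mpg):
    """Estimated CO2 emissions (g/km) from the linear model in Section 4.31."""
    return (
        INTERCEPT
        + COEF_ENGINE_SIZE * engine_size_l
        + COEF_CYLINDERS * cylinders
        + COEF_MPG * mpg
    )


# Hypothetical car: 2.0 L engine, 4 cylinders, 30 mpg.
baseline = predict_co2(2.0, 4, 30)
one_more_mpg = predict_co2(2.0, 4, 31)

print(round(baseline, 2))                 # 217.4
print(round(one_more_mpg - baseline, 3))  # -4.913
```

The second figure shows the interpretation given above in action: holding engine size and cylinders fixed, one extra mile per gallon lowers the estimate by the MPG coefficient.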

The residual plot suggests a reasonable fit:

Fig 5

6.2 Random forest regression result


The best hyperparameters obtained for the Random Forest model during the
hyperparameter tuning process are as follows:

Max Depth: None

Min Samples Leaf: 1

Min Samples Split: 2

Number of Estimators (Trees): 150

Using this, we obtain the following results:

Fig 6

Figure 6 implies:

● The Random Forest model assigns the highest importance to Fuel
Consumption (mpg), with a score of 0.92, emphasizing the critical role of fuel
efficiency in predicting CO2 emissions.
● Engine Size follows as the second most important feature, with a score of
0.069, suggesting that the size of the engine also contributes to
the model's predictive accuracy, though far less than fuel consumption.
● Cylinders have the least importance among the features considered, with a
score of 0.0097, indicating a relatively lower impact on CO2 emissions
predictions compared to the other features.

To assess the model, the following is graphed:

Fig 7

The low RMSE (10.29) suggests that the model's predictions are accurate, given that CO2
emissions range from 96 to 522 g/km in the dataset.

The high R-squared (0.97) indicates that the model explains a large proportion of the
variability in CO2 emissions, suggesting a robust fit to the data.

6.3 Comparison
Predictive power:

Linear regression performed well based on the feature coefficients, their significance
and a good R-squared (0.89), while the random forest regression achieved a lower
RMSE of 10.29 (vs 19.2). It also had a higher R2 of 0.97,
indicating superior predictive accuracy.
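To put the two RMSE values in context, they can be expressed relative to the spread and the mean of CO2 emissions reported in Section 5.1. The short calculation below uses only figures quoted in this report; the choice of these two normalizations is an illustrative assumption, not part of the original analysis.

```python
# Figures quoted in Sections 5.1 and 6 (all in g/km).
CO2_MIN, CO2_MAX, CO2_MEAN = 96.0, 522.0, 250.58
RMSE_MLR, RMSE_RF = 19.2, 10.29


def share_of_range(rmse):
    """RMSE as a fraction of the observed CO2 range."""
    return rmse / (CO2_MAX - CO2_MIN)


def share_of_mean(rmse):
    """RMSE as a fraction of the mean CO2 emissions."""
    return rmse / CO2_MEAN


# Relative reduction in RMSE when moving from MLR to random forest.
rmse_reduction = (RMSE_MLR - RMSE_RF) / RMSE_MLR

print(f"MLR: {share_of_range(RMSE_MLR):.1%} of range, {share_of_mean(RMSE_MLR):.1%} of mean")
print(f"RF:  {share_of_range(RMSE_RF):.1%} of range, {share_of_mean(RMSE_RF):.1%} of mean")
print(f"RMSE reduction: {rmse_reduction:.1%}")
```

On these figures the random forest's typical error is about 2.4% of the observed range, against about 4.5% for linear regression, a reduction in RMSE of roughly 46%.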

Model Complexity:

Linear regression is a simpler model with explicit coefficients, while random forest
regression is a more complex ensemble model incorporating multiple decision
trees.

Feature Importance:

The linear regression model provided coefficients for each feature, while the random
forest identified "Fuel Consumption (mpg)" as the most influential feature.

The Random Forest Regression model outperforms the Linear Regression model in
terms of predictive accuracy, capturing a higher proportion of the variance in CO2
emissions.

Random Forest's ability to handle complex relationships and interactions among
features is reflected in its superior performance, especially in scenarios where
linear relationships may not suffice.
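One plausible source of such non-linearity, offered here as an assumption rather than a finding of this study, is that CO2 per kilometre tracks fuel burned per kilometre and therefore scales with 1/mpg rather than with mpg itself. The sketch below fits a straight line to a purely hypothetical reciprocal curve (the constant K is invented for illustration) and shows the systematic pattern a linear model leaves behind.

```python
# Hypothetical relationship: CO2 (g/km) = K / mpg. K is an invented constant,
# picked so that about 26 mpg corresponds to about 250 g/km.
K = 6500.0

mpg_values = list(range(11, 70))              # mpg range reported in Section 5.1
co2_values = [K / mpg for mpg in mpg_values]


def fit_line(xs, ys):
    """Ordinary least squares for a single predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return mean_y - slope * mean_x, slope


intercept, slope = fit_line(mpg_values, co2_values)
residuals = [
    y - (intercept + slope * x) for x, y in zip(mpg_values, co2_values)
]

print("slope:", round(slope, 3))
print("residual at 11 mpg:", round(residuals[0], 1))
print("most negative residual:", round(min(residuals), 1))
print("residual at 69 mpg:", round(residuals[-1], 1))
```

Under this assumption the straight line under-predicts emissions at both ends of the mpg range and over-predicts in the middle. A tree-based model can follow that kind of curvature without it being specified in advance.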

Feature importance analysis in Random Forest emphasizes the significance of "Fuel
Consumption Combined (mpg)" as the primary predictor of CO2 emissions.

In summary, while Linear Regression offers interpretability, Random Forest
Regression excels in predictive power and accuracy, capturing non-linear
relationships, making it a preferable choice for this CO2 emissions prediction task.

7. Business insights

Car Manufacturers:

Strategic improvements for car manufacturers involve optimizing fuel economy and
adjusting engine configurations. For instance, models like the Porsche Cayenne GTS
showcase a trend where reducing both engine size and cylinder count, combined with
the integration of superchargers/turbochargers and hybrid models, improves
sustainability while maintaining performance, as indicated by higher horsepower
(Fung, 2020).

Consumer Choices:

Eco-conscious consumers can now make informed decisions based on critical
metrics like fuel emissions and economy, displayed on stickers in new cars. Beyond
environmental benefits, improved fuel efficiency and electrification of cars attract
a wider consumer segment, notably the middle class, due to long-term cost savings.

Government Policies:

Governments worldwide can influence automotive sustainability by implementing
policies tied to emission levels and cylinder counts. The European Union
exemplifies effective regulation, with recent cars in countries like Sweden, Finland,
and Denmark (66 g/km, 85 g/km and 86 g/km) having average emissions lower than the
minimum in this dataset (96 g/km) (ACEA, 2023). Potential regulatory avenues include
heavier import duties and taxes linked to emissions and higher cylinder counts.

In conclusion, the automotive industry can achieve sustainability through strategic
shifts in fuel economy and engine configurations. Informed consumer choices,
guided by accessible metrics, contribute to a greener future. Stringent government
policies, as seen in the EU, play a crucial role in reducing emissions. Collaboration among
manufacturers, consumers, and policymakers is vital for a sustainable future.

8. References
Zhang, P., Zhang, H., Sun, X., Li, P., Zhao, M., Xu, S., Jiao, X., Sun, Z. and Zhang, T., 2023. Research on
carbon emission standards of automobile industry in BRI participating countries. Cleaner and Responsible
Consumption, 8, p.100106.

Junior, L.S., Muraria, T.B. and Guarieiro, L.L.N., 2021. Proposing a Method for Assessing Fuel
Consumption and Pollutants Emissions with the Use of Continuously Variable Transmission in Town Cars.
JOURNAL OF BIOENGINEERING, TECHNOLOGIES AND HEALTH, 4(4), pp.134-140.

Chapman, L., 2007. Transport and climate change: a review. Journal of transport geography, 15(5),
pp.354-367.

Ajayi, S.A., Adams, C.A., Dumedah, G. and Adebanji, A.O., 2024. The Impact of Vehicle Engine
Characteristics on Vehicle Exhaust Emissions for Transport Modes in Lagos City. Urban, Planning and
Transport Research, 12(1), p.2319328.

Paravantis, J.A. and Georgakellos, D.A., 2007. Trends in energy consumption and carbon dioxide
emissions of passenger cars and buses. Technological Forecasting and Social Change, 74(5),
pp.682-707.

Scikit-learn. “[Link] — Scikit-Learn 0.20.3 Documentation.” [Link], 2018,
[Link]/stable/modules/generated/[Link].

Grömping, U., 2009. Variable importance assessment in regression: linear regression versus random
forest. The American Statistician, 63(4), pp.308-319.

Bi Yu Chen, et al. “Evaluation of Energy-Environmental-Economic Benefits of CNG Taxi Policy Using
Multi-Task Deep-Learning-Based Microscopic Models and Big Trajectory Data.” Travel Behaviour and
Society, vol. 34, 1 Jan. 2024, pp. 100680–100680, [Link] Accessed
20 Mar. 2024.

Mitra, M., & Roy, S. (2017). Comparative Analysis of Predictive Models for Carbon Emission in Major
Countries: A Focus on Linear Regression and Random Forest. International Journal of Science and
Research (IJSR), 6, 2295-2302.

Zervas, E. and Lazarou, C., 2008. Influence of European passenger cars weight to exhaust CO2
emissions. Energy Policy, 36(1), pp.248-257.

Fung, D. (2020) Porsche Cayenne GTS downsizes to twin-turbo V6 from V8, Between the Axles.
Available at: [Link]
(Accessed: 11 March 2024).

Average new car CO2 emissions by country (2023) ACEA. Available at:
[Link]
(Accessed: 11 March 2024).

