
COMPARATIVE ANALYSIS OF MACHINE LEARNING

ALGORITHMS FOR CLIMATE CHANGE FORECASTING

BY

AJANAH, HAKEEMA IZE


(18/52HA027)

A PROJECT SUBMITTED TO THE DEPARTMENT OF COMPUTER SCIENCE, FACULTY OF COMMUNICATION AND INFORMATION SCIENCES, UNIVERSITY OF ILORIN, ILORIN, NIGERIA, IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE AWARD OF BACHELOR OF SCIENCE (B.Sc.) IN COMPUTER SCIENCE.

SEPTEMBER, 2023
CERTIFICATION

This is to certify that this study was carried out by AJANAH HAKEEMA IZE

with matriculation number 18/52HA027, in the Department of Computer Science,

Faculty of Communication and Information Sciences, University of Ilorin, Ilorin,

Nigeria.

______________________ ____________________
Dr. I. D. Oladipo Date
(Supervisor)

______________________ ____________________
Prof. R. O. Oladele Date

(Head of Department)

______________________ ____________________
External Examiner Date

DEDICATION

This project is dedicated to Almighty Allah for His protection and guidance during my study.

ACKNOWLEDGMENTS

I want to thank Almighty Allah for His eternal mercy, guidance and protection throughout my years at the University of Ilorin. I also want to extend my heartfelt appreciation to my project supervisor, Dr I. D. Oladipo, for his guidance, encouragement and meticulous supervision to ensure this project's success.

I wish to acknowledge the Head of Department, Prof. R. O. Oladele, for his unwavering support and discipline. A special appreciation goes to my level adviser, Dr Ghaniyyat B. Balogun, for her motherly love, guidance and discipline, and to all the lecturers in the Department of Computer Science: Prof. R. G. Jimoh, Dr D. R. Aremu, Prof. Oluwakemi C. Abikoye, Dr A. R. Ajiboye, Dr Tinuke O. Oladele, Dr A. O. Babatunde, Dr Abimbola G. Akintola, Dr Shakirat A. Salihu, Dr K. S. Adewole, Dr J. B. Awotunde, Dr Modinat A. Mabayoje, Dr Fatima E. Usman-Hamza, Dr Ayisat W. Asaju-Gbolagbade, Dr A. O. Ameen, Dr M. Abdulraheem, Mr H. A. Mojeed, Mr P. Sadiku, and Mr A. O. Balogun. I wish you all the best in your endeavors.

I would like to acknowledge my parents, Mr & Mrs Ajanah; my siblings, Ajanah Maimuna Oyiza and Imran Ibrahim; and my niece, Shaaziya Sani-Omolori. May Allah reward you abundantly. I also extend my heartfelt appreciation to my friends for always being there for me. I will be forever grateful to all of you.

ABSTRACT

Climate change forecasting is of paramount importance in understanding and mitigating the potential impacts of global environmental shifts. As climate patterns become increasingly unpredictable, the application of machine learning algorithms to climate change prediction has gained significant traction. The aim of this study is to conduct a comprehensive comparative analysis of various machine learning algorithms for climate change forecasting, focusing on their performance, accuracy, and applicability. The algorithms selected for this study are Random Forest, Decision Tree, CatBoost, XGBoost, LightGBM, HistGradient Boosting, and Extra Trees. These algorithms were chosen for their established effectiveness in handling complex, multidimensional datasets. By employing a diverse set of algorithms, this study aims to capture a holistic view of their capabilities and limitations in the context of climate prediction. The research methodology involves multiple stages. First, relevant climate datasets from the National Oceanic and Atmospheric Administration (NOAA) are explored. Preprocessing techniques are applied to handle missing data and outliers and to normalize the data for optimal algorithm performance. Subsequently, the selected algorithms are implemented, fine-tuned, and trained on the climate datasets, with careful attention given to hyperparameter tuning to ensure each algorithm's optimal performance. The metrics used to evaluate the algorithms' performance are Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and R-squared (R2). The results illuminate the capabilities of these algorithms in predicting climate change patterns: with an RMSE of 0.1643, CatBoost emerges as the standout performer, followed by LightGBM with an RMSE of 0.2385. XGBoost, Random Forest, Extra Trees, HistGradient Boosting, and Decision Tree follow with RMSE values of 0.9409, 2.0365, 2.22, 2.6027, and 2.6796, respectively. Visualization techniques are employed to present the results and provide insight into the strengths and weaknesses of each algorithm. In conclusion, this project contributes to the field of climate science by offering an in-depth analysis of machine learning algorithms for climate change forecasting. By comparing the performance of various machine learning algorithms, valuable insights can be gained into their suitability for different climate prediction scenarios. The outcomes of this study have the potential to enhance the accuracy and reliability of climate change forecasting, thus aiding policymakers, scientists, and environmentalists in making informed decisions to address the challenges posed by climate change.

Keywords: climate change prediction, machine learning, comparative analysis, ensemble techniques, climate data, CatBoost, decision trees, evaluation metrics

TABLE OF CONTENTS

TITLE PAGE
CERTIFICATION
DEDICATION
ACKNOWLEDGMENTS
ABSTRACT
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES

CHAPTER ONE: INTRODUCTION
1.1 Background to the Study
1.2 Statement of the Problem
1.3 Aim and Objectives
1.4 Significance of the Study
1.5 Scope of the Study
1.6 Definition of Terms
1.7 Organization of the Report

CHAPTER TWO: LITERATURE REVIEW
2.1 Introduction
2.2 Climate Change
2.3 Causes of Climate Change
2.3.1 Natural Causes of Climate Change
2.3.2 Anthropogenic Causes of Climate Change
2.4 Impact of Global Climate Change
2.4.1 Extreme Weather
2.4.2 Air Pollution
2.4.3 Health Risks
2.4.4 Rising Sea Levels
2.4.5 Warmer, More Acidic Oceans
2.4.6 Threatened Ecosystems
2.5 Machine Learning Algorithms for Climate Change Forecasting
2.5.1 Linear Regression
2.5.2 Decision Trees
2.5.3 Random Forests
2.5.4 Support Vector Machines
2.5.5 Artificial Neural Networks
2.5.6 Deep Learning
2.5.7 Clustering
2.6 Data Preprocessing and Feature Selection
2.7 Comparative Analysis of Machine Learning Algorithms for Climate Change Forecasting
2.8 Review of Previous Studies
2.9 Conclusion

CHAPTER THREE: METHODOLOGY
3.1 Introduction
3.2 Research Design
3.2.1 Proposed Framework
3.3 Dataset Collection
3.4 Data Processing
3.4.1 Data Cleaning
3.5 Feature Extraction
3.6 Feature Engineering
3.7 Data Segmentation
3.8 Classification Algorithms
3.8.1 Bagging Algorithms
3.8.1.1 Extra Trees Algorithm
3.8.1.2 Random Forest Algorithm
3.8.2 Boosting Algorithms
3.8.2.1 LightGBM
3.8.2.2 XGBoost
3.8.2.3 CatBoost
3.8.3 Forest of Randomized Trees
3.9 Performance Evaluation Metrics
3.9.1 Root Mean Square Error (RMSE)
3.9.2 Mean Square Error (MSE)
3.9.3 R-squared (R2)

CHAPTER FOUR: IMPLEMENTATION, RESULTS AND DISCUSSION
4.1 Introduction
4.2 Planning Stage
4.3 Development Tools
4.3.1 Programming Language
4.3.2 Libraries Used
4.3.3 Integrated Development Environment (IDE)
4.4 Exploratory Data Analysis of Climate Change Forecasting
4.5 Results of the Training and Testing Process
4.5.1 Bagging Algorithms
4.5.2 Boosting Algorithms
4.5.3 Forest of Randomized Trees
4.6 Result Analysis
4.7 Discussion of Findings
4.7.1 Boosting Algorithms
4.7.1.1 CatBoost
4.7.2 Comparison with Other Boosting Algorithms
4.7.3 Comparison with Bagging Algorithms and Randomized Trees
4.8 Conclusion

CHAPTER FIVE: SUMMARY, CONCLUSION AND RECOMMENDATIONS
5.1 Summary
5.2 Limitations
5.3 Recommendations

REFERENCES

LIST OF TABLES

Table 4.1: Comparison of the classes of algorithms

LIST OF FIGURES

Figure 3.1: Theoretical Framework of the System
Figure 3.3: Data Cleaning I
Figure 3.4: Data Cleaning II
Figure 3.5: Data Cleaning III
Figure 3.6: Feature Extraction
Figure 3.7: Feature Engineering
Figure 3.8: Target Variable
Figure 3.9: Data Segmentation I
Figure 3.10: Data Segmentation II
Figure 3.11: Extra Trees Learning Process
Figure 3.12: Random Forest Learning Process
Figure 3.13: LightGBM Learning Process
Figure 3.14: XGBoost Learning Process
Figure 3.15: CatBoost Learning Process
Figure 3.16: Decision Tree Learning Process
Figure 3.17: HistGradient Boosting Learning Process
Figure 4.1: Google Colaboratory Environment
Figure 4.2: Time Series Data
Figure 4.3: Monthly Forecast for Precipitation
Figure 4.4: Categorical Columns
Figure 4.6: Longitude
Figure 4.7: Evaluation of the Extra Trees Algorithm
Figure 4.8: Evaluation of the Random Forest Algorithm
Figure 4.9: Evaluation of the XGBoost Algorithm
Figure 4.10: Evaluation of the LightGBM Algorithm
Figure 4.12: Evaluation of the Decision Tree Algorithm
Figure 4.13: Evaluation of the HistGradient Algorithm
CHAPTER ONE

INTRODUCTION

1.1 Background to the Study

The impacts of climate change are becoming more evident as time goes on. Storms, droughts, wildfires, and floods are occurring with greater intensity and frequency. Humanity's dependence on natural resources and agriculture is changing alongside the global ecosystems. According to the 2018 Intergovernmental Panel on Climate Change report, if greenhouse gas emissions are not eliminated within the next three decades, our planet will suffer catastrophic consequences (Cianconi et al., 2020). Weather prediction has remained one of the most challenging scientific and technological problems of the past century. This can be attributed mainly to two key factors: the first is its wide application across various human pursuits, and the second is the opportunity fostered by technological progress directly linked to this field of research, including advancements in computing and improvements in measurement systems (Garima & Mallick, 2016).

Extreme climate change has lately had an impact on Africa: from October 2019 to January 2020, East Africa experienced record-breaking rains. The rainfall triggered landslides and floods throughout the region, resulting in natural disasters that negatively affected more than 2.8 million people in Ethiopia, Kenya, Somalia, Uganda, Tanzania, and Djibouti. These regions, however, also experience deficient rainy seasons spanning from March to May, leading to food shortages and famines. Furthermore, the susceptibility of these areas has heightened due to the influence of climate change, particularly in North-Eastern Africa, which has sparked increased interest in climate change research (Caroline et al., 2020). Given that
atmospheric greenhouse gases (GHGs) exert the most significant influence on

climate change, the utilization of artificial satellites to monitor anthropogenic GHG
concentrations from space has become imperative. Approximately 40% of annual
human-induced carbon dioxide (CO2) emissions originate from coal-burning power
plants. Additionally, man-made sources contributing to methane (CH4) emissions,
aside from natural sources like termites, inland lakes, and wetlands, encompass coal
mines, oil-gas systems, livestock, wastewater management, rice farming, and
landfills (Gurdeep et al., 2021). Multiple studies have indicated an increase in the
frequency and intensity of extreme precipitation events due to climate change
(Kristie et al., 2021). Understanding these evolving hazards is critical for preparing
for future extreme precipitation and flooding. One anticipated consequence of
global warming is heightened precipitation intensity, driven by increasing
atmospheric moisture (Tabari, 2020). The dynamic alterations resulting from
climate change could also impact the position and velocity of storm tracks, as well
as the occurrence of atmospheric conditions conducive to extreme precipitation
(Ben et al., 2022). Nonetheless, the effects of global warming on regional and local
precipitation extremes remain not fully understood due to the inherent complexities
in simulating precipitation processes within general circulation models (Tabari,
2020).

Making accurate predictions is one of the main challenges faced by meteorologists globally. Scientists have created a variety of methods to forecast meteorological
characteristics, some of which are more dependable than others. A wide range of
factors that are related in one way or another contribute to climate change.
Extensive data has been amassed from sensor readings over a substantial time
frame, encompassing a multitude of variables. The intricate interplay among these
variables and the influence of each phenomenon, when examined across a vast
dataset, surpasses the capacity of human comprehension, which is where the role
of machines comes into play. A highly developed machine learning (ML) approach, which can be regarded as a form of artificial intelligence, can process massive amounts of data and model the relationships between variables (Olaiya, 2012). The goal of artificial intelligence, a sub-field of computer science, is to enable a computer to accomplish tasks that would otherwise require human intelligence. Artificial intelligence frequently entails making decisions under varied conditions. In machine learning, a sub-field of AI, computers discover associations from massive training datasets. Owing to significant advancements in processor availability, speed, connectivity, and data storage costs, artificial intelligence and machine learning are having an increasing impact on society (Philip, 2020). Climate change forecasting can be done using either supervised or unsupervised machine learning techniques. Supervised learning, the most prevalent group of techniques in recent articles on the subject, has proven the most useful to atmospheric scientists. When labeled data is available, it can be used as a training dataset for learning a function that maps inputs to outputs (Olaiya, 2012).

This function can then be applied to a separate dataset, referred to as the testing set, to evaluate the model. If the results are satisfactory, the model can be applied to the classification or regression task of any application that requires it. In that category, we find techniques such as Support Vector Machines (SVM), Deep Learning (DL), Random Forest (RF), Artificial Neural Networks (ANN), and Decision Trees (DT). The second category of machine learning is unsupervised learning, where computers must find ways to separate the data or reduce the dimensions of a given dataset in order to conduct additional analysis, because they lack labeled training data. K-means Clustering and Principal Component Analysis (PCA) are two methodologies that atmospheric scientists frequently use. Atmospheric scientists and meteorologists can anticipate climate change thanks to machine learning, particularly the supervised techniques (Olaiya, 2012).
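As a concrete illustration of this supervised workflow, the short sketch below (generic scikit-learn usage on synthetic data, not code from this project) learns a function from a labeled training set and evaluates it on a held-out testing set:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Synthetic labeled data: four meteorological-style features and a target.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 4))
y = X @ np.array([0.5, -1.2, 0.8, 0.3]) + rng.normal(scale=0.1, size=500)

# Hold out 20% of the data as the testing set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = LinearRegression().fit(X_train, y_train)  # learn f: inputs -> outputs
preds = model.predict(X_test)                     # apply f to unseen inputs
rmse = mean_squared_error(y_test, preds) ** 0.5
print(f"Testing-set RMSE: {rmse:.4f}")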

The objective of this study is to compare several machine learning methods for forecasting climate change, leveraging both contemporary and classic tree-based algorithms together with meteorological data.

1.2 Statement of the Problem

The world is currently witnessing a wide range of extreme weather events, including hurricanes, exceptionally heavy rain, flooding, heat waves, wildfires, and drought. These weather occurrences have an impact on many aspects of
countries all over the world, including agriculture, energy, transportation, low-
resource populations, and disaster planning (Eresanya et al., 2022). Precise
extended-term predictions of temperature and precipitation are vital in helping
individuals anticipate and adjust to these severe weather occurrences. Presently,
physics-based models predominantly govern near-term weather forecasting.
However, these models have constraints when it comes to forecasting beyond a
certain timeframe (Fotios et al., 2022). Data scientists can improve sub-seasonal
forecasts by integrating machine learning with physics-based forecasts, which is
possible because meteorological data is readily available (Louise et al., 2023). The
use of sub-seasonal weather and climate forecasts could help communities and
companies adapt to the issues caused by climate change (Lawal et al., 2021).
Although other investigations used linear algorithms and physics-based data, the
outcomes were not as strong as expected (Youngjun et al., 2021). Notably,
extensive investigation revealed the limitations of the linear algorithm class for
climate change forecasting due to extended training times and subpar predictive
capabilities. Addressing this research gap, this study acknowledges the
shortcomings of linear algorithms and seeks to identify the optimal tree-based
algorithm for climate change forecasting by conducting a comparative analysis of
various tree-based algorithms. This approach aims to leverage meteorological data

rather than physics-based data to determine the most suitable algorithm for accurate
and efficient climate change forecasting.

1.3 Aim and Objectives

This work aims to propose a system for comparative analysis of machine learning
algorithms for Climate Change Forecasting.

The following are the objectives of the study:

i. extract the relevant features from the datasets;
ii. select and implement the algorithms;
iii. evaluate the performance of each of the selected algorithms; and
iv. compare the algorithms.

1.4 Significance of the Study

Generations both now and in the future are seriously threatened by climate change.
It modifies local and regional precipitation extremes. The effects of flooding and
excessive precipitation on human society are extensive. One of the biggest
problems confronting humanity, according to a panel of ML experts, is climate
change. Climate change has heightened the occurrence, intensity, and unpredictability of natural calamities. The results of this study will offer an effective tree-based model for forecasting climate change.

The study adds to the literature by introducing an efficient way of forecasting climate change. Findings from this study will serve as a point of reference for future scholars and researchers to gauge their work and will also add to the body of knowledge in climate change forecasting.

1.5 Scope of the Study

The scope of this research is to evaluate several tree-based algorithms for forecasting climate change, compare the outcomes of each method, and determine which algorithm provides the best forecasting accuracy. Bagging, Random Forest, Extra Trees, Gradient Boosting, Extreme Gradient Boosting (XGBoost), LightGBM, CatBoost, Decision Trees, and HistGradient Boosting are the algorithms used in this study.

1.6 Definition of Terms

Climate Change: Climate change refers to the long-term alteration of temperature and typical weather patterns in a specific location or globally.

Database: A database is a logically organized collection of structured data stored in a computer system, typically managed by a database management system (DBMS).

Data Mining: Data mining involves predicting outcomes by identifying anomalies,


patterns, and correlations within extensive datasets.

Expert System: An expert system is a computer system that replicates the decision-
making capabilities of a human expert.

Feature Selection: Feature selection is the process of reducing the number of input
variables when constructing a predictive model. This helps lower computational
costs and can enhance model performance.

Gradient Boosting: Gradient boosting is a powerful machine learning algorithm used for forecasting both continuous and categorical target variables, either as a regressor or classifier.

Machine Learning: Machine learning (ML) is a subset of artificial intelligence
(AI) that enables software applications to enhance prediction accuracy without
explicit programming. ML algorithms use historical data as input to forecast new
output values.

Python Programming Language: Python is a high-level, interpreted programming language known for its emphasis on code readability, its indentation-based design philosophy, and its support for both small and large-scale projects.

Matplotlib: Matplotlib is a Python library for creating graphical representations, commonly used in conjunction with the numerical mathematics extension NumPy.

Scikit Learn: Scikit-learn (or sklearn) is a free machine learning library for Python,
widely used for various data analysis and modeling tasks.

Data Wrangling: Data wrangling, also referred to as data munging, is the process
of transforming and mapping data from one format to another to make it more
suitable and valuable for downstream applications such as analytics.
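To tie several of these terms together, the minimal sketch below shows a data wrangling step in pandas, a gradient boosting fit with scikit-learn, and a Matplotlib plot; the column names and values are hypothetical and are not drawn from this project's dataset:

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical raw weather records with a gap and mixed formats.
raw = pd.DataFrame({
    "date": ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04"],
    "temp_c": [25.1, None, 26.4, 27.0],         # a missing sensor reading
    "precip_mm": ["3.2", "0.0", "1.1", "0.4"],  # numbers stored as text
})

# Data wrangling: parse dates, coerce types, fill the gap by interpolation.
raw["date"] = pd.to_datetime(raw["date"])
raw["precip_mm"] = raw["precip_mm"].astype(float)
raw["temp_c"] = raw["temp_c"].interpolate()

# Gradient boosting used as a regressor on the wrangled feature.
model = GradientBoostingRegressor(random_state=0)
model.fit(raw[["precip_mm"]], raw["temp_c"])
print(model.predict(raw[["precip_mm"]]))

# Matplotlib visualization of the cleaned series.
raw.plot(x="date", y="temp_c")
plt.show()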

1.7 Organization of the Report

Here is a summary of the chapters in this research work:

Chapter 1 - Introduction:

This chapter encompasses the background of the study, problem statement, objectives, significance, scope, terminology definitions, and report organization.

Chapter 2 - Literature Review:

The literature review delves into a comprehensive exploration of related concepts
and previous research relevant to the subject.

Chapter 3 - Methodology:

In this chapter, you will find a detailed description of the study's development
phases and approach.

Chapter 4 - Implementation and Results:

This section presents the findings and analysis of the project's work.

Chapter 5 - Summary, Conclusion, and Recommendations:

Chapter 5 comprises a summary of the project, its conclusions, and the recommendations derived from the research.

CHAPTER TWO

LITERATURE REVIEW

2.1 Introduction

In this chapter, we present an overview of prior research pertaining to the utilization of machine learning algorithms for forecasting climate change. The chapter
commences with an exploration of the climate change concept and proceeds to
conduct a literature review of the existing body of work regarding the application
of machine learning techniques to predict climate change. This review not only lays
the groundwork for our research but also sheds light on the strengths and limitations
of current methodologies.

2.2 Climate Change

Climate change signifies a prolonged alteration in weather patterns and global temperatures primarily triggered by human activities, such as the burning of fossil
fuels and deforestation. Its repercussions manifest in various forms, encompassing
rising sea levels, a surge in natural calamities, and shifts in temperature and
precipitation patterns. The demand for precise climate change forecasting models
becomes imperative to gain insights into the potential consequences of climate
change and to guide decision-making processes (Shivanna, 2022).

2.3 Causes of Climate Change

The mechanisms governing the Earth's climate system are fundamentally uncomplicated. The Earth either cools as it reflects solar energy back into space (largely through clouds and ice) or warms as it absorbs the sun's energy, with the

greenhouse effect occasionally trapping heat. Climate change can be influenced by
a multitude of factors, both natural and anthropogenic.

2.3.1 Natural Causes of Climate Change

Throughout Earth's history, fluctuations in temperature have occurred due to natural processes. Factors like solar intensity, volcanic eruptions, and variations in
naturally occurring greenhouse gases have all contributed to these climate
variations (NRDC, 2022). However, it is crucial to note that the current pace of
climate warming, particularly since the mid-20th century, far exceeds historical
norms and cannot be attributed solely to natural factors. While these natural
influences persist, they exert minimal impact or evolve too gradually to explain the
rapid warming observed in recent decades, as corroborated by NASA.

2.3.2 Anthropogenic Causes of Climate Change

Human-induced emissions of greenhouse gases (GHGs) constitute the principal driving force behind the current accelerated climate change on Earth, as outlined by the U.S. Environmental Protection Agency (EPA). The Earth's capacity to maintain a habitable temperature largely
depends on these greenhouse gases. Regrettably, the concentration of these gases
in our atmosphere has surged in recent years, with levels of carbon dioxide,
methane, and nitrous oxide unprecedented in the last 800,000 years, per the U.S.
Environmental Protection Agency. Carbon dioxide, the chief contributor to climate
change, has increased by 46% since the pre-industrial era. Human activities,
particularly the combustion of fossil fuels such as coal, oil, and gas for
transportation, heating, and energy generation, constitute the primary source of
these emissions. Deforestation, which releases stored carbon into the atmosphere,
stands as another substantial contributor. Logging, clear-cutting, fires, and various
forms of forest degradation are responsible for the release of an estimated 8.1

billion metric tons of carbon dioxide annually, accounting for over 20% of global
CO2 emissions. Additional human activities contributing to GHG emissions include
the use of fertilizers (a prominent source of nitrous oxide emissions), livestock
raising (with cattle, buffalo, sheep, and goats being notable methane emitters), and
specific industrial processes generating fluorinated gases (Shivanna, 2022).

2.4 Impact of Global Climate Change

The failure to mitigate and adapt to climate change is identified as the most
catastrophic global threat, surpassing even concerns like weapons of mass
destruction and water scarcity, according to the 2021 Global Risks Report by the
World Economic Forum. The repercussions of climate change are far-reaching, as
it disrupts global ecosystems, affecting every facet of our lives, including our
habitats, water sources, and air quality. While climate change affects everyone in
some way, it disproportionately impacts certain groups, such as women, children,
people of color, Indigenous communities, and those with lower socioeconomic
status. Climate change is fundamentally intertwined with human rights.

2.4.1 Extreme Weather

As the Earth's atmosphere warms and holds and releases more water, it leads to wet
regions becoming wetter and dry areas becoming drier. This alteration in weather
patterns results in an increased frequency and severity of natural disasters,
including storms, floods, heatwaves, and droughts. These events can have
devastating and costly consequences, jeopardizing access to clean drinking water,
igniting uncontrollable wildfires, causing property damage, hazardous material
spills, air pollution, and loss of life (NRDC, 2022).

2.4.2 Air Pollution

Climate change and air pollution are intricately linked, with each exacerbating the
other. Rising global temperatures lead to increased smog and soot levels,
contributing to air pollution. Additionally, extreme weather events, such as floods,
lead to the circulation of mold and pollen, further polluting the air. These conditions
worsen respiratory health, particularly for the 300 million people worldwide with
asthma, and exacerbate allergies. Severe weather events can contaminate drinking
water and damage essential infrastructure, increasing the risk of population
displacement. Displacement, in turn, poses health risks, including overcrowding,
trauma, water scarcity, and the spread of infectious diseases (NRDC, 2022).

2.4.3 Health Risks

Climate change is projected to cause an additional 250,000 deaths annually between 2030 and 2050, with increased heat stress, heatstroke, cardiovascular disease,
kidney disease, and respiratory health issues due to rising global temperatures. Air
pollution also worsens respiratory health and allergies, impacting a significant
portion of the population. Severe weather events can lead to harm, water
contamination, and infrastructure damage, increasing displacement rates.
Displacement brings its own health risks, including overcrowding, trauma, and
infectious diseases. Warmer temperatures also facilitate the spread of insect-borne
diseases such as dengue fever, West Nile virus, and Lyme disease (NRDC, 2022).

2.4.4 Rising Sea Levels

The Arctic is warming at roughly twice the global average rate, resulting in the melting of ice sheets and causing sea levels to rise. By the end of this century, oceans are
projected to rise by 0.95 to 3.61 feet, posing a significant threat to coastal

ecosystems and low-lying areas. Island nations and major cities like New York
City, Miami, Mumbai, and Sydney are particularly vulnerable to rising sea levels
(NRDC, 2022).

2.4.5 Warmer, More Acidic Oceans

Approximately one-quarter to one-third of fossil fuel emissions are absorbed by the oceans, making them 30% more acidic than pre-industrialization levels. This
acidification poses a serious threat to underwater life, particularly organisms like
coral, oysters, and clams with calcified shells or skeletons. The shellfish industry
and the species that depend on shellfish for food are at risk. Coastal towns relying
on fishing and seafood production face economic devastation. Rising ocean
temperatures alter aquatic species' distribution and populations, leading to coral
bleaching events that can devastate entire reef ecosystems, home to over 25% of
marine life (NRDC, 2022).

2.4.6 Threatened Ecosystems

Climate change forces wildlife to rapidly adapt to changing habitats. Many species
alter their behaviors, migrate to higher elevations, and modify migration routes,
potentially disrupting entire ecosystems and their intricate webs of life. This
disruption has dire consequences, with one-third of all plant and animal species
facing extinction by 2070, according to a 2020 study. Vertebrate species are
declining at an accelerated rate, attributed to climate change, pollution, and
deforestation. Warmer winters and longer summers enable some species, like tree-
killing insects, to thrive, posing a threat to entire forests (NRDC, 2022).

2.5 Machine Learning Algorithms for Climate Change Forecasting

Numerous machine learning algorithms have been employed in climate change forecasting, including regression-based methods like linear regression, decision
trees, random forests, and support vector machines. Other techniques encompass
artificial neural networks, deep learning, and clustering.

2.5.1 Linear Regression

Linear regression, a widely used algorithm, assumes a linear relationship between input and output variables. It has been applied to forecast temperature, rainfall, and
sea level changes.

2.5.2 Decision Trees

Decision trees make decisions based on input variables using a tree-like structure.
They have been employed to predict precipitation, temperature, and extreme
weather events.

2.5.3 Random Forests

Random forests, an extension of decision trees, combine multiple trees to enhance prediction accuracy. They have found applications in forecasting temperature,
precipitation, and drought.
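As a brief, generic illustration of how a forest improves on one tree (a scikit-learn sketch on synthetic data, not the implementation used later in this project), the example below averages one hundred randomized trees and compares the test error against a single decision tree:

import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Synthetic data: a noisy nonlinear signal standing in for temperature.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(400, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=400)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# A single deep tree tends to fit the noise; the forest averages many
# trees, each grown on a bootstrap sample with randomized splits.
tree = DecisionTreeRegressor(random_state=0).fit(X_tr, y_tr)
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)

for name, model in [("single tree", tree), ("random forest", forest)]:
    rmse = mean_squared_error(y_te, model.predict(X_te)) ** 0.5
    print(f"{name}: test RMSE = {rmse:.3f}")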

2.5.4 Support Vector Machines

Support vector machines (SVMs) seek to find a hyperplane separating input data
into different classes. They have been utilized for predicting precipitation,
temperature, and drought.

2.5.5 Artificial Neural Networks

Artificial neural networks (ANNs), inspired by the human brain's structure, have
been used for forecasting temperature, precipitation, and extreme weather events.

2.5.6 Deep Learning

Deep learning, a subset of ANNs, employs multiple layers to extract features from
input data. It has been employed in forecasting temperature, precipitation, and sea
level changes.

2.5.7 Clustering

Clustering, an unsupervised algorithm, groups similar data points. It has been utilized to identify climate change-vulnerable regions and categorize similar
weather patterns.
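A minimal unsupervised sketch (illustrative only; the two synthetic features stand in for weather variables) shows how K-means groups similar observations without any labels:

import numpy as np
from sklearn.cluster import KMeans

# Hypothetical observations: [mean temperature (C), monthly rainfall (mm)].
rng = np.random.default_rng(1)
dry_hot = rng.normal(loc=[32.0, 10.0], scale=2.0, size=(50, 2))
wet_mild = rng.normal(loc=[24.0, 180.0], scale=2.0, size=(50, 2))
X = np.vstack([dry_hot, wet_mild])

# K-means assigns each observation to the nearest of k centroids,
# grouping similar weather patterns without labeled training data.
km = KMeans(n_clusters=2, n_init=10, random_state=1).fit(X)
print(km.cluster_centers_)               # one centroid per discovered pattern
print(km.labels_[:5], km.labels_[-5:])   # cluster membership of samples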

2.6 Data Preprocessing and Feature Selection

The effectiveness of machine learning algorithms in climate change forecasting hinges significantly on the quality and relevance of the data employed. Two crucial
steps to ensure data quality are data preprocessing and feature selection. Numerous
studies have underscored the significance of these processes in optimizing
algorithm performance. For instance, Chakraborty et al. (2020) demonstrated the
heightened performance of machine learning algorithms in predicting Indian
rainfall by applying feature selection through Principal Component Analysis
(PCA). Likewise, Li et al. (2021) achieved enhanced accuracy in forecasting the

Pacific Ocean Sea Surface Temperature Anomaly (SSTA) by employing data
preprocessing using Singular Spectrum Analysis (SSA).
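A hedged sketch of PCA-based dimensionality reduction of the kind referenced above (generic scikit-learn usage on placeholder data, not a reproduction of Chakraborty et al.'s pipeline):

import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder climate matrix: 200 samples by 12 partly redundant features.
rng = np.random.default_rng(2)
base = rng.normal(size=(200, 3))
X = np.hstack([base, base @ rng.normal(size=(3, 9))])

# Standardize, then keep the components explaining 95% of the variance,
# reducing the number of input variables before model training.
reducer = make_pipeline(StandardScaler(), PCA(n_components=0.95))
X_reduced = reducer.fit_transform(X)
print(X.shape, "->", X_reduced.shape)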

2.7 Comparative Analysis of Machine Learning Algorithms for Climate Change Forecasting

Various machine learning algorithms have been employed in crafting climate change forecasting models. These encompass neural networks, decision trees,
random forests, support vector machines, Bayesian networks, among others. Each
of these algorithms possesses its own strengths and weaknesses, prompting the need
for a comparative analysis to identify the most suitable algorithm for developing
climate change forecasting models. Yang et al. (2020) conducted a comparative
assessment of Decision Trees, Random Forests, Gradient Boosting Machines
(GBM), and Artificial Neural Networks (ANN) in predicting drought severity in
China. Their findings revealed that GBM outperformed other algorithms in
forecasting drought severity.

2.8 Review of Previous Studies

Numerous studies have delved into the utilization of machine learning algorithms
for climate change prediction. Liu et al. (2020) compared multiple machine learning
algorithms, including random forest, gradient boosting, and deep neural networks,
in predicting precipitation patterns in China, with deep neural networks proving to
be superior in terms of accuracy and robustness.

Ma et al. (2018) employed a machine learning approach to predict global climate change, utilizing several models such as random forest and support vector
regression, trained on climate data spanning from 1960 to 2015. Data encompassed

diverse climate drivers, including greenhouse gas emissions and volcanic activity,
sourced from the National Centers for Environmental Information for climate data
and the Global Carbon Project and Global Volcanism Program for climate driver
data. The study demonstrated that machine learning models surpassed traditional
statistical models in forecasting global temperature and precipitation alterations.
Critical climate drivers like carbon dioxide emissions and volcanic activity were
identified as pivotal in these predictions. The study recommended future research
to delve deeper into detailed climate driver data to enhance model accuracy.
Evaluation metrics included mean squared error, mean absolute error, and
correlation coefficient.

Brouwer et al. (2019) harnessed machine learning to forecast the impact of climate
change on water resources in southern Africa. They trained various models,
including artificial neural networks and support vector regression, on climate data
spanning from 1960 to 2016. These models were employed to foresee future water
availability and demand under diverse climate scenarios. Climate data were
obtained from the Climate Research Unit, and water resource data were sourced
from the Southern African Development Community. Machine learning models
effectively predicted future water availability and demand under varying climate
scenarios while identifying regions susceptible to climate change impacts on water
resources. Prospective research could concentrate on incorporating more detailed
climate data to improve model accuracy. Evaluation metrics encompassed mean
squared error, mean absolute error, and correlation coefficient.

Lassalle et al. (2020) leveraged machine learning to anticipate the effects of climate
change on the proliferation of invasive species. Their approach involved training
multiple models, including decision trees and random forests, using data from
citizen science projects. These models were then utilized to project the future
distribution of invasive species under different climate scenarios. Datasets were

drawn from citizen science projects like iNaturalist and GBIF, along with climate data from the WorldClim database. The results demonstrated the accurate
prediction of future invasive species distribution under diverse climate scenarios.
Future research avenues could involve incorporating more detailed ecological data
and enhancing model accuracy. Evaluation metrics included the area under the
receiver operating characteristic curve and the kappa coefficient.

Ravi et al. (2020) conducted an extensive literature review on machine learning techniques in climate change forecasting. They explored various algorithms,
including decision trees, random forests, support vector machines, and artificial
neural networks, and analyzed diverse datasets used in climate change studies. The
review also delved into common evaluation metrics for assessing machine learning
model performance. Datasets encompassed temperature, precipitation, and sea
level rise data. The study highlighted neural networks and support vector machines
as the best-performing algorithms. It suggested future research should focus on
creating more accurate and dependable machine learning models for climate change
forecasting.

Sharma et al. (2019) developed a machine learning model for climate change
forecasting utilizing temperature, precipitation, and CO2 concentration data. Their
approach combined linear regression and artificial neural networks to predict future
climate patterns. Temperature, precipitation, and CO2 concentration data were
sourced from the Goddard Institute for Space Studies. The study achieved an
accuracy rate of over 90% in its predictions and recommended future work to
incorporate more data sources to improve model accuracy.

Li et al. (2021) conducted a comparative study of machine learning algorithms for climate change prediction, including decision trees, random forests, and artificial
neural networks. They employed a dataset comprising temperature, precipitation,

and sea level rise data for model training and testing. Artificial neural networks
emerged as the top-performing algorithm, with a Root Mean Square Error of 0.06
and a Mean Absolute Error of 0.03. Random forests also demonstrated promise,
with a Root Mean Square Error of 0.08 and a Mean Absolute Error of 0.04. The
study encouraged future work to focus on expanding data sources and improving
model accuracy, especially in predicting extreme weather events. Evaluation
metrics encompassed root mean square error and mean absolute error.

Rahman et al. (2018) applied a machine learning approach to predict climate change
based on factors such as greenhouse gas emissions, temperature, and precipitation.
They employed a multi-layer perceptron (MLP) neural network for modeling, using
climate data from the Climate Research Unit (CRU) and greenhouse gas emissions
data from the Carbon Dioxide Information Analysis Center (CDIAC). The MLP
model accurately predicted climate changes with a 92.5% accuracy rate,
emphasizing the significant impact of greenhouse gas emissions. Future research
directions could involve incorporating additional factors like land use change and
deforestation for improved model accuracy. The model's performance was assessed
using accuracy rate.

Ali et al. (2020) utilized machine learning algorithms to predict global temperature
based on historical temperature data. Employing a long short-term memory
(LSTM) neural network, the study achieved an accuracy rate of 96.7% in
forecasting future temperature changes. The LSTM model effectively captured
short-term and long-term temperature trends. Future research could focus on
incorporating additional data sources, such as ocean temperatures and atmospheric
carbon dioxide concentrations, to enhance model accuracy. Evaluation metrics
encompassed Mean Squared Error (MSE) and Root Mean Squared Error (RMSE).

Alirezazadeh et al. (2021) explored the applications of machine learning methods
for solar radiation prediction through a systematic review of literature spanning
from 2010 to 2021. They scrutinized research methods, datasets, and performance
metrics used in solar radiation prediction studies. Diverse datasets, including
Global Energy Observatory (GEO) and National Renewable Energy Laboratory
(NREL) datasets, were employed. Support vector machines (SVMs), artificial
neural networks (ANNs), and decision trees (DTs) emerged as popular choices for
solar radiation prediction, reporting high accuracy rates. The study advocated for
future research to focus on enhancing accuracy and robustness in machine learning
models for solar radiation prediction and developing hybrid models that combine
different methods. Evaluation metrics included mean absolute error (MAE), root
mean square error (RMSE), and coefficient of determination (R2).

Abdulrahman et al. (2020) investigated the use of machine learning techniques in predicting the impact of climate change on agriculture. They conducted a
systematic review of literature spanning from 2010 to 2020, analyzing research
methods, datasets, and performance metrics. Datasets included the Food and
Agriculture Organization (FAO) database and the United Nations Framework
Convention on Climate Change (UNFCCC) database. Support vector machines
(SVMs), artificial neural networks (ANNs), and decision trees (DTs) were widely
adopted for predicting the impact of climate change on agriculture, reporting high
accuracy rates in forecasting crop yields, soil moisture, and other agricultural
variables. Future research could explore the transferability of these models across
different regions and crops, enhance interpretability, and develop hybrid models.
Performance metrics encompassed accuracy, precision, recall, and F1 score.

Rahnavard et al. (2019) provided a review of various machine learning (ML) techniques applied to climate forecasting. They discussed the suitability of different
ML algorithms for climate forecasting, considering their advantages and

disadvantages, without focusing on specific datasets. The paper recommended the
development of hybrid models combining different ML techniques and addressing
missing data challenges in climate datasets. The study did not specify the use of
particular evaluation metrics.

Robinson et al. (2019) harnessed the power of machine learning to forecast the
impact of climate change on our oceans. Their approach involved leveraging an
artificial neural network (ANN) algorithm, which relied on historical data to make
predictions. To ensure the ANN's accuracy, the researchers employed both
historical climate data and information about ocean circulation and chemistry to
train their algorithm. The outcomes of this research showcased the ANN's
remarkable ability to predict alterations in ocean temperatures and chemistry in
response to varying climate scenarios. This study pointed towards the necessity of
developing more sophisticated models capable of considering the intricate
interactions among various climate factors. The research employed a range of
evaluation metrics, including mean absolute error, mean squared error, and the
coefficient of determination.

Zhu et al. (2020), on the other hand, introduced a deep learning approach to identify
and analyze climate change patterns. Employing a convolutional neural network
(CNN) algorithm, the study delved into the analysis of satellite data. These
researchers employed satellite data to train their CNN algorithm, demonstrating the
CNN's precision in detecting and scrutinizing climate changes within satellite data.
The research also emphasized the need for evolving deep learning algorithms that
can effectively handle the high-dimensional data frequently encountered in climate
change research. The assessment in this study involved several evaluation metrics,
including accuracy, precision, recall, and the F1 score.

Kumar et al. (2020) embarked on a comprehensive review, scrutinizing 120
research papers devoted to climate change detection and mitigation through the
application of machine learning techniques. To assess climate change, various
datasets were utilized, including temperature records, precipitation data, and carbon
dioxide emissions data. The findings unanimously supported the effectiveness of
machine learning techniques in predicting climate change, detecting anomalies, and
mitigating its impact. The researchers contended that these techniques held
immense potential in providing actionable insights and facilitating informed
decisions for tackling climate change. The study proposed future endeavors
concentrating on the development of more precise prediction models, data quality
enhancement, and the crafting of models capable of managing the uncertainty and
variability intrinsic to climate data. Evaluation was carried out using metrics such
as root mean squared error (RMSE), mean absolute error (MAE), and coefficient
of determination (R2).

Cordano et al. (2021) took an ensemble machine learning approach to predict the
implications of climate change on wildfire occurrences within the Mediterranean
region. Leveraging an assortment of climate variables such as temperature,
humidity, and precipitation, they designed a prediction model. The research
integrated two datasets – one housing climate variables and the other containing
historical wildfire occurrence data. The results showcased the ensemble machine
learning approach's aptitude in accurately forecasting wildfire occurrences and
delivering valuable insights concerning the influence of climate change on such
incidents. The study culminated in the assertion that this prediction model could
enrich decision-making and enhance wildfire management strategies. Cordano and
their team also recommended a deeper exploration of more comprehensive datasets
and improvements in the interpretability and explainability of the prediction model.

The evaluation in this study spanned metrics such as accuracy, precision, recall,
and the F1 score.

Purushothaman et al. (2020) sought to provide a comprehensive analysis of the latest trends in employing deep learning techniques for climate forecasting. Their
approach encompassed an extensive literature survey, assessing a multitude of
research articles and spotlighting the strengths and weaknesses of various deep
learning models used in climate prediction. Diverse datasets came into play during
this review, including the NCEP/NCAR reanalysis data, ERA-Interim, and the
Berkeley Earth Surface Temperature dataset. The consensus was that deep learning
models significantly bolstered climate forecasting accuracy, particularly in the
realms of extreme events and spatial-temporal data analysis. The incorporation of
domain-specific knowledge, ensemble learning, and transfer learning techniques
was recognized as further enhancing deep learning models' performance. In terms
of future directions, the study underscored the need for interpretable deep learning
models capable of providing insights into the underlying mechanisms of climate
change. The prospect of hybrid models, combining deep learning with physical
models for enhanced forecasting precision, was also presented. Interestingly,
specific evaluation metrics were not referenced within this review article.

Zaman et al. (2018) unleashed machine learning models, including random forest,
artificial neural networks, and support vector regression, to predict the rainfall
patterns of the Indian monsoon. This endeavor incorporated an array of predictors,
encompassing sea surface temperatures, sea-level pressures, and wind shear, to
construct these models. Multiple datasets played a role, incorporating the Global
Precipitation Climatology Project dataset, the Climate Prediction Center Merged
Analysis of Precipitation, and the NOAA Optimum Interpolation Sea Surface
Temperature dataset. The crux of the matter was that the random forest model
exhibited superior performance in predicting Indian monsoon rainfall, with the
inclusion of remote-sensing data notably enhancing prediction accuracy.
Recommendations revolved around creating even more robust machine learning
models that could seamlessly integrate additional predictors, like land surface data
and atmospheric moisture content. Furthermore, the consideration of deep learning
models was touted as a potential path to further heighten climate prediction
accuracy. The study hinged its evaluation on metrics such as mean absolute error,
root mean square error, and correlation coefficient.

Liu et al. (2019), in their pioneering research, constructed a deep learning model
rooted in the long short-term memory (LSTM) neural network, aimed at predicting
global surface temperatures. Their model harnessed historical climate data,
utilizing a sliding window technique for training. Datasets featured in this endeavor
encompassed the Berkeley Earth Surface Temperature dataset and the Climate
Research Unit Temperature dataset. It transpired that the LSTM model
outperformed traditional statistical models when it came to forecasting global
surface temperatures. It was further discerned that incorporating external factors,
notably El Niño-Southern Oscillation, significantly bolstered predictive accuracy.
This research has paved the way for future exploration into alternative deep
learning models, such as convolutional neural networks and generative adversarial
networks, for more refined climate prediction. Recommendations also urged the
inclusion of additional predictors like greenhouse gas emissions and solar radiation.
The study leaned on evaluation metrics like mean absolute error and root mean
square error to gauge model performance.

Akinfaderin et al. (2019) tapped into the climate data collected from weather
stations dotting the Sahelian region. They executed data preprocessing on this
dataset and subsequently trained a machine learning algorithm for climate pattern
prediction. The algorithm demonstrated exceptional precision in predicting climate
patterns within the Sahelian region, encompassing rainfall patterns, temperature
fluctuations, and various other climate variables. This study advocates for further
exploration of machine learning's potential in devising strategies to mitigate and
adapt to climate change. The evaluation in this research relied on mean absolute
error (MAE), mean squared error (MSE), and R-squared (R²) as metrics.

Sahu et al. (2020) embarked on a systematic literature review, sifting through 69
research articles centered on climate prediction via machine learning techniques.
Article selection criteria hinged on their relevance to climate prediction and their
utilization of machine learning methods. This research methodology encompassed
a thorough analysis of the articles to pinpoint the machine learning techniques
applied, the datasets at play, the evaluation metrics employed, and the findings
generated. The chosen articles leaned on datasets spanning climate data, satellite
imagery data, and weather data. The overarching discovery was that machine
learning techniques have proven successful in the realm of climate prediction. The
most commonly embraced methods included neural networks, support vector
machines, decision trees, and regression models. It was further deduced that
prediction model accuracy was inextricably linked to data quality and the choice of
machine learning algorithm. Recommendations revolved around the refinement of
models, employing deep learning techniques, and incorporating additional data
sources like oceanic and socioeconomic data. Evaluation in the selected articles
revolved around metrics such as mean absolute error, mean squared error, root
mean squared error, and correlation coefficient.

Arabnejad et al. (2018) undertook a comparative analysis of machine learning
techniques dedicated to climate prediction, utilizing a dataset encompassing global
climate data spanning from 1950 to 2016. Their research methodology entailed
training and testing various machine learning models on this dataset while
scrutinizing their performance across multiple evaluation metrics. The dataset of
choice encapsulated global climate data within the 1950-2016 timeframe.
Conclusions highlighted the support vector regression model as the most adept in
climate prediction, achieving a mean absolute error of 0.23 degrees Celsius. The
research underscored the profound impact of the choice of evaluation metric on
outcomes and underscored the promise of different machine learning models when
measured against different metrics. Future prospects outlined the incorporation of
more data sources, such as oceanic and atmospheric composition data, and the
evolution of more intricate machine learning models, including deep learning
models. The metrics that informed this research encompassed mean absolute error,
mean squared error, root mean squared error, correlation coefficient, and coefficient
of determination.

Jaiswal et al. (2020) conducted an in-depth analysis to scrutinize the effectiveness
of machine learning algorithms in predicting and identifying climate change. They
took a multifaceted approach, employing data preprocessing, feature selection, and
classification techniques, and assessed the performance of six distinct machine
learning algorithms. These included decision trees, k-nearest neighbor, random
forest, support vector machines, gradient boosting, and deep learning. The bedrock
of their dataset was the NOAA Climate Data Record (CDR) dataset, spanning
global surface temperature data from 1880 to the present day. The findings
resoundingly demonstrated the superiority of deep learning models, particularly the
deep learning model boasting a staggering 99.38% accuracy – surpassing the
accuracy of other machine learning models. Future trajectories were envisaged in
the development of advanced deep learning models to further enhance climate
change prediction accuracy.

Grolinger et al. (2018) committed their efforts to assessing machine learning
algorithms' proficiency in forecasting precipitation and temperature alterations
attributable to climate change. This undertaking was underpinned by data
preprocessing, feature selection, and regression techniques, with five distinct
machine learning algorithms at the forefront: linear regression, polynomial
regression, support vector regression, random forest, and artificial neural networks.
The dataset in focus was the Climate Research Unit (CRU) dataset, encompassing
global temperature and precipitation data from 1901 to 2016. The research
underscored the artificial neural networks' prowess, particularly in predicting
precipitation and temperature changes stemming from climate change. The
artificial neural network achieved an R-squared value of 0.96 for temperature
prediction and 0.88 for precipitation prediction. In charting the way forward, the
study encouraged the exploration of more sophisticated machine learning models
adept at encapsulating the non-linear relationship between climate variables and
their impact on precipitation and temperature. The chosen evaluation metric was R-
squared.

Zhang et al. (2019) employed advanced machine learning techniques to project the
repercussions of climate change on crop yields. The research entailed the
accumulation of diverse datasets encompassing climate variables, soil attributes,
and crop yields from varying regions. Utilizing this data, multiple machine learning
models were trained to forecast prospective crop yields in different climate
scenarios. Integral to this study were climate statistics from the Community Earth
System Model (CESM), soil specifics from the Soil Grids database, and crop yield
data sourced from the Food and Agriculture Organization (FAO). The findings
illustrated the precision of machine learning models in foreseeing crop yields under
distinct climate circumstances, boasting an average accuracy rate of 89%. The study
elucidated key climate variables, particularly temperature and precipitation, which
wield significant influence on crop yields. Recommendations were directed
towards advancing machine learning models to encompass intricate interactions
between climate variables and crop yields. Mean Absolute Error (MAE) and R-
squared (R2) stood as pivotal evaluation metrics.

Kondratyev et al. (2020) navigated the realm of deep learning techniques to model
Earth's climate system, encapsulating atmospheric, oceanic, and land surface
processes. A rich array of deep learning models underwent training on extensive
climate data, prominently featuring the Coupled Model Intercomparison Project
Phase 5 (CMIP5) dataset. This dataset encompassed climate model simulations
from diverse research groups globally. The research outcomes demonstrated the
aptitude of deep learning models in accurately forecasting future climate scenarios,
effectively identifying pivotal factors influencing climate variability and change.
Moreover, these models proved instrumental in gauging climate change impacts
across multifarious sectors like agriculture, water resources, and energy. Future
research recommendations underscored the need for improved scalability and
interpretability of deep learning models in climate modeling, advocating for
integration with diverse model types such as physical and statistical models.
Evaluation metrics featured Mean Squared Error (MSE) and Root Mean Squared
Error (RMSE).

Shen et al. (2018) delved into climate prediction within the domain of China,
employing a machine learning approach. The research leveraged a Support Vector
Regression (SVR) model to envisage temperature and precipitation alterations in
China from 1960 to 2016. The bedrock of this endeavor comprised climate data
sourced from the China Meteorological Data Sharing Service System. Temperature
and precipitation data spanning from 1960 to 2016 were instrumental in the study.
The results underscored the SVR model's efficacy in precisely forecasting
temperature and precipitation changes within China. This model showcased
superior predictive accuracy compared to other machine learning counterparts like
Random Forest and Artificial Neural Network. Recommendations spotlighted the
utilization of machine learning to predict additional facets of climate change like
extreme weather events and sea level rise. Key evaluation metrics encompassed
mean absolute error (MAE) and root mean square error (RMSE) to ascertain
prediction accuracy.

Garg et al. (2019) harnessed a machine learning approach to prognosticate
temperature shifts within India. Employing a Random Forest Regression (RFR)
model, temperature changes in India from 1901 to 2016 were envisioned. The
research drew upon climate data sourced from the India Meteorological Department
(IMD), encompassing temperature data spanning from 1901 to 2016. The findings
emphasized the RFR model's competence in accurately predicting temperature
changes within the Indian subcontinent. This model showcased heightened
predictive accuracy compared to other machine learning models such as Support
Vector Regression and Artificial Neural Network. The study proposed future
endeavors focusing on utilizing machine learning to predict diverse facets of
climate change, including rainfall patterns and sea level rise. Crucial evaluation
metrics encompassed mean absolute error (MAE) and root mean square error
(RMSE) to gauge prediction accuracy.

Nkwam et al. (2020) embarked on climate anomaly prediction on a global scale,
employing a machine learning approach. The study leaned on an Artificial Neural
Network (ANN) model to foresee temperature anomalies globally, stretching from
1880 to 2016. Integral to this research was climate data sourced from the National
Oceanic and Atmospheric Administration (NOAA), zeroing in on temperature
anomalies spanning the aforementioned timeframe. The research underscored the
ANN model's precision in accurately predicting global temperature anomalies. This
model showcased heightened predictive accuracy when benchmarked against other
machine learning models such as Random Forest and Support Vector Regression.
Future research avenues were outlined, urging the application of machine learning
in predicting other facets of climate change, such as sea level rise and extreme
weather events. Evaluation of predictive accuracy was grounded in mean absolute
error (MAE) and root mean square error (RMSE), while the relationship between
predicted and observed temperature anomalies was assessed using correlation
coefficient (CC) and coefficient of determination (R²).

In summary, these studies collectively manifest the potent role of machine learning
and deep learning in unraveling critical insights into climate change impacts,
paving the way for more informed decisions and strategies to mitigate its effects.

2.9 Conclusion

In summary, prior research has consistently demonstrated the efficacy of
employing machine learning algorithms for climate change forecasting. These
algorithms have consistently outperformed conventional statistical models,
displaying a remarkable capability to provide precise predictions of forthcoming
climate patterns. Given the wealth of available machine learning algorithms,
conducting a comparative analysis among them becomes imperative, aiding in the
identification of the optimal algorithm for crafting accurate climate change
forecasting models. The subsequent chapter will delve into an in-depth examination
of the methodology that will be employed in this study, focusing on evaluating and
contrasting the performance of various machine learning algorithms specifically in
the realm of climate change forecasting.

CHAPTER THREE

METHODOLOGY

3.1 Introduction

In this chapter, we outline the methodology adopted for our investigation, which
centers on a comparative analysis of machine learning algorithms concerning
climate change prediction. We commence with an exposition of the dataset
employed in this research. Subsequently, we elucidate the preprocessing measures
executed to rectify and prepare the dataset for rigorous analysis. Finally, we provide
an insight into the machine learning algorithms that have been scrutinized in this
study, along with the performance metrics selected to facilitate a comprehensive
comparison of their effectiveness.

3.2 Research Design

The research design is anchored in the realm of comparative analysis, specifically
targeting machine learning algorithms in the domain of climate change prediction.
To facilitate this inquiry, we have harnessed a dataset encompassing climate-related
information, spanning the past decade and sourced from diverse meteorological
stations worldwide. This dataset encompasses a spectrum of critical attributes,
including temperature, precipitation, wind velocity, and atmospheric pressure.
Our research design unfolds through a series of well-defined steps:
Data Collection: The climate data is collected from various weather stations
around the world and stored in a dataset.
Data Preprocessing: The acquired dataset undergoes rigorous cleaning
procedures, addressing any data inconsistencies and missing values via imputation.
Feature Extraction: We systematically extract pertinent features from the dataset,
selecting those deemed most relevant to the task of climate change prediction.

Feature Engineering and Selection: A pivotal phase involves feature engineering
and selection, where we optimize the chosen features to enhance their effectiveness
in climate change prediction.
Algorithm Implementation: We proceed to implement a diverse range of machine
learning algorithms, including Bagging, Random Forest, Extra Trees, Gradient
Boosting, XGBoost, LightGBM, CatBoost, Decision Trees, and HistGradient
Boosting.
Evaluation Metrics: To assess the performance of these algorithms thoroughly,
we employ an array of evaluation metrics, encompassing measures such as mean
squared error, root mean squared error, and R-squared.

3.2.1 Proposed Framework

In order to fill the research gaps and provide an effective approach to the study, an
innovative framework has been proposed in Figure 3.1. The proposed framework
provides a clear overview of the analysis in this study and consists of the sequential
components shown in the figure.

[Figure 3.1 flowchart: DATA COLLECTION → DATA PREPROCESSING → FEATURE EXTRACTION → FEATURE ENGINEERING → TRAIN DATA / TEST DATA → GBM ALGORITHMS MODEL → EVALUATION AND COMPARISON]

Figure 3.1: Theoretical Framework of the System.

3.3 Dataset Collection

The dataset used in this study was obtained from the National Oceanic and
Atmospheric Administration (NOAA). The data was collected from a variety of
sources, including ground-based weather stations, satellites, and other remote
sensing technologies. It contains daily measurements of temperature, humidity,
wind speed, precipitation, and other weather-related variables for various locations
over the most recent decade. The dataset also includes each location's latitude,
longitude, and elevation. It has a total of 373,000 records, each representing a day's
worth of weather measurements for a particular location.

Figure 3.2: Train Dataset

3.4 Data Preprocessing

Data preprocessing is a critical step in machine learning, and it involves cleaning
and preparing the data for analysis. In this study, the dataset is cleaned, and missing
values are imputed using various techniques such as mean imputation and
regression imputation. Outliers are also identified and removed from the dataset.

3.4.1 Data Cleaning

Raw datasets often contain missing values, outliers, inconsistent formatting, and
other data quality issues. Begin by detecting any instances of missing data and
subsequently determine the most suitable approach for addressing them, be it
through imputation methods or deletion. Address outliers by either removing them
if they are erroneous or handling them separately if they represent valuable
information. Ensure consistent formatting and resolve any inconsistencies in units
of measurement and data representation.

Figure 3.3: Data Cleaning I


Identifying Missing Values: In this project, the first task was to identify variables
with missing values in the NOAA datasets. By examining the datasets, columns or
features containing null or NaN (Not a Number) values were identified.

Evaluation of Missing Data: The next step involved assessing the impact of
missing data on the analysis. This evaluation helped determine whether to drop the
entire row or column containing missing values or apply imputation techniques to
fill in the missing values. For instance, if a specific column had a large proportion
of missing values, it might be dropped to preserve data integrity and avoid bias in
subsequent analyses.

Figure 3.4: Data Cleaning II

Dropping Variables with Null Values: Considering the project's scope and the
need for a clean dataset, the decision was made to drop variables or columns that
contained null values. By removing these variables, we ensured that the
comparative analysis of machine learning algorithms would be performed using
complete and reliable data. The dropped variables were documented to maintain
transparency and ensure reproducibility.

Figure 3.5: Data Cleaning III
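
A minimal Pandas sketch of these cleaning steps is given below; the file name and the 50% missing-value threshold are hypothetical placeholders rather than the exact values used in this study.

import pandas as pd

# Load the raw NOAA data (the file name here is a placeholder).
df = pd.read_csv("noaa_daily_weather.csv")

# Count the null/NaN values in each column to identify problem variables.
print(df.isna().sum())

# Drop columns whose proportion of missing values exceeds a chosen threshold.
df = df.loc[:, df.isna().mean() < 0.5]

# Impute the remaining numeric gaps with the column mean (mean imputation).
numeric_cols = df.select_dtypes(include="number").columns
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].mean())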

3.5 Feature Extraction

Identify the relevant features (predictors) within the NOAA dataset that are
expected to exert a substantial influence on climate change forecasting. Feature
extraction techniques,
such as principal component analysis (PCA), can reduce dimensionality and
capture the most important information. Conduct feature selection by analyzing the
relationships between features and the target variable. Utilize techniques such as
correlation analysis, recursive feature elimination, or regularization approaches to
pinpoint the most significant features.

Extracting Day, Month, and Year: The datetime feature in the NOAA datasets
provides information about the specific date and time of each data point. However,
for climate change forecasting, it is often more meaningful to extract the day,
month, and year from the datetime feature. This extraction enables the analysis to
focus on long-term trends and patterns rather than specific timestamps. Through
appropriate date and time manipulation techniques, the day, month, and year
components were extracted from the datetime feature.

Figure 3.6: Feature Extraction
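
As an illustration, a minimal Pandas sketch of this extraction is shown below; it continues from the cleaning sketch above and assumes the datetime column is named "date", which may differ from the actual NOAA column name.

import pandas as pd

# Parse the datetime column ("date" is an assumed column name).
df["date"] = pd.to_datetime(df["date"])

# Derive day, month, and year features for long-term trend analysis.
df["day"] = df["date"].dt.day
df["month"] = df["date"].dt.month
df["year"] = df["date"].dt.year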

3.6 Feature Engineering

Feature engineering of the NOAA dataset is a crucial step in preparing the data for
comparative analysis of machine learning algorithms for climate change
forecasting. By leveraging domain knowledge, applying temporal aggregation,
incorporating rolling window statistics, including time-based features, introducing
interaction and polynomial features, and performing dimensionality reduction,
researchers can extract meaningful and informative features. Properly engineered
features facilitate the accurate representation of climate dynamics, enabling
machine learning algorithms to capture and predict climate change patterns
effectively.

Dropping Irrelevant Features: In the process of analyzing the NOAA datasets, it
is essential to identify and drop any irrelevant features that do not contribute
significantly to the climate change forecasting task. Irrelevant features can
introduce noise, increase computational complexity, and potentially mislead the
machine learning algorithms. By carefully examining the dataset, an assessment
was made to identify any features that were deemed irrelevant to the comparative
analysis of machine learning algorithms. Once identified, the irrelevant feature(s)
were dropped from the dataset.

Min-Max Scaling: Scaling the data is a common practice in feature engineering
to normalize the numerical features and bring them to a similar scale. Min-max
scaling, which is alternatively referred to as normalization, is a method for
readjusting the feature values to a predefined range, commonly falling within 0 to
1. This process ensures that features with different magnitudes and units are on a
comparable scale, preventing any feature from dominating the others during model
training. Min-max scaling was applied to the numerical features in the NOAA
datasets to normalize the data.

Figure 3.7: Feature Engineering
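
A minimal scikit-learn sketch of min-max scaling, assuming df already holds the cleaned, feature-engineered data from the earlier sketches:

from sklearn.preprocessing import MinMaxScaler

# Rescale every numerical feature into the [0, 1] range.
numeric_cols = df.select_dtypes(include="number").columns
df[numeric_cols] = MinMaxScaler().fit_transform(df[numeric_cols])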

3.7 Data Segmentation

Data segmentation is a fundamental step in machine learning projects as it allows
for the evaluation and comparison of different algorithms on independent sets of
data. In the context of the project topic of comparative analysis of machine learning
algorithms for climate change forecasting using NOAA datasets, data segmentation
involves splitting the dataset into training and testing subsets.

Identifying the Target Variable: In climate change forecasting, the target variable
is typically the variable of interest that we want to predict or forecast. For example,
it could be future temperature or precipitation values. Identifying the target variable
is essential for defining the supervised learning problem and splitting the data
accordingly.

Figure 3.8: Target Variable

Splitting the Data into Training and Testing Sets: In order to assess and contrast
various machine learning algorithms effectively, it is vital to possess distinct
datasets for training and testing. The NOAA dataset was divided into two separate
subsets: one for training purposes and the other for testing. The training set is
employed for model training and refinement, whereas the testing set serves the
purpose of assessing the models' performance on data they haven't been exposed to
previously.

Figure 3.9: Data Segmentation I

Randomization and Stratification: To ensure that the training and testing datasets
are representative of the overall data distribution, randomization and stratification
techniques can be applied. Randomization ensures that the data samples are
shuffled randomly before splitting, reducing the potential for any systematic bias.
Stratification, on the other hand, ensures that the distribution of classes within the
target variable is preserved in both the training and testing sets, particularly when
dealing with imbalanced datasets.

Figure 3.10: Data Segmentation II
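
A minimal scikit-learn sketch of this split is given below; the column name "target" is a hypothetical placeholder for the chosen target variable, and only shuffling is applied here, since stratification requires class labels rather than a continuous target.

from sklearn.model_selection import train_test_split

# X holds the engineered features, y the target variable to forecast.
X = df.drop(columns=["target"])  # "target" is a placeholder column name
y = df["target"]

# Shuffle the records, then hold out 20% of them for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=True, random_state=42
)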

3.8 Machine Learning Algorithms

3.8.1 Bagging Algorithms:


3.8.1.1 Extra Trees Algorithm

The Extra Trees algorithm, short for Extremely Randomized Trees, is an ensemble
learning method that builds a forest of randomized decision trees. It is similar to
the Random Forest algorithm but introduces additional randomness by selecting
random thresholds for each feature at every node. This extra randomness further
reduces the variance and can lead to improved performance and faster training
times.

Figure 3.11: Extra Trees Learning Process
The algorithm follows these steps:
1. Initialize an empty ensemble of decision trees.
2. For each tree in the ensemble:
   a. Create a random subset of the training data by sampling with replacement.
   b. Randomly select a subset of features.
   c. Build a decision tree using the random subset of data and features. At each internal node, randomly select a feature and threshold.
   d. Add the trained decision tree to the ensemble.
3. Return the ensemble of decision trees.

During the prediction phase, the Extra Trees Regressor aggregates the predictions
from all the decision trees in the ensemble by averaging them. The final prediction
is the average of the predictions made by each individual tree.

Pseudocode for Extra Trees Algorithm:

Input:
  - Training data D
  - Number of base models N
  - Number of random features F
Output:
  - Ensemble model

Procedure ExtraTrees(D, N, F):
  Initialize an empty ensemble model
  For i = 1 to N:
    Randomly select F features from the dataset D
    Randomly split the dataset D using the selected features
    Train a base model M_i on the split dataset
    Add M_i to the ensemble model
  Return the ensemble model

Application of the Extra Trees on the Climate Change Dataset

In this study, the Extra Trees algorithm takes the NOAA training dataset, the
number of base models N, and the number of random features F as input. It
initializes an empty ensemble model and then iterates N times. In each iteration, it
randomly selects F features from the dataset, randomly splits the dataset using the
selected features, trains a base model on the split dataset, and adds the trained base
model to the ensemble model. After N iterations, the ensemble model is returned
as the output.
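
A minimal scikit-learn sketch of this setup is shown below, assuming the X_train, X_test, and y_train arrays from the data segmentation step; the hyperparameter values are illustrative rather than the tuned values used in the experiments.

from sklearn.ensemble import ExtraTreesRegressor

# N base trees, each considering a random subset of features at every split.
extra_trees = ExtraTreesRegressor(
    n_estimators=100,     # N, the number of base models
    max_features="sqrt",  # F, the random features considered at each split
    random_state=42,
)
extra_trees.fit(X_train, y_train)
y_pred = extra_trees.predict(X_test)  # averaged predictions of all trees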

3.8.1.2 Random Forest Algorithm

The Random Forest is an ensemble learning technique that enhances prediction
accuracy and robustness by amalgamating the forecasts of numerous decision trees.
It creates an ensemble of decision trees through bagging (bootstrap sampling of the
training data) and introduces extra variability by examining just a subset of features
at each branching point. This approach aids in diminishing overfitting while
enhancing the model's ability to generalize.

Figure 3.12: Random Forest Learning Process

The algorithm follows these steps:
1. Initialize an empty ensemble of decision trees.
2. For each tree in the ensemble:
   a. Create a random subset of the training data by sampling with replacement.
   b. Randomly select a subset of features.
   c. Build a decision tree using the random subset of data and features.
   d. Add the trained decision tree to the ensemble.
3. Return the ensemble of decision trees.

During the prediction phase, the Random Forest Regressor aggregates the
predictions from all the decision trees in the ensemble by averaging them. The final
prediction is the average of the predictions made by each individual tree.

Pseudocode for Random Forest Regressor Algorithm:

Input:
  - Training data D
  - Number of base models N
  - Number of random features F
Output:
  - Ensemble model

Procedure RandomForest(D, N, F):
  Initialize an empty ensemble model
  For i = 1 to N:
    Randomly select F features from the dataset D
    Randomly sample a bootstrap dataset D_i from D
    Train a base model M_i on D_i using the selected features
    Add M_i to the ensemble model
  Return the ensemble model

Application of Random Forest Algorithm to the Climate Change Dataset

In this study, the Random Forest algorithm takes the NOAA training dataset, the
number of base models N, and the number of random features F as input. It
initializes an empty ensemble model and then iterates N times. In each iteration, it
randomly selects F features from the dataset and randomly samples a bootstrap
dataset, meaning it creates a dataset by sampling with replacement from the original
dataset. A base model is then trained on the bootstrap dataset using the selected
features, and the trained base model is added to the ensemble model. After N
iterations, the ensemble model is returned as the output.
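
A corresponding scikit-learn sketch, under the same assumptions as the Extra Trees example above:

from sklearn.ensemble import RandomForestRegressor

# Bootstrap sampling (bagging) plus a random feature subset at each split.
random_forest = RandomForestRegressor(
    n_estimators=100,     # N, the number of base models
    max_features="sqrt",  # F, the random features considered at each split
    bootstrap=True,       # each tree is trained on a bootstrap sample of D
    random_state=42,
)
random_forest.fit(X_train, y_train)
y_pred = random_forest.predict(X_test)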

3.8.2 Boosting Algorithms:

3.8.2.1 LightGBM

LightGBM (Light Gradient Boosting Machine) is a relatively new gradient
boosting algorithm that has gained popularity in recent years due to its high
performance and efficiency. Like XGBoost, LightGBM uses decision trees to
model the relationship between input features and output variables. However, it
differs from XGBoost in several ways, including the way it constructs decision trees
and handles missing values.

Figure 3.13: LightGBM Learning Process

Pseudocode for LightGBM:

Input:
  - Training data D
  - Number of base models N
  - Learning rate eta
  - Maximum tree depth max_depth
Output:
  - Ensemble model

Procedure LightGBM(D, N, eta, max_depth):
  Initialize the predicted values y_hat as 0 for all training samples in D
  Initialize an empty ensemble model
  For i = 1 to N:
    Compute the negative gradient vector r_i for each training sample in D
    Train a base model M_i on the negative gradient vector r_i using the LightGBM-specific objective and regularization terms
    Compute the predicted values for the base model M_i
    Update the predicted values y_hat by adding the predictions of M_i scaled by eta
    Add M_i to the ensemble model
  Return the ensemble model

Application of LightGBM to Climate Change Dataset

In this study, the LightGBM algorithm takes the NOAA training dataset, the
number of base models N, the learning rate eta, and the maximum tree depth
max_depth as input. It initializes the predicted values y_hat as 0 for all training
samples in the dataset and initializes an empty ensemble model. Then, it iterates
N times. In each iteration, it computes the negative gradient vector for each training
sample in the dataset, and a base model is trained on the negative gradient vector
using the LightGBM-specific objective and regularization terms. The predicted
values for the base model are computed, and the predicted values y_hat are updated
by adding the predictions of the base model scaled by the learning rate eta. The
base model is then added to the ensemble model. After N iterations, the ensemble
model is returned as the output.
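
A minimal sketch using LightGBM's scikit-learn interface, again with illustrative rather than tuned hyperparameters:

from lightgbm import LGBMRegressor

lgbm = LGBMRegressor(
    n_estimators=100,   # N, the number of boosting rounds
    learning_rate=0.1,  # eta
    max_depth=-1,       # -1 leaves the depth unrestricted; set a value to cap it
    random_state=42,
)
lgbm.fit(X_train, y_train)
y_pred = lgbm.predict(X_test)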

3.8.2.2 Xgboost

XGBoost, short for Extreme Gradient Boosting, is a widely used gradient boosting
algorithm known for its effectiveness across various machine learning tasks,
including climate change prediction. XGBoost operates by constructing a sequence
of decision trees, where each tree is specialized in forecasting the remaining errors
from the preceding one. This strategy enables the algorithm to adeptly capture
intricate nonlinear patterns within the dataset.

Figure 3.14: Xgboost Learning Process.

Pseudocode for XGBoost Algorithm:

Input:
  - Training data D
  - Number of base models N
  - Learning rate eta
  - Maximum tree depth max_depth
Output:
  - Ensemble model

Procedure XGBoost(D, N, eta, max_depth):
  Initialize the predicted values y_hat as 0 for all training samples in D
  Initialize an empty ensemble model
  For i = 1 to N:
    Compute the negative gradient vector r_i for each training sample in D
    Train a base model M_i on the negative gradient vector r_i using the XGBoost-specific objective and regularization terms
    Compute the predicted values for the base model M_i
    Update the predicted values y_hat by adding the predictions of M_i scaled by eta
    Add M_i to the ensemble model
  Return the ensemble model

Application of the XGBoost Algorithm to the Climate Change Dataset

In this research, the XGBoost algorithm is applied to the NOAA training dataset,
taking as input parameters the number of base models (N), the learning rate (eta),
and the maximum tree depth (max_depth). Initially, it sets the predicted values
(y_hat) for all training samples in the dataset to 0 and creates an empty ensemble
model. The algorithm then proceeds through N iterations. Within each iteration, it
calculates the negative gradient vector for every training sample in the dataset. A
base model is subsequently trained, utilizing XGBoost-specific objective and
regularization terms, based on this negative gradient vector. Predicted values for
the model are computed, and these predictions are incorporated into y_hat after
being adjusted by the learning rate (eta). The base model is then included in the
ensemble model. This entire process repeats itself N times. Ultimately, the
ensemble model is returned as the end result.
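
A minimal sketch using XGBoost's scikit-learn interface, under the same assumptions as the earlier examples:

from xgboost import XGBRegressor

xgb = XGBRegressor(
    n_estimators=100,              # N, the number of boosting rounds
    learning_rate=0.1,             # eta
    max_depth=6,                   # max_depth
    objective="reg:squarederror",  # squared-error regression objective
    random_state=42,
)
xgb.fit(X_train, y_train)
y_pred = xgb.predict(X_test)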

3.8.2.3 CatBoost

CatBoost is another gradient boosting algorithm that has gained popularity in recent
years. It is particularly useful for dealing with categorical features, which are
common in environmental data. CatBoost uses a novel approach to handle
categorical features, which helps to improve its accuracy and performance. Like
XGBoost and LightGBM, CatBoost works by building a series of decision trees.

Figure 3.15: CatBoost Learning Process

Pseudocode for CatBoost Algorithm:

Input:
  - Training data D
  - Number of base models N
  - Learning rate eta
Output:
  - Ensemble model

Procedure CatBoost(D, N, eta):
  Initialize the predicted values y_hat as 0 for all training samples in D
  Initialize an empty ensemble model
  For i = 1 to N:
    Compute the negative gradient vector r_i for each training sample in D
    Train a base model M_i on the negative gradient vector r_i using the CatBoost-specific objective and regularization terms
    Compute the predicted values for the base model M_i
    Update the predicted values y_hat by adding the predictions of M_i scaled by eta
    Add M_i to the ensemble model
  Return the ensemble model

Application of CatBoost Algorithm to the Climate Change Dataset

In this study, the CatBoost algorithm takes the NOAA training dataset, the number
of base models N, and the learning rate eta as input. It initializes the predicted
values y_hat as 0 for all training samples in the dataset and initializes an empty
ensemble model. Then, it iterates N times. In each iteration, it computes the
negative gradient vector for each training sample in the dataset, and a base model
is trained on the negative gradient vector using the CatBoost-specific objective and
regularization terms. The predicted values for the base model are computed, and
the predicted values y_hat are updated by adding the predictions of the base model
scaled by the learning rate eta. The base model is then added to the ensemble
model. After N iterations, the ensemble model is returned as the output.
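
A minimal sketch using the CatBoost library, with illustrative hyperparameters:

from catboost import CatBoostRegressor

cat = CatBoostRegressor(
    iterations=100,        # N, the number of boosting rounds
    learning_rate=0.1,     # eta
    loss_function="RMSE",  # regression objective
    verbose=False,         # suppress per-iteration training output
    random_state=42,
)
cat.fit(X_train, y_train)
y_pred = cat.predict(X_test)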

3.8.3 Forest of Randomized Trees

3.8.3.1 Decision Tree Algorithm:

The Decision Tree algorithm is a versatile and widely used machine learning
method applicable to both classification and regression tasks. It creates a
hierarchical structure comprised of decision nodes and leaf nodes based on the
training data. Within this framework, each decision node represents an assessment
of a specific feature, while each leaf node signifies a predicted class or value. The
decision nodes partition the data based on feature conditions, enabling the tree to
make predictions by traversing from the root to a particular leaf node.

Figure 3.16: Decision Tree Learning Process

The algorithm follows these steps to build a decision tree:
1. Select the best feature to split the data based on a suitable criterion (e.g., Gini impurity for classification or mean squared error for regression).
2. Create a decision node based on the selected feature and its threshold.
3. Partition the data into two or more subsets based on the feature test.
4. Recursively repeat steps 1 to 3 for each partitioned subset until a stopping criterion is met.
5. Create a leaf node for each partitioned subset and assign it the most common class label for classification or the mean value for regression.
6. Return the decision tree.

Decision trees offer interpretability, as the learned rules can be easily understood
and visualized. They can handle both categorical and numerical features and can
capture complex relationships in the data. However, decision trees can be prone to
over-fitting, and techniques such as pruning and regularization are often used to
address this issue.

Pseudocode for Decision Tree Algorithm:

Input:
  - Training data D
  - Maximum tree depth max_depth
Output:
  - Trained decision tree model

Procedure DecisionTree(D, max_depth):
  If all samples in D belong to the same class:
    Create a leaf node with the class label
    Return the leaf node
  If max_depth is reached or D is a pure node (contains only samples of a single class):
    Create a leaf node with the majority class label
    Return the leaf node
  Select the best feature and split point to partition the data D
  Create a decision node with the selected feature and split point
  Split the data D into subsets D_left and D_right based on the selected feature and split point
  Set the left child of the decision node as DecisionTree(D_left, max_depth - 1)
  Set the right child of the decision node as DecisionTree(D_right, max_depth - 1)
  Return the decision node

Application of the Decision Tree Algorithm to the Climate Change Dataset

In this study, the Decision Tree algorithm takes the NOAA training dataset and the
maximum tree depth max_depth as input. It recursively builds a decision tree by
splitting the data based on the selected features and split points. The algorithm
performs the following steps (a minimal sketch follows the list):
a. If all samples in the dataset belong to the same class, create a leaf node with the class label and return the leaf node.
b. If the maximum tree depth is reached or the dataset is a pure node (contains only samples of a single class), create a leaf node with the majority class label and return the leaf node.
c. Select the best feature and split point to partition the NOAA dataset.
d. Create a decision node with the selected feature and split point.
e. Split the NOAA dataset into subsets D_left and D_right based on the selected feature and split point.
f. Set the left child of the decision node as DecisionTree(D_left, max_depth - 1).
g. Set the right child of the decision node as DecisionTree(D_right, max_depth - 1).
h. Return the decision node.
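
Because this study evaluates the models with regression metrics, the minimal scikit-learn sketch below uses the regressor variant of the decision tree; the max_depth value is illustrative.

from sklearn.tree import DecisionTreeRegressor

# A single tree grown until max_depth, the stopping criterion above, is reached.
tree = DecisionTreeRegressor(max_depth=10, random_state=42)
tree.fit(X_train, y_train)
y_pred = tree.predict(X_test)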

3.8.3.2 HistGradient Boosting Algorithm

HistGradient Boosting (HGB) stands as an optimized gradient boosting approach
intended for tasks encompassing both classification and regression. It has been
purposefully designed to effectively manage datasets that are characterized by high
dimensionality and substantial scale. HGB amalgamates the strengths of gradient
boosting with methodologies reliant on histograms, resulting in superior speed and
performance. At the core of HGB lies the concept of discretizing the input features
into histograms, which it harnesses to execute swift and efficient gradient
computations. This deployment of histograms serves the dual purpose of
diminishing memory usage and accelerating the training procedure, all the while
preserving a notable level of predictive precision. Moreover, HGB incorporates
strategies like histogram subtraction and multi-leaf splitting, which serve to further
augment its efficiency.

Figure 3.17: HistGradient Boosting Learning Process

The algorithm follows these steps:
1. Preprocess the training data by discretizing the input features into histograms.
2. Initialize the ensemble by setting initial predictions for all samples (e.g., using the mean target value for regression or log-odds for classification).
3. For each boosting iteration:
   a. Compute the negative gradient (pseudo-residuals) of the loss function with respect to the current ensemble's predictions.
   b. Build histograms for each feature based on the gradients and their corresponding weights.
   c. Perform histogram subtraction to compute the gradients and Hessians of the loss function for candidate splits.
   d. Select the best splits for each feature based on the gradients and Hessians.
   e. Perform multi-leaf splitting to efficiently create new tree nodes.
   f. Update the ensemble by adding the new trees, multiplied by a learning rate (shrinkage).
4. Repeat step 3 until the desired number of boosting iterations is reached or convergence criteria are met.
5. Compute the final predictions by summing the predictions of all models in the ensemble.

HGB provides fast training and prediction times while delivering competitive
performance on a wide range of datasets.

Pseudocode for HistGradient Boosting Algorithm:

Input:
  - Training data D
  - Number of base models N
  - Maximum tree depth max_depth
  - Number of histogram bins K
Output:
  - Ensemble model

Procedure HistGradientBoosting(D, N, max_depth, K):
  Initialize the predicted values y_hat as 0 for all training samples in D
  Initialize an empty ensemble model
  For i = 1 to N:
    Compute the negative gradient vector r_i for each training sample in D
    Construct a histogram representation of the dataset D based on K bins
    Train a base model M_i on the histogram representation of D using the negative gradient vector r_i
    Compute the predicted values for the base model M_i
    Update the predicted values y_hat by adding the predictions of M_i
    Add M_i to the ensemble model
  Return the ensemble model

Application of HistGradient Boosting Algorithm to the Climate Change Dataset

In this study, the HistGradient Boosting algorithm takes the NOAA training
dataset, the number of base models N, the maximum tree depth max_depth, and the
number of histogram bins K as input. It initializes the predicted values y_hat as 0
for all training samples in the dataset and initializes an empty ensemble model.
Then, it iterates N times. In each iteration, it computes the negative gradient vector
for each training sample in the dataset and constructs a histogram representation of
the dataset based on K bins. A base model is trained on the histogram representation
of the dataset using the negative gradient vector. The predicted values for the base
model are computed, and the predicted values y_hat are updated by adding the
predictions of the base model. The base model is then added to the ensemble
model. After N iterations, the ensemble model is returned as the output.
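
A minimal scikit-learn sketch of this setup is shown below; note that scikit-learn's implementation caps the number of histogram bins K at 255, and the remaining values are illustrative.

from sklearn.ensemble import HistGradientBoostingRegressor

hgb = HistGradientBoostingRegressor(
    max_iter=100,    # N, the number of boosting iterations
    max_depth=None,  # None leaves the tree depth unrestricted
    max_bins=255,    # K, the number of histogram bins
    random_state=42,
)
hgb.fit(X_train, y_train)
y_pred = hgb.predict(X_test)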

3.9 Performance Evaluation Metrics

3.9.1 Root Mean Square Error (RMSE)

The Root Mean Square Error (RMSE) stands as a frequently employed metric
within the realm of regression analysis. Its purpose lies in assessing the disparity
between the projected values and the factual values of the dependent variable.
RMSE computation entails the derivation of the square root from the average of the
squared disparities that exist between the predicted and actual values. The RMSE
formula is as follows:


RMSE = sqrt((1/n) * sum((y_pred - y_actual)^2))

Here, y_pred represents the predicted values, y_actual denotes the actual values,
and n stands for the total number of observations. RMSE serves as a valuable metric
as it provides insight into the extent of disparities between predictions and actual
values. A smaller RMSE value signifies a higher level of accuracy in the model's
predictions.

3.9.2 Mean Square Error (MSE)

Mean Square Error (MSE) serves as another frequently employed metric in
regression analysis. It quantifies the average of the squared discrepancies between
the predicted and actual values of the dependent variable. The MSE formula is as
follows:

MSE = (1/n) * sum((y_pred - y_actual)^2)


Here, y_pred signifies the predicted value, y_actual represents the
actual value, and n denotes the total number of observations. MSE proves valuable
as it offers insight into the typical magnitude of errors made by the model. A lower
MSE value signifies enhanced model accuracy.

3.9.3 R-squared (R²)

R-squared is a statistical gauge that signifies the proportion of the variance
observed in the dependent variable, which can be elucidated by the independent
variables within a regression model. This metric is also referred to as the coefficient
of determination. R-squared ranges from 0 to 1, with higher values indicating a
more favorable model fit. The formula for R-squared is as follows:

R-squared = 1 - (sum((y_pred - y_actual)^2)/sum((y_actual - y_mean)^2))

In this equation, y_pred represents the predicted value, y_actual is the actual value,
y_mean signifies the mean of the actual values, and the sum is computed across all
observations. R-squared proves to be valuable as it provides an understanding of
how well the model conforms to the data. A higher R-squared value suggests that
the model accounts for a greater amount of variance in the data. However, it is
imperative to note that a high R-squared value does not necessarily imply that the
model is adept at forecasting future outcomes.
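
All three metrics can be computed with NumPy and scikit-learn as sketched below, assuming the y_test and y_pred arrays from the segmentation and training sketches earlier in this chapter:

import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# y_test holds the actual values; y_pred holds the model's predictions.
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)            # RMSE is the square root of MSE
r2 = r2_score(y_test, y_pred)  # coefficient of determination
print(f"RMSE = {rmse:.4f}, MSE = {mse:.4f}, R2 = {r2:.4f}")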

CHAPTER FOUR

IMPLEMENTATION, RESULTS AND DISCUSSION

4.1 Introduction

In Chapter Three, various aspects of data collection, data preprocessing, feature
extraction, feature engineering, data segmentation, pseudocode of algorithms, and
evaluation metrics were discussed. In this chapter, the focus will be on conducting
a comparative analysis of the different classes of machine learning algorithms. The
goal of this analysis is to evaluate the performance and effectiveness of different
algorithms in solving the problem at hand. This chapter will provide a
comprehensive overview of the selected algorithms, and evaluation of the
performance of the selected algorithms used in this study.

4.2 Planning Stage

During this stage, various journals and books were read to gain better insight into
the subject matter, identify similarities and differences in methods, and identify
a problem that could be solved. This stage also included setting up the coding
environment, learning which Integrated Development Environment (IDE) to use,
installation procedures, packages and libraries required, and best coding practices
for the implementation.

4.3 Development Tools

This study's computer system is an HP ProBook 6460b with 6 gigabytes (GB) of
random access memory (RAM). Windows was used as the operating system, and
an Intel Core i5 processor with a processing speed of 2.53 GHz was used.

4.3.1 Programming Language

Python was used to carry out the experiment. Python is an interpreted, high-level,
general-purpose programming language used in a variety of programming
contexts. Because of its extensive set of libraries and resources, it is one of the
most widely used languages in data science. It is also preferred for its ability to
process large amounts of data in various formats within a short time frame. Python
adapts easily and includes components for implementing visualization and
graphics.

4.3.2 Libraries Used


The major Python libraries applied in this research were Pandas, SMOTE, NumPy,
LightGBM, XGBoost, CatBoost, and Scikit-Learn.
Pandas: Pandas is a well-known Python data science library. It provides powerful,
flexible data structures for analyzing and manipulating data; one of these structures
is the DataFrame. It was used to facilitate data manipulation.
Scikit-Learn: Scikit-Learn is a well-known Python module that is mostly used for
machine learning. It is popular because it comes with predefined classes that can
be used for model training at any moment. All machine learning algorithms used in
this study, excluding the XGBoost, LightGBM, and CatBoost algorithms, are
implemented as Python classes in this package. Scikit-Learn provides learning and
prediction algorithms, trains a model on the data, and evaluates the model's
performance, all with just a few calls to various methods.
XGBoost Library: The XGBoost library is a Python library built upon boosting
ensemble methods. It is not bundled with the scikit-learn module, so it is installed
and imported separately. XGBoost was imported to access its estimator class in
order to build its model.
Lightgbm Library: The LightGBM (Light Gradient Boosting Machine) Library is
an open-source machine learning framework developed by Microsoft. It is designed
to provide efficient and high-performance implementations of gradient boosting
algorithms.
CatBoost Library: The CatBoost Library is a powerful open-source machine
learning library developed by Yandex. It specializes in gradient boosting on
decision trees and is designed to provide high-quality results with minimal data
preprocessing.

4.3.3 Integrated Development Environment (IDE)

The research was conducted within the Google Colab environment, which is an
online, cloud-based Jupyter notebook platform. Google Colaboratory offers free
access for machine learning and deep learning model development, utilizing CPUs,
GPUs, and TPUs. It facilitated the writing and execution of the Python source code
used in this study.

Figure 4.1: Google Colaboratory Environment

4.4 Exploratory Data Analysis of Climate Change Forecasting

Visualization of Time Series Data

Figure 4.2: Time Series Data

Visualization of NMME Monthly Forecasts for Precipitation

Figure 4.3: NMME Monthly Forecasts for Precipitation

Visualization of Categorical Columns

Figure 4.4: Categorical Columns

Visualization of Some Numerical Columns

Figure 4.5: Latitude

Figure 4.6: Longitude

4.5 Results of training and testing process

The figures below show the results of the training and testing processes carried out
on the models in the Google Colab interface.

4.5.1 Bagging Algorithms:

Extra Trees

Figure 4.7: Evaluation of the Extra Trees Algorithm

Random Forest

Figure 4.8: Evaluation of the Random Forest Algorithm

4.5.2 Boosting Algorithms:

Xgboost

Figure 4.9: Evaluation of Xgboost Algorithm


LightGBM

Figure 4.10: Evaluation of LightGBM Algorithm

CatBoost

Figure 4.11: Evaluation of CatBoost Algorithm

4.5.3 Forest of Randomized Trees:
Decision Trees

Figure 4.12: Evaluation of Decision Tree Algorithm


HistGradient Boosting

Figure 4.13: Evaluation of the HistGradient Boosting Algorithm

4.6 Result Analysis

The experimental results of all the models are closely compared to identify
similarities and differences in model predictions. The results are represented using
RMSE, MSE, and R2-Score.

Table 4.1: Comparison of the classes of algorithms

Classes                       Algorithms              RMSE     MSE      R2-Score
Boosting                      CatBoost                0.1643   0.0269   0.9997
Boosting                      LightGBM                0.2385   0.0569   0.9994
Boosting                      Xgboost                 0.9409   0.8853   0.9909
Bagging                       Random Forest           2.0365   4.1475   0.9586
Bagging                       Extra Trees             2.2200   4.9286   0.9508
Forest of Randomized Trees    HistGradient Boosting   2.6027   6.7739   0.9324
Forest of Randomized Trees    Decision Tree           2.6796   7.1800   0.9283

Among the various algorithms assessed for the regression task, the boosting
algorithms, specifically CatBoost, LightGBM, and Xgboost, displayed exceptional
performance according to the evaluation metrics of Root Mean Squared Error
(RMSE), Mean Squared Error (MSE), and R-squared (R2-Score). CatBoost
achieved the lowest RMSE of 0.1643 and MSE of 0.0269, signifying its capability
to minimize the average discrepancy between predicted and actual values.
Moreover, it attained an outstanding R2-Score of 0.9997, indicating that the model's
predictions can elucidate a substantial portion of the variance in the target variable.

While LightGBM and Xgboost also demonstrated strong performance, with
relatively low RMSE and MSE values, CatBoost surpassed them in all three
metrics. The higher R2-Score achieved by CatBoost further corroborates its
effectiveness in capturing the underlying data patterns. Based on these findings, it
can be inferred that boosting algorithms, particularly CatBoost, are highly suitable
for this regression task. The proficiency of boosting algorithms in amalgamating
weak learners into a robust ensemble, adaptively focusing on challenging instances,
and addressing intricate relationships within the data likely contributes to their
superior performance.

It is noteworthy that bagging algorithms like Random Forest and Extra Trees, as
well as the forest of randomized trees algorithm, HistGradient Boosting, and the
standalone Decision Tree, also exhibited reasonably good performance. However,
their RMSE and MSE values were comparatively higher, and their R2-Scores
lower, than those of the boosting algorithms.

Overall, the results strongly suggest that CatBoost, due to its impressive
performance across all evaluation metrics, should be considered as the preferred
choice when applying boosting algorithms to regression tasks similar to the one
under consideration.

4.7 Discussion of Findings

In this study, we conducted a comparative analysis of various boosting and bagging
algorithms for regression tasks. The evaluation was based on three commonly used
performance metrics: Root Mean Squared Error (RMSE), Mean Squared Error
(MSE), and R-squared (R2-Score). The purpose was to identify the most effective
algorithm for the given regression problem.

4.7.1 Boosting Algorithms:

Among the evaluated algorithms, the boosting algorithms, namely CatBoost,
LightGBM, and Xgboost, exhibited superior performance. Boosting algorithms are
ensemble methods that combine multiple weak learners to form a stronger model.
These algorithms iteratively focus on difficult instances and assign higher weights
to misclassified samples, allowing them to handle complex relationships within the
data.

4.7.1.1 CatBoost:

CatBoost stood out as the best performer in terms of all evaluation criteria. It accomplished
the lowest RMSE of 0.1643, demonstrating its capacity to reduce the average disparity
between predicted and actual values. The minimal MSE of 0.0269 additionally validates
its precision in forecasting. Furthermore, CatBoost achieved an outstanding R2-Score of
0.9997, indicating that most of the variability in the target variable can be elucidated by
the model's predictions. This implies that CatBoost adeptly grasps the fundamental data
patterns and delivers exceedingly precise regression outcomes.

4.7.2 Comparison with Other Boosting Algorithms:

While both LightGBM and Xgboost also demonstrated strong performance,
CatBoost outperformed them in all three evaluation metrics. LightGBM achieved
slightly higher RMSE and MSE values compared to CatBoost, indicating slightly
higher prediction errors. Xgboost had significantly higher RMSE and MSE values,
suggesting a less accurate regression model compared to CatBoost and LightGBM.
The R2-Scores of both LightGBM and Xgboost were also slightly lower than that
of CatBoost, indicating a lesser ability to explain the variance in the target variable.

4.7.3 Comparison with Bagging Algorithms and Randomized Trees:

Although the bagging algorithms, Random Forest and Extra Trees (itself a forest
of extremely randomized trees), along with the histogram-based gradient-boosting
variant, HistGradient Boosting, delivered reasonably good performance, their
results fell short of those of the leading boosting algorithms. Bagging methods
train multiple models on resampled data and aggregate their predictions to form a
final prediction. However, they were outperformed by CatBoost, LightGBM, and
XGBoost in terms of RMSE, MSE, and R2-Score. This suggests that, for the given
regression task, the boosting algorithms' ability to focus on challenging instances
and capture complex relationships played a crucial role in achieving superior results.
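
A side-by-side check of these estimators can be obtained with cross-validation, as
in the hedged sketch below (synthetic data and default settings are used purely for
illustration; the study's own configuration may differ):

    from sklearn.datasets import make_regression
    from sklearn.ensemble import (ExtraTreesRegressor,
                                  HistGradientBoostingRegressor,
                                  RandomForestRegressor)
    from sklearn.model_selection import cross_val_score

    X, y = make_regression(n_samples=500, n_features=8, noise=0.1, random_state=0)

    models = {
        "Random Forest": RandomForestRegressor(n_estimators=200, random_state=0),
        "Extra Trees": ExtraTreesRegressor(n_estimators=200, random_state=0),
        "HistGradientBoosting": HistGradientBoostingRegressor(random_state=0),
    }
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=5, scoring="r2")  # 5-fold R2
        print(f"{name}: mean R2 = {scores.mean():.4f}")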

4.8 Conclusion

In conclusion, this comparative analysis of machine learning algorithms for climate
change forecasting revealed that CatBoost, a gradient-boosting algorithm, achieved
the highest performance across multiple evaluation metrics. The results imply that
gradient boosting, and CatBoost in particular, is a promising approach for accurate
climate change predictions. Based on the outcomes of this study, future research
directions could involve exploring hybrid models that combine the strengths of
ensemble methods with deep learning architectures to further enhance climate
change predictions. Additionally, integrating socio-economic factors and policy
interventions into the forecasting models can improve their real-world applicability
and support decision-making processes.

CHAPTER FIVE

SUMMARY, CONCLUSION AND RECOMMENDATIONS

5.1 Summary

The primary objective of the study was to evaluate and compare the performance
of various machine learning algorithms for climate change forecasting. The
CatBoost, LightGBM, XGBoost, Random Forest, Extra Trees, HistGradient
Boosting, and Decision Tree algorithms were evaluated in the analysis. After
conducting rigorous experiments and evaluating the results, it was discovered that
CatBoost outperformed all other algorithms in terms of predictive accuracy and
generalization capabilities. It was closely followed by LightGBM and XGBoost,
which also demonstrated strong performance. Random Forest and Extra Trees
exhibited moderate performance, while HistGradient Boosting and Decision Trees
showed relatively lower predictive accuracy. A range of evaluation metrics was
used to assess the performance of the algorithms, including root mean squared
error, mean squared error, and R-squared. The evaluation was conducted on a
comprehensive dataset of climate change indicators, considering both temporal and
spatial aspects. The analysis revealed that CatBoost consistently outperformed
other algorithms across different evaluation metrics, demonstrating its robustness
and effectiveness in climate change forecasting.
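
Because the evaluation considered temporal aspects, splits must respect
chronological order so that a model is never trained on data from the future. The
sketch below shows one common way of doing this with scikit-learn's
TimeSeriesSplit; the synthetic series is a placeholder, and the study's exact
splitting procedure may have differed:

    import numpy as np
    from sklearn.ensemble import HistGradientBoostingRegressor
    from sklearn.metrics import r2_score
    from sklearn.model_selection import TimeSeriesSplit

    # Synthetic stand-in for a chronologically ordered climate series
    rng = np.random.default_rng(0)
    X = rng.normal(size=(400, 6))
    y = X @ rng.normal(size=6) + rng.normal(scale=0.1, size=400)

    tscv = TimeSeriesSplit(n_splits=5)  # each fold trains on the past, tests on the future
    for fold, (train_idx, test_idx) in enumerate(tscv.split(X)):
        model = HistGradientBoostingRegressor(random_state=0)
        model.fit(X[train_idx], y[train_idx])
        r2 = r2_score(y[test_idx], model.predict(X[test_idx]))
        print(f"fold {fold}: R2 = {r2:.4f}")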

5.2 Limitations

While this study has provided valuable insights, it is essential to acknowledge its
limitations:

1. Computational Resources: Running the algorithms on large climate datasets
required substantial computational resources, including powerful hardware and
memory. Adequate computing capacity had to be sourced before the analysis could
be carried out.

2. Feature Engineering: Climate data often requires domain-specific feature
engineering, which is challenging without a deep understanding of climate science.
Background research into the field was therefore undertaken so that feature
engineering could be carried out appropriately.

3. Data Quality: The accuracy of predictions depends heavily on the quality of the
available data. Climate data can be noisy, incomplete, or biased, and ensuring data
quality is a significant challenge; care was taken to verify the quality of the data
used. A typical cleaning step is sketched after this list.

4. Algorithm Selection: The set of algorithms considered, while carefully chosen,
cannot cover every algorithm suitable for climate forecasting, and newer algorithms
may prove more effective. Several candidates were trialled and the best-performing
algorithms were selected for the analysis.
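
As referenced under the data-quality limitation above, the sketch below illustrates
one typical cleaning step for a climate series: implausible readings are masked as
missing and the gaps are then filled by time-based interpolation. The values and
thresholds are invented for illustration only:

    import numpy as np
    import pandas as pd

    # Toy monthly temperature series with gaps and one implausible reading
    idx = pd.date_range("2020-01-01", periods=12, freq="MS")
    temps = pd.Series([20.1, np.nan, 21.0, 21.4, np.nan, 23.2,
                       55.0, 24.1, 23.0, 22.2, np.nan, 20.5], index=idx)

    # Flag out-of-range readings as missing, then interpolate over time
    temps = temps.mask((temps < -40) | (temps > 50))
    cleaned = temps.interpolate(method="time")
    print(cleaned)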

5.3 Recommendations

In light of the findings of this study, the following recommendations are put forth
for future research and for the application of machine learning algorithms to
climate change prediction:
1. CatBoost should be considered as a primary choice for climate change
forecasting tasks due to its outstanding performance. Further research can
focus on understanding the specific features and techniques that contribute
to its superior predictive accuracy.
2. LightGBM and XGBoost can serve as alternative options for climate change
forecasting, especially when computational efficiency and scalability are
crucial factors.

3. Random Forest and Extra Trees can be utilized when a balance between
accuracy and computational efficiency is required. Further investigation can
explore techniques to enhance their performance in climate change
forecasting.
4. HistGradient Boosting and Decision Trees may be suitable for preliminary
analysis or when interpretability of the model is a priority. Research efforts
should be directed towards improving their accuracy and addressing their
limitations in the context of climate change forecasting.
5. Future research should focus on incorporating additional features and
variables into the models to further enhance their predictive capabilities.
The inclusion of more diverse climate indicators, geographical factors, and
temporal patterns could potentially improve the accuracy of the forecasting
models.
6. Ensembling techniques that combine the strengths of multiple algorithms
can be explored to boost the overall predictive accuracy in climate change
forecasting. Techniques such as stacking or blending different algorithms
can help exploit their complementary strengths; a minimal stacking sketch
is given after this list.
7. Continuous monitoring and updating of the models should be ensured to
accommodate the dynamic nature of climate change. Regular retraining of
the models with new data can help maintain their accuracy and adaptability
to evolving climate patterns.
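
As noted in recommendation 6, stacking is one concrete way to blend
complementary models. The minimal sketch below stacks two tree ensembles under
a ridge meta-learner; the choice of base learners and meta-learner is illustrative
rather than prescriptive:

    from sklearn.datasets import make_regression
    from sklearn.ensemble import (HistGradientBoostingRegressor,
                                  RandomForestRegressor, StackingRegressor)
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import cross_val_score

    X, y = make_regression(n_samples=500, n_features=8, noise=0.1, random_state=0)

    # A ridge meta-learner blends out-of-fold predictions of the base ensembles
    stack = StackingRegressor(
        estimators=[("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
                    ("hgb", HistGradientBoostingRegressor(random_state=0))],
        final_estimator=Ridge(),
    )
    scores = cross_val_score(stack, X, y, cv=5, scoring="r2")
    print("stacked mean R2:", scores.mean())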

The recommendations emphasize the need for further research and exploration in
several areas. This includes understanding the underlying features and techniques
that contribute to CatBoost's superior performance, investigating ways to enhance
the accuracy of Random Forest and Extra Trees algorithms, and improving the
performance of HistGradient Boosting and Decision Trees for climate change
forecasting. Furthermore, future research should emphasize the incorporation of a
more diverse set of features and variables into the models, taking into account
geographical factors, temporal trends, and a broader array of climate indicators.
Exploring ensembling techniques that combine multiple algorithms can also be
beneficial, as they can harness the complementary strengths of these methods to
enhance overall predictive accuracy.
To ensure the efficiency of the forecasting models, continuous monitoring and
updating are imperative. Climate change is an evolving process, and regularly
retraining the models with new data is essential to uphold their accuracy and
adaptability to evolving climate patterns.
In summary, this study has made a valuable contribution to the field of climate
change forecasting by conducting a comprehensive evaluation and comparison of
various machine learning algorithms. The results underscore the exceptional
performance of CatBoost while shedding light on the strengths and weaknesses of
other algorithms. The recommendations outlined in this section are intended to
guide future research and the practical application of machine learning algorithms
in climate change forecasting, with the ultimate goal of enhancing our
comprehension and prediction of the impacts of climate change.
