Research Papers of Rainfall Prediction
All content following this page was uploaded by Nuthanakanti Bhaskar on 01 November 2024.
Abstract— Accurate rainfall prediction matters to many sectors, including agriculture, water resource management, and emergency and disaster preparation. Because meteorological data are complex and dynamic, with many nonlinear interactions, traditional approaches often produce inaccurate forecasts. In recent years, machine learning (ML) approaches, especially ensemble models, have shown encouraging results in improving the precision and consistency of precipitation forecasts. This research presents a new method for predicting precipitation using an ensemble of ML models. The proposed ensemble algorithm, EERP, combines the predictions of multiple base learners to produce a more reliable and accurate forecast. Decision trees, random forests, gradient boosting machines, neural networks, and other ML approaches serve as the ensemble's base learners. We assess the effectiveness of the proposed approach and compare it with existing classifiers; the proposed classifier achieved the best accuracy among them, 96%.

Keywords— Rainfall, ML, Ensembled Model, Forecasting, Water Resource Management

I. INTRODUCTION
Rainfall predictions are crucial in fields such as agriculture, water resource management, urban planning, and disaster preparation. Meteorologists and climatologists use a range of instruments and techniques to forecast rainfall, including numerical weather prediction models, satellite images, radar data, and statistical methodologies. These methods provide short-term forecasts for urgent planning and long-term outlooks for seasonal and climate predictions. However, predicting precipitation remains difficult due to the atmosphere's inherent variability and complexity.

Machine learning (ML) algorithms enable computers to learn from data and make predictions without explicit human intervention. ML techniques for rainfall prediction offer several benefits, including improved accuracy, flexibility, real-time forecasting, and localized predictions. ML models can capture complex interactions between atmospheric variables and rainfall patterns, resulting in more accurate predictions than older methods [1].

However, adopting machine learning for rainfall prediction presents challenges such as the need for high-quality, extensive datasets, overcoming data biases, and understanding complex model outputs. Additionally, establishing the reliability and robustness of machine learning-based predictions requires rigorous validation and verification.

Ensemble modelling has gained popularity in meteorology and climate science due to its ability to capture a greater variety of probable outcomes while reducing the uncertainty inherent in single-model projections [2]. Ensemble modelling strategies can use various methodologies, such as aggregating projections from several numerical weather prediction models, including many data sources, and integrating different machine learning algorithms.

Key benefits of ensemble modelling for rainfall prediction include improved reliability, coverage of a greater variety of possible weather scenarios, quantification of the uncertainty associated with rainfall projections, and adaptability, which allows for the integration of new data sources, model upgrades, and advances in rainfall prediction [3]. Overall, ensemble modelling offers a more accurate and reliable approach to forecasting rainfall patterns, enhancing decision-making in various industries.

II. RELATED WORK
Accurately predicting when and how much rain will fall has far-reaching consequences for many fields, such as farming, water management, emergency preparation, and city planning. Accurate and timely predictions can aid in mitigating risks and optimizing resource allocation. Traditional meteorological methods often struggle to capture the complex and dynamic nature of atmospheric phenomena due to the nonlinear interactions among various meteorological variables.
Authorized licensed use limited to: National Institute of Technology. Downloaded on October 30,2024 at 05:13:14 UTC from IEEE Xplore. Restrictions apply.
Machine learning (ML) methods have enabled more precise and trustworthy weather forecasts in recent years. Ensemble models combine multiple base learners, leveraging their collective strengths to produce more accurate and robust predictions.

Dong et al. (2024) [4] demonstrated a hybrid model's better performance in real-time hourly water level forecasting in ungauged basins. Baig et al. (2024) [5] assessed how well different ML models predicted monthly rainfall in hyper-arid regions, highlighting the potential of ML techniques to outperform traditional models, especially in challenging climatic conditions. Deep learning was found to significantly enhance the accuracy of heavy rainfall prediction over complex terrain. A comparative study of streamflow simulation methods showed that ensemble approaches performed better than individual ML models.

Latif et al. (2023) [6] evaluated several models for predicting rainfall and highlighted the benefits of combining ML with remote sensing techniques to improve prediction accuracy. AI techniques for rainfall forecasting in Thailand were examined, and ML approaches were used to reconstruct gridded precipitation from multiple data sources. Hybrid methods that combine data assimilation with ML were suggested to enhance regional climate models.

A literature survey on rainfall prediction using ML models covered a wide range of algorithms, datasets, and performance measures. Hussein et al. (2022) [7] reviewed rainfall prediction using ML and DL techniques, while Rahman et al. (2022) [8] created an algorithm for smart city rainfall prediction using ML fusion methods.

ML algorithms were applied to predict rainfall across various ecological zones in Ghana, highlighting the effectiveness of ML models in capturing regional rainfall patterns. Techniques for predicting Northeast Monsoon rainfall were examined, emphasizing the need to combine ML models with more conventional methods. Ensemble ML models in hydrology demonstrated superior performance in capturing hydrological phenomena, including rainfall. Radar data assimilation for precipitation forecasting was also investigated, and Zhao et al. (2020) [9] demonstrated improved rainfall forecasting with GNSS observations.

III. PROPOSED METHODOLOGY
A. Data Exploration and Analysis for Rainfall Prediction
Exploring and interpreting data is an important step in creating reliable rainfall prediction models. It entails studying the dataset's characteristics, recognizing patterns, and selecting relevant attributes for modelling. One way to approach data exploration and analysis for rainfall prediction is as follows:

Data Collection: Gather weather records from reliable sources, such as weather stations or universities. The dataset should include variables like precipitation (rainfall), temperature, humidity, wind speed, atmospheric pressure, and location [10].

Data Cleaning: Resolve inconsistencies, missing values, and outliers in the dataset that may affect analysis quality. Use techniques like mean imputation or interpolation to fill in missing values, and visualization tools or statistical methods to find and handle outliers.

Exploratory Data Analysis (EDA):

Summary Statistics: To better understand the distribution and variability of each variable, compute descriptive statistics such as mean, median, standard deviation, and range [11].

Visualization: Use histograms, box plots, and scatterplots to investigate the relationships between variables and uncover patterns or trends. Consider plotting rainfall distribution over time, seasonal trends, or connections between rainfall and other meteorological data.

Temporal Analysis: Analyse temporal patterns in rainfall data, such as daily, monthly, or seasonal trends, to better understand seasonal variability and long-term trends.

Spatial Analysis: Use spatial analysis to investigate the geographical distribution of rainfall and discover areas with high or low precipitation. Use maps and spatial visualization techniques to depict spatial patterns and variations.

Figure 1 shows the architecture of the proposed model [12,13].
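The cleaning, summary-statistics, and temporal-analysis steps above can be sketched with pandas. This is a minimal illustration under stated assumptions: the data are synthetic, and the column names (`rainfall_mm`, `temperature_c`, `humidity_pct`) and the 1.5 × IQR outlier rule are hypothetical choices, not taken from the paper.

```python
import numpy as np
import pandas as pd

# Hypothetical daily weather records; column names are illustrative only.
rng = np.random.default_rng(0)
dates = pd.date_range("2020-01-01", periods=366, freq="D")
df = pd.DataFrame({
    "date": dates,
    "rainfall_mm": rng.gamma(shape=0.5, scale=6.0, size=len(dates)),
    "temperature_c": rng.normal(25.0, 5.0, size=len(dates)),
    "humidity_pct": rng.uniform(30.0, 100.0, size=len(dates)),
})
# Knock out a few values to mimic gaps in station records.
df.loc[df.index[::50], "humidity_pct"] = np.nan

# Data cleaning: fraction of missing values per column, then mean imputation.
missing_fraction = df.isna().mean()
df["humidity_pct"] = df["humidity_pct"].fillna(df["humidity_pct"].mean())

# Summary statistics: mean, median (50%), standard deviation, min and max.
summary = df[["rainfall_mm", "temperature_c", "humidity_pct"]].describe()

# Outlier screening with the (assumed) 1.5 * IQR rule on rainfall.
q1, q3 = df["rainfall_mm"].quantile([0.25, 0.75])
upper_fence = q3 + 1.5 * (q3 - q1)
outliers = df[df["rainfall_mm"] > upper_fence]

# Temporal analysis: total rainfall per calendar month.
monthly_total = df.groupby(df["date"].dt.month)["rainfall_mm"].sum()

print(missing_fraction)
print(summary.loc[["mean", "50%", "std", "min", "max"]])
print(monthly_total)
```

Histograms, box plots, and maps would build on the same data frame; they are omitted here to keep the sketch free of plotting dependencies.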
B. Data Pre-processing
Feature encoding converts categorical variables to numerical form, while data splitting divides the dataset into training, validation, and testing parts. Dimensionality reduction can be achieved using Principal Component Analysis or feature selection algorithms. Imbalanced data, such as rare rainfall classes, can be addressed using techniques like oversampling or under-sampling, together with appropriate evaluation metrics. These steps support the model's effectiveness and maintain data integrity. Quality Assurance: Perform a final check to ensure that the pre-processed data is ready for model training [14]. This may involve visualizations, statistical analysis, or cross-validation.

IV. RESULT AND DISCUSSION
A. Dataset
Approximately ten years' worth of daily weather observations from different parts of India are included in the dataset, which is sourced from Kaggle. The attributes of the dataset are shown in Figure 2.

The dataset contains:
Number of columns: 23
Number of rows: 145460
Number of independent columns: 22
Number of dependent columns: 1
The data are split 75/25: 75% for training and 25% for testing.
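The encoding step and the 75/25 split described above can be sketched as follows. This is a hedged illustration: the data frame is synthetic, the column names are hypothetical, and the use of stratification and of training-set means for imputation are assumptions rather than details stated in the paper.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the weather table; names and values are illustrative.
rng = np.random.default_rng(42)
n = 1000
df = pd.DataFrame({
    "humidity": rng.uniform(20, 100, n),
    "pressure": rng.normal(1015, 7, n),
    "wind_dir": rng.choice(["N", "E", "S", "W"], n),
    "rain_today": rng.choice(["Yes", "No"], n, p=[0.25, 0.75]),
    "rain_tomorrow": rng.choice(["Yes", "No"], n, p=[0.2, 0.8]),
})
df.loc[rng.choice(n, 50, replace=False), "humidity"] = np.nan

# Feature encoding: binary labels to 0/1, nominal categories to one-hot columns.
df["rain_today"] = (df["rain_today"] == "Yes").astype(int)
y = (df["rain_tomorrow"] == "Yes").astype(int)
X = pd.get_dummies(df.drop(columns="rain_tomorrow"), columns=["wind_dir"])

# 75% / 25% split; stratify keeps the rain / no-rain ratio similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Impute with the training mean only, so no test information leaks into training.
train_mean = X_train["humidity"].mean()
X_train = X_train.fillna({"humidity": train_mean})
X_test = X_test.fillna({"humidity": train_mean})

print(X_train.shape, X_test.shape)
```

Splitting before imputation is a design choice made here so that the test set stays untouched by statistics computed on the full data.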
C. Feature Selection
Feature selection reduces the number of input variables used to develop a predictive model. We used the outcomes of the proposed algorithm to guide a manual feature selection process. Since the model needs to make predictions, we take the attribute "Rain Tomorrow" as our dependent variable (Y). All other variables except "Date," "Evaporation," and "Sunshine" are treated as independent variables (X). Date has no bearing on our model, while Evaporation and Sunshine have quite significant percentages of missing values.
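The selection rule described above (drop Date, drop features with many missing values, keep the rest as X) can be sketched with pandas. The tiny data frame, the extra column names `MinTemp` and `Humidity3pm`, the spelling `RainTomorrow`, and the 40% missing-value cut-off are all assumptions made for illustration; the paper does not state a threshold.

```python
import numpy as np
import pandas as pd

# Tiny hypothetical frame; only Date, Evaporation, Sunshine and the target
# are named in the text, the other columns are placeholders.
df = pd.DataFrame({
    "Date": pd.date_range("2015-01-01", periods=6, freq="D"),
    "MinTemp": [12.1, 13.4, np.nan, 9.8, 11.0, 14.2],
    "Humidity3pm": [45.0, 60.0, 72.0, np.nan, 55.0, 80.0],
    "Evaporation": [np.nan, np.nan, 4.2, np.nan, np.nan, 3.8],
    "Sunshine": [np.nan, 7.5, np.nan, np.nan, np.nan, np.nan],
    "RainTomorrow": ["No", "No", "Yes", "No", "Yes", "Yes"],
})

# Share of missing values per column, used to justify dropping sparse features.
missing_share = df.isna().mean().sort_values(ascending=False)

# Assumed cut-off: drop any feature with more than 40% missing values.
threshold = 0.40
sparse_cols = missing_share[missing_share > threshold].index.tolist()

# Date is dropped because it carries no predictive information for the model.
target = "RainTomorrow"
drop_cols = ["Date"] + sparse_cols
X = df.drop(columns=drop_cols + [target])
y = df[target].map({"No": 0, "Yes": 1})

print(sparse_cols, list(X.columns))
```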
D. Feature Scaling
In our dataset, the features' values and ranges varied significantly. Because many machine learning algorithms rely on Euclidean distance for computations, this variation presents a problem: features with higher magnitudes can have a greater influence on distance computations than those with lower magnitudes. To address this issue, we used Scikit-learn's StandardScaler, which standardizes each feature to zero mean and unit variance, so that all features are on a comparable scale.

E. Proposed Algorithm-I
Phase-I Algorithm
Step 1: I/P Data: Historical rainfall data from various sources, or weather data including features like temperature, humidity, wind speed, atmospheric pressure, and previous rainfall measurements.
Step 2: O/P: Confusion Matrix
Step 3: Remove errors and inconsistencies from the data

Fig. 6. Confusion Matrix for a 2-Class Problem

The confusion matrix for a 2-class problem is shown in Fig. 6. Based on the values in the confusion matrix, accuracy, precision, recall, and F-score were calculated as

\[ \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1} \]

\[ \mathrm{Precision} = \frac{TP}{TP + FP} \tag{2} \]

\[ \mathrm{Recall} = \frac{TP}{TP + FN} \tag{3} \]

\[ \mathrm{F\text{-}Score} = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{4} \]

where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives.

The confusion matrix for the proposed classifier for rainfall prediction is shown in Table I, and its accuracy is shown in Fig. 7.

TABLE I. CONFUSION MATRIX FOR PROPOSED CLASSIFIER FOR RAINFALL PREDICTION
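As a worked check of Equations (1) to (4), the sketch below evaluates them on the four counts reported for the Decision Tree classifier in Table II. The assignment of those counts to TP, FN, FP, and TN is an assumption (rows taken as actual classes, columns as predicted classes, positive class first), since the table's row and column labels are not recoverable here. Accuracy does not depend on that assumption, but precision and recall do.

```python
def binary_metrics(tp, fn, fp, tn):
    """Return accuracy, precision, recall and F-score as in Eqs. (1)-(4)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f_score


# Counts from Table II (Decision Tree). Assumed layout: rows are actual
# classes, columns are predicted classes, positive class listed first.
acc, prec, rec, f1 = binary_metrics(tp=6223, fn=372, fp=456, tn=5796)
print(f"accuracy={acc:.4f} precision={prec:.4f} recall={rec:.4f} f_score={f1:.4f}")
```

With these counts the accuracy is 12019 / 12847, approximately 0.9355.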
F. Decision Tree Classifier
One effective machine-learning approach for weather prediction is the Decision Tree.

Steps to Build a Decision Tree for Rainfall Prediction:

Step 1: Data Collection and Pre-processing:
Collect rainfall records together with other pertinent weather data (for example, temperature, humidity, and wind speed).
Remove errors and inconsistencies from the data by dealing with missing values and outliers.

Step 2: Feature Selection:
Identify the features (independent variables) that are most likely to influence rainfall. These typically include meteorological parameters such as temperature, humidity, wind speed and direction, atmospheric pressure, and previous rainfall amounts.
Optionally, include geographical features if available (e.g., elevation, proximity to water bodies).

Step 3: Data Splitting:
Split the dataset into a training set and a testing set. The Decision Tree model is constructed using the training set, and its performance is assessed using the testing set.
Constructing the Model Tree: Use the training data to train a decision tree model. The algorithm recursively divides the data according to the most important variable at each node, so that the rainfall outcomes within each branch are as consistent as possible.
Model performance can be optimized and overfitting prevented by specifying parameters such as maximum tree depth, minimum samples per leaf, and the splitting criterion (e.g., Gini impurity or entropy).

Step 4: Model Evaluation:
Evaluate the Decision Tree model on the test data. Popular measures for assessment include:
Accuracy: Percentage of correct predictions.
Confusion Matrix: Comparison of actual rainfall events with predicted ones.
Receiver Operating Characteristic (ROC) Curve: A graph of the true positive rate against the false positive rate.

Step 5: Interpretation and Visualization:
Visualize the Decision Tree to interpret how different variables contribute to rainfall prediction. This helps in understanding the decision-making process of the model and identifying which meteorological factors are most influential.

The confusion matrix for the Decision Tree classifier for rainfall prediction is shown in Table II, and its accuracy is shown in Fig. 8.

TABLE II. CONFUSION MATRIX FOR DECISION TREE CLASSIFIER FOR RAINFALL PREDICTION
Actual Class
Predicted Class
6223    372
456     5796

Fig. 8. Accuracy Obtained Using DT Classifier

G. Naïve-Bayes
Naive Bayes is a probabilistic classifier based on Bayes' theorem that naively assumes the predictors are independent. It can be adapted for rainfall prediction by treating it as a classification problem where the outcome is the presence or absence of rainfall. Naive Bayes can be applied to rainfall prediction as follows:

Step 1: Data Preparation:
Collect past data that records the occurrence of rainfall together with independent factors like temperature, humidity, wind speed, atmospheric pressure, etc.
Remove errors and inconsistencies from the data by dealing with missing values, outliers, and formatting issues.

Step 2: Feature Selection:
Choose relevant meteorological variables that are likely to influence rainfall. These could include temperature, humidity, wind speed and direction, atmospheric pressure, and previous rainfall amounts.
Ensure these variables are numerical and appropriately scaled for input into the Naive Bayes model.

Step 3: Data Splitting:
Split the dataset into a training set and a testing set. It is common practice to allocate a larger share for training (say, 70–80%) and a smaller share (say, 20–30%) for testing.
Step 4: Model Training:
Train a Naive Bayes classifier on the training data. Naive Bayes assumes independence among predictors given the class label (rainfall or no rainfall). The main variants are:
Gaussian Naive Bayes: Assumes that features are normally distributed.
Multinomial Naive Bayes: Works well with discrete count features.
Bernoulli Naive Bayes: Treats features as binary, such as whether a certain condition is present or not.
Consider the characteristics of the predictor variables when choosing the variant.

Step 5: Model Evaluation:
Use the testing dataset to assess the trained Naive Bayes model. Popular measures for assessment include:
Accuracy: Percentage of correctly classified instances.
Precision and Recall: Measures of model performance for positive (rainfall) and negative (no rainfall) classes.
F1-score: The harmonic mean of precision and recall, balancing the two measures.

The confusion matrix for the Naïve-Bayes classifier for rainfall prediction is shown in Table III, and its accuracy is shown in Fig. 9.

TABLE III. CONFUSION MATRIX FOR NAÏVE-BAYES CLASSIFIER FOR RAINFALL PREDICTION
Actual Class
Predicted Class
6124    551

V. CONCLUSION
Ensemble methods have significantly improved the reliability and accuracy of rainfall prediction. These methods use the strengths of each forecasting model while compensating for their weaknesses, reducing the biases and errors inherent in individual predictions. Averaging or combining forecasts from diverse models leads to more robust predictions, especially in complex meteorological conditions, and makes them more resilient to uncertainties in the meteorological factors that influence rainfall patterns. Ensemble approaches are scalable across different spatial and temporal scales, making them suitable for applications ranging from local weather forecasting to regional climate studies. A comparison of the three classifiers is shown in Fig. 10.

Fig. 10. Comparison of 3 Classifiers
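A comparison of this kind, covering the Decision Tree and Naive Bayes procedures of Sections F and G plus a combined model, can be sketched with scikit-learn. The example is a minimal illustration under stated assumptions and is not the authors' EERP algorithm, whose details are not given here: the data are synthetic (`make_classification`), the hyperparameters are illustrative, and a soft-voting ensemble is used as one generic way of combining base learners.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data standing in for rain / no-rain records
# (not the paper's dataset).
X, y = make_classification(n_samples=2000, n_features=10, n_informative=6,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Base learners; hyperparameters are illustrative, not taken from the paper.
tree = DecisionTreeClassifier(criterion="gini", max_depth=6,
                              min_samples_leaf=10, random_state=0)
# The scaler mirrors the feature scaling step of Section D.
bayes = make_pipeline(StandardScaler(), GaussianNB())
forest = RandomForestClassifier(n_estimators=100, random_state=0)

# Soft voting averages the predicted class probabilities of the base learners.
ensemble = VotingClassifier(
    estimators=[("tree", tree), ("bayes", bayes), ("forest", forest)],
    voting="soft",
)

results = {}
for name, model in [("decision_tree", tree), ("naive_bayes", bayes),
                    ("voting_ensemble", ensemble)]:
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "confusion_matrix": confusion_matrix(y_test, y_pred),
    }
    print(name, round(results[name]["accuracy"], 3))
```

The accuracies printed on synthetic data say nothing about the figures reported in the paper; the sketch only shows how the three kinds of model can be trained and scored on the same 75/25 split.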
[5] Baig, Faisal, et al. "How accurate are the machine learning models in
improving monthly rainfall prediction in hyper arid environment?."
Journal of Hydrology 633 (2024): 131040.
[6] Latif, Sarmad Dashti, et al. "Assessing rainfall prediction models:
Exploring the advantages of machine learning and remote sensing
approaches." Alexandria Engineering Journal 82 (2023): 16-25.
[7] Hussein, Eslam A., et al. "Rainfall prediction using machine learning
models: literature survey." Artificial Intelligence for Data Science in
Theory and Practice (2022): 75-108.
[8] Rahman, Atta-ur, et al. "Rainfall prediction system using machine
learning fusion for smart cities." Sensors 22.9 (2022): 3504.
[9] Zhao, Qingzhi, et al. "An improved rainfall forecasting model based on
GNSS observations." IEEE Transactions on Geoscience and Remote
Sensing 58.7 (2020): 4891-4900.
[10] Kobayashi, Kenichiro, et al. "Ensemble flood simulation for a small
dam catchment in Japan using 10 and 2 km resolution nonhydrostatic
model rainfalls." Natural Hazards and Earth System Sciences 16.8
(2016): 1821-1839.
[11] Schmitz, Gerd H., and Johannes Cullmann. "PAI-OFF: A new proposal
for online flood forecasting in flash flood prone catchments." Journal
of hydrology 360.1-4 (2008): 1-14.
[12] Huang, Ganji, and Lingzhi Wang. "Hybrid neural network models for
hydrologic time series forecasting based on genetic algorithm." 2011
fourth international joint conference on computational sciences and
optimization. IEEE, 2011.
[13] Han, Feng, et al. "Automated Extraction of Rail Point Clouds by Multi-
Scale Dimensional Features From MLS Data." IEEE Access 11 (2023):
32427-32436.
[14] Papalaskaris, Thomas, Theologos Panagiotidis, and Athanasios
Pantrakis. "Stochastic monthly rainfall time series analysis, modeling
and forecasting in Kavala City, Greece, North-Eastern Mediterranean
Basin." Procedia engineering 162 (2016): 254-263.
[15] Caraka, Rezzy Eko, and Sakhinah Abu Bakar. "Evaluation
Performance of Hybrid Localized Multi Kernel SVR (LMKSVR) in
electrical load data using 4 different optimizations." J. Eng. Appl. Sci
13.17 (2018): 7440-7449.