Theoretical and Applied Climatology (2025) 156:129

https://doi.org/10.1007/s00704-024-05321-x

RESEARCH

Development of stacking algorithm for bias-correcting the precipitation projections
using a multi-model ensemble of CMIP6 GCMs in a semi-arid basin, India
Hemanandhini Shanmugam1 · Vignesh Rajkumar Lakshmanan1

Received: 4 September 2024 / Accepted: 17 December 2024 / Published online: 23 January 2025
© The Author(s), under exclusive licence to Springer-Verlag GmbH Austria, part of Springer Nature 2025

Abstract
Climate change affects the hydrological cycle, leading to extreme events such as droughts and floods. Projecting climate
change is necessary to understand the variability of future climate parameters and to mitigate the impacts of climate change.
This research aims to project future precipitation over the Amaravathi River Basin (ARB), Tamil Nadu, India, using a
Multi-Model Ensemble (MME) of CMIP6 (Coupled Model Intercomparison Project Phase 6) GCMs (General Circulation Models).
The uncertainties and biases in the MME CMIP6 GCM precipitation were corrected, and the precipitation projected, using
Empirical Quantile Mapping (EQM) with multiple individual Machine Learning (ML) algorithms and their integration through
Stacking Regression (SR). The machine learning algorithms used for bias correction are Linear Regression (LR), Decision-Tree
(DT) Regression, Random Forest (RF) Regression, Support-Vector Machine (SVM) Regression and Multi-Layer Perceptron
(MLP) Regression with HyperParameter Tuning (HPT). Each machine learning algorithm with optimized hyperparameters was
integrated into the SR to improve the model performance. The proposed SR performed better than the individual algorithms,
with an RMSE (Root Mean Square Error) ranging from 37.14 to 66.28. The SR-based precipitation projection changes were
analyzed for three periods: 2025–2050 (2040, near-future year), 2051–2075 (2065, mid-future year) and 2076–2100 (2090,
far-future year) under the SSP (Shared Socioeconomic Pathway) 1-2.6, SSP2-4.5, SSP3-7.0 and SSP5-8.5 emission scenarios.
The projected annual precipitation variations over the ARB range from 0.81 to 67.33% under SSP1, followed by −4.51 to
72.13% (SSP2), −1.62 to 60.84% (SSP3) and −0.71 to 65.75% (SSP5). Precipitation was projected to be higher in magnitude
in the southeast and lower in the far northern part of the ARB. The projection findings will be helpful in formulating
strategies for addressing climate impacts and achieving the Sustainable Development Goal (SDG 13: Climate Action).

* Vignesh Rajkumar Lakshmanan
[email protected]

1 Department of Environmental and Water Resources Engineering, School of Civil Engineering,
Vellore Institute of Technology, Vellore, Tamil Nadu, India

1 Introduction

Globally, climate change is a major issue for all kinds of environments and ecosystems. Shifting patterns in
hydrometeorological parameters are a major impact of climate change (Cook et al. 2020). Climate change leads
to more extreme events such as floods and droughts, which increases the need for food security, flood monitoring and
environmental monitoring at the regional level (Schilling et al. 2020). Therefore, understanding and addressing the
extreme events through projection of future climate changes will help to reduce and mitigate the negative impacts of
climate variability, minimize the flood risk and manage water resources (Prathom and Champrasert 2023; Xu et al. 2021).
Recent studies have underscored the significance of projecting future climate parameters at the regional and sub-basin
levels, guiding decision-makers to formulate specific regulations and policies based on impact assessments (Sulaiman
et al. 2022). This type of research addresses the possibility of climate change impact and paves the way for achieving
the Sustainable Development Goal (SDG 13: Climate Action).

Global Climate Models or General Circulation Models (GCMs) are mathematical models that simulate grid-based
climate parameters at coarse resolution all over the world and are used to project the future climate parameters related
to regional or local-level climate variables. The WCRP (World Climate Research Programme), coupled
with CMIP (Coupled Model Intercomparison Project), developed many global climate models that provided the data set


Content courtesy of Springer Nature, terms of use apply. Rights reserved.


129 Page 2 of 28 H. Shanmugam, V. R. Lakshmanan

as past, present and future climate parameters under various emission scenarios (Kim et al. 2020). The CMIP has evolved
in six stages: CMIP (1995), CMIP2 (1997), CMIP3 (2004), CMIP4 (2005), CMIP5 (2013) and CMIP6, which
started in 2020 and is still being updated. The implementations depend on the current global warming concentration
with finer spatial resolution (Eyring et al. 2016). Recently, many researchers have used the CMIP6 dataset for projecting
future climate parameters (Buhay Bucton et al. 2022; Cook et al. 2020; Gumus et al. 2023; Guven 2023; Mukheef et al.
2024; Prathom and Champrasert 2023; Rhymee et al. 2022; Seker and Gumus 2022; Xu et al. 2021). The CMIP6 model
structure was significantly improved, although GCMs still have many biases and uncertainties in regional-level
projection (Eyring et al. 2016; Rahimi et al. 2021). To minimise these biases and uncertainties, researchers developed
statistical downscaling and dynamical downscaling. Most scientists used statistical downscaling for projection because
it has advantages such as low computational cost and low maintenance cost (Fan et al. 2021; Gebrechorkos et al. 2019;
Me et al. 2022; Rhymee et al. 2022). Statistical downscaling techniques have been improved through Empirical Quantile
Mapping (EQM), Quantile Delta Mapping (QDM), Linear Scaling (LS), Power Transformation (PT), etc. (Ballarin et al.
2023; Daniel 2023; Rahimi et al. 2021; Rettie et al. 2023; Schoof and Robeson 2016; Shrestha and Pradhanang 2022).
Accurate regional-level projection of precipitation and temperature remains challenging for the research community,
even with advanced bias correction techniques. In recent years, the best-ranked GCM (i.e., the GCM most highly
correlated with an observed station) has been determined using different methods, such as compromise programming and
Taylor diagrams, to reduce biases and uncertainties and to increase the reliability of climate projections (Ashfaq et al.
2022; Deepthi and Sivakumar 2023; Hemanandhini and Vignesh Rajkumar 2023; Velpuri et al. 2023). A few studies
integrated simulation products to reduce errors and improve model performance relative to a single precipitation
product (Kolluru and Kolluru 2021; Kossieris et al. 2024; Seker and Gumus 2022).

In recent advancements, AI-based algorithms have been applied to bias-corrected GCM simulation at regional scales,
and many studies have reported effective results in GCM simulations through regression-based machine learning
algorithms such as decision tree, artificial neural network and random forest (Jebeile et al. 2021; Jose et al. 2022;
Niazkar et al. 2023; Nourani et al. 2019; Prathom and Champrasert 2023; Seker and Gumus 2022; Sulaiman et al. 2022;
Wang et al. 2023). A few researchers have improved the effectiveness of GCM simulation by using MME (Multi-Model
Ensemble) learning, which applies multiple machine learning techniques in each case study with its future projection.
Temperature and precipitation were projected using multi-model ensemble learning with Multiple Linear Regression (MLR),
Support Vector Machine (SVM), Extra-Tree Regression (ETR), Random Forest (RF) and Long Short-Term Memory (LSTM).
LSTM provided the most effective simulation results compared to all other machine learning models on the NEX-GDDP
(NASA Earth Exchange Global Daily Downscaled Projection) CMIP6 data set for precipitation (Jose et al. 2022). A few
researchers developed precipitation estimation products using deep learning techniques such as neural networks, which
provided better results than machine learning algorithms (Kolluru et al. 2020; Wehbe et al. 2020). AI-based model
ensemble (ME) learning climate projection results were enhanced by using the Extreme Learning Machine (ELM) and
Multiple Linear Regression (MLR); ELM showed better results than the other ME algorithms (Acharya et al. 2014).
Monthly precipitation and temperature were projected with Multi-Model Ensemble (MME) methods combining Random
Forest (RF), Support-Vector Machine (SVM), Bayesian Model Averaging (BMA) and the Arithmetic Ensemble Mean (AEM).
Among these algorithms, RF provided the best result, with the lowest error percentage (Wang et al. 2018). Recently, a
fusion approach based on model ensemble learning methods was studied for determining future precipitation changes.
The base learning algorithms were Random Forest (RF), K-Nearest Neighbors (KNN), Extra Tree (ET) and Gradient
Boosting Decision Tree (GBDT). The fusion-based multiple ensemble algorithm provided a higher Taylor skill score than
the individual ME algorithms in the Hanjiang River Basin, China (Wang et al. 2023). Across the literature, each
machine learning algorithm performed well on different datasets and spatial references but had some limitations when
handling complex datasets.

The ARB region covers 53% of the crop land (Thirunavukkarasu and Ambujam 2020). The district-wise census
handbook of 2011 reveals that agricultural activities are the primary source of livelihood for the
people in this region (https://tn.census.gov.in/census2011.php). Farmers usually cultivate crops according to the
agro-climatic seasons of India, classified as the Kharif and Rabi seasons. In both seasons, the crop yield depends on
the monsoon rainfall. The interannual and seasonal rainfall trends of the study area have been fluctuating, resulting in
both positive and negative impacts on agricultural productivity (Kokilavani et al. 2017). According to the literature,
there is a significantly decreasing trend in the monsoon season and an increasing trend in the non-monsoon season in the
study area (Nagaraja 2021). Rice, one of the major crops, produced a yield of 3902 kg/ha in the year 2000, 4502 kg/ha
in 2010, and 3380 kg/ha in 2020
(https://www.data.gov.in/catalog/district-wise-season-wise-crop-production-statistics-0,
by the District-wise Department of Economics and



Development of stacking algorithm for bias‑correcting the precipitation projections using… Page 3 of 28 129

Statistics, Department of Agriculture and Farmer Welfare) in the study area. The yield varied with rainfall
variability. Through the outcomes of this study, the projected monthly rainfall will be helpful in suggesting the sowing
period for crops. The study aims to project future precipitation changes using the CMIP6 precipitation model in the
ARB region. The objectives are as follows: (1) To create an MME CMIP6 GCM based on performance metrics in
the study area. (2) To develop an SR algorithm with optimized HPT for correcting the biases in the MME CMIP6
GCM and projecting the future precipitation during the years 2015–2100 under four SSP emission scenarios. The present
work develops an SR (aggregation of base models) algorithm integrating Linear Regression (LR), Decision Tree
(DT) Regression, Random Forest (RF) Regression, Support-Vector Machine (SVM) Regression and Multi-Layer Perceptron
(MLP) Regression with optimized HPT for projection. The highlight of the work is that monthly precipitation was
projected with satisfactory model performance through the SR-based machine learning model. Regional-level
fine-resolution precipitation projection will be highly helpful for planning various irrigation schemes and efficient
water management depending on water availability.

2 Study area

The ARB (a semi-arid basin) is located in the middle part of the Cauvery basin, Tamil Nadu, India. It covers an area
of about 8.3 × 10³ km². The river rises from the Western Ghats in Idukki district, Kerala state. It flows in a north-east
direction and finally merges with the Cauvery River. The study area primarily covers Karur and Dindigul districts,
as well as parts of Erode and Coimbatore districts. The Amaravathi River is the longest tributary, which passes
through Udumalaipettai, Dharapuram, and Karur. According to the Tamil Nadu Irrigated Agriculture Modernization
and Water-Bodies Restoration and Management (TNIAMWARM) report, the study area receives 70% of its mean
rainfall during the North-East monsoon and 20% during the South-West monsoon, with an annual average rainfall
of about 750–1200 mm. The maximum and minimum temperatures range from 26 degrees to 18 degrees. The area
exhibits two distinct topographical features: hilly terrain and undulating plains. The hilly terrain has an elevation
between 2300 m and 500 m and the undulating plains have an elevation between 500 m and 40 m
(https://www.tniamwarmtnau.org/sub-basins/area/amaravathi). Agriculture is the primary livelihood for most of the
population in the region. However, climate variability has led to fluctuations in agricultural productivity, which
affected the country’s GDP (Gross Domestic Product)
(https://www.data.gov.in/catalog/district-wise-season-wise-crop-production-statistics-0). The study area has also faced
many conflicts over water resources and other activities due to seasonal variability (Nagaraja 2021). The current
study, which aims to project future monthly precipitation, will help to provide suggestions to cope with climate change
impacts and improve agricultural productivity. The study area contains 21 rain gauge stations. The geographical
coordinates of the station points are shown in Fig. 1.

3 Data source description

3.1 Observed data

Region-wise observed daily data (1985–2020) of the study area were acquired from the Tamil Nadu state
groundwater and surface water data center in Tharamani, Chennai. The region has 21 rain gauge stations, whose records
are considered as the baseline for projecting future precipitation changes with satisfactory accuracy. Missing values
in the station-wise observed rainfall data were filled using the mean imputation method. Mean imputation
replaces the missing values with the arithmetic mean of the available data (i.e. the mean value of that day in all the
years). Table 1 provides the geographical information and annual rainfall magnitude of the study area’s rainfall
stations.

3.2 CMIP6 data

This study utilized monthly precipitation data from 33 CMIP6 GCMs under the variant label r1i1p1f1, obtained
from the ESGF website (https://esgf-node.llnl.gov/search/cmip6/). The components of the variant label (r1i1p1f1) are
defined as follows: ‘r’ stands for realization, and the number 1 represents the initial condition of the ensemble member.
Multiple ensemble members run under the same GCM model name. The ‘i’ stands for initialization, which means the
experimental identifier used in the GCM simulation. The ‘p’ stands for physics, which means the specific configuration
of the model run. The ‘f’ stands for forcing, which means greenhouse gas emission scenarios. The ESGF provides
raw global-level climate data for both historical and future scenarios for all climate variables. As of September 4th,
2023, the available CMIP6 precipitation GCMs were downloaded under the variant label r1i1p1f1. The main scope of
the current work is to improve the CMIP6 climate projections using four different SSP scenarios (SSP1-2.6, SSP2-4.5,
SSP3-7.0 and SSP5-8.5) until the year 2100. Table 2 lists the model names, developing countries, and resolutions
of each individual GCM used in this study.
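
The mean imputation described in Section 3.1 and the daily-to-monthly aggregation used later in the workflow can be sketched in a few lines of pandas. This is only an illustration under stated assumptions, not the authors' code: the function names, the synthetic three-year record and the constant 2 mm/day rainfall are all hypothetical, and "mean value of that day in all the years" is read here as the mean over the same calendar month and day.

```python
import numpy as np
import pandas as pd


def impute_by_calendar_day(daily: pd.Series) -> pd.Series:
    """Fill gaps with the mean of the same calendar day across all years."""
    keys = [daily.index.month, daily.index.day]
    # transform("mean") ignores NaN and returns one value per original row.
    day_means = daily.groupby(keys).transform("mean")
    return daily.fillna(day_means)


def to_monthly_totals(daily: pd.Series) -> pd.Series:
    """Sum daily rainfall (mm) into monthly totals (mm)."""
    return daily.resample("MS").sum()


# Hypothetical station record: 2 mm on every day, with one missing value.
index = pd.date_range("1985-01-01", "1987-12-31", freq="D")
rain = pd.Series(2.0, index=index)
rain.loc[pd.Timestamp("1986-06-15")] = np.nan

filled = impute_by_calendar_day(rain)
monthly = to_monthly_totals(filled)
```

A day that is missing in every year would stay missing under this scheme, so a real record would need a fallback such as a monthly mean.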


Fig. 1 Geographical location of the study area

Table 1 Geographical decimal coordinates for 21 observed rainfall stations

S.No  Rainfall station    Latitude  Longitude  Annual rainfall (mm)
1     Adalur              10.36     77.74      1042
2     Amaravathi Nagar    10.33     77.28      711
3     Anaipalayam         10.88     77.96      552
4     Aravakuruchi        10.77     77.91      520
5     Chatrapatti         10.47     77.65      703
6     Dharapuram          10.73     77.53      633
7     Dindigul            10.37     77.99      905
8     K. Paramathi        10.96     77.91      654
9     Kamatchipuram       10.51     77.86      819
10    Kaniyur             10.6      77.38      701
11    Kankeyam            11        77.56      607
12    Karur               10.96     78.08      654
13    Kodaganur Dam       10.59     77.97      697
14    Moolanur            10.79     77.71      542
15    Palani              10.31     77.43      717
16    Palladam            10.99     77.28      554
17    Rudhravathi         10.85     77.44      544
18    Udumalaipettai      10.57     77.24      671
19    Uthamapalayam       10.9      77.67      555
20    Vedasandur          10.54     77.97      714
21    Virupatchi          10.47     77.7       748

4 Methodology

The overview of the methodology is presented in Fig. 2, while the proposed Stacking Regression (SR)-based
projection is illustrated in Fig. 3. The procedure for generating monthly precipitation projections is outlined as follows.

4.1 Procedure for precipitation projections through 2100

• Initially, daily precipitation data from observed stations for the period 1985–2014 were collected. Missing
  values within the dataset were filled using an imputation method. Subsequently, the daily data were converted into
  monthly precipitation values.
• The CMIP6 GCMs for historical (1985–2014) and future periods (2015–2100) were extracted based on the nearest
  grid points. The grid points were identified using Inverse Distance Weighting (IDW), a spatial interpolation
  technique employed to estimate the values of a variable at unsampled locations based on the values at the nearest
  sampled points. For the development of the ML model, the 1985–2014 rainfall was considered as the baseline period.
• A Multi-Model Ensemble (MME) of CMIP6 GCM was constructed using five highly correlated GCMs. The per-


Table 2 List of GCMs with resolution and developed country

S. No  Precipitation GCM     Developed country  Resolution (latitude x longitude)
1      ACCESS-CM2            Australia          1.875°x1.25°
2      ACCESS-ESM1-5         Australia          1.875°x1.25°
3      AWI-CM-1-1-MR         Germany            1.125°x1.125°
4      AWI-ESM-1-1-LR        Germany            1.125°x0.875°
5      BCC-CSM2-MR           China              1.125°x1.125°
6      BCC-ESM1              China              2.81°x2.81°
7      CAMS-CSM1-0           China              1.125°x1.125°
8      CAS-ESM2-0            China              1.4°x1.4°
9      CESM2                 USA                0.9°x1.3°
10     CESM2-WACCM           USA                0.9°x1.3°
11     CMCC-CM2-HR4          Italy              0.875°x1.275°
12     CMCC-CM2-SR5          Italy              0.875°x1.275°
13     CMCC-ESM2             Italy              0.875°x1.275°
14     EC-EARTH3             Europe             0.7°x0.7°
15     EC-EARTH3-VEG         Europe             0.7°x0.7°
16     FGOALS-F3-L           China              2°x2.25°
17     FIO-ESM-2-0           China              1.3°x1.3°
18     GISS-E2-1-G           USA                2.5°x2.5°
19     GISS-E2-1-H           USA                2.5°x2.5°
20     GISS-E2-2-G           USA                2.5°x2.5°
21     GISS-E2-2-H           USA                2.5°x2.5°
22     IITM-ESM              India              1.88°x1.89°
23     INM-CM4-8             Russia             2°x1.5°
24     IPSL-CM5A2-INCA       France             2.5°x1.26°
25     IPSL-CM6A-LR-INCA     France             2.5°x1.28°
26     KACE-1-0-G            South Korea        1.88°x1.88°
27     MIROC6                Japan              2.81°x2.77°
28     MPI-ESM-1-2-HAM       Germany            1.875°x1.875°
29     MPI-ESM1-2-HR         Germany            0.9375°x0.9375°
30     MPI-ESM1-2-LR         Germany            1.875°x1.875°
31     MRI-ESM2-0            Japan              1.125°x1.125°
32     NESM3                 China              1.9°x1.9°
33     TaiESM1               Taiwan             1.25°x0.94°

Fig. 2  Overview of methodology (pr indicates the precipitation)


Fig. 3  Proposed SR for precipitation projection

  formance of each model and the MME model relative to the observed data was visualized through a Taylor diagram.
• The baseline period precipitation data from the MME CMIP6 GCMs were first corrected by comparing them
  with the observed rainfall data. Simulated historical and future precipitation were corrected and projected under
  the various SSP emission scenarios. Correction and projection were done using multiple machine learning
  algorithms with optimized HPT.
• To improve the agreement between observed and simulated historical precipitation, Stacking
  Regression (SR), i.e., the integration of multiple machine learning models, was employed. Based on the SR fit,
  future precipitation projections for the period 2015–2100 were generated under four different SSP emission
  scenarios: SSP1-2.6 (Sustainability), SSP2-4.5 (Middle of the Road), SSP3-7.0 (Regional Rivalry) and SSP5-8.5
  (Fossil Fuel Development).

Why SSP scenarios are important in this study:

• According to the latest IPCC Sixth Assessment Report (2021), SSPs are used to derive the current carbon
  dioxide and greenhouse gas emissions adopted by the latest climate policies. They also serve to evaluate recent
  climate impacts and adaptation measures.
• Development of policies and scenarios has facilitated the better integration of IAV (Impacts, Adaptation, and
  Vulnerability) research in future climate assessments.

4.2 Evaluation of GCM ranking

GCMs are widely used to simulate past climates and project future climate variables. However, uncertainty exists in the
regional simulation of multiple GCMs due to differences in their spatial resolution, response mechanisms (ocean,
aerosols, circulation of land and atmosphere) and temporal scales (Jain et al. 2019). Hence, there is an urgent need to
identify a suitable GCM to minimize the errors and uncertainties in regional projections. Researchers have implemented
numerous methodologies to identify the most suitable GCM for climate projection. A comprehensive review of articles
recommended metrics-based evaluations, such as correlation coefficients, standard deviation and other errors (Raju and
Kumar 2020). In this study, the correlation coefficient was employed to assess the performance of individual GCMs
and to develop a Multi-Model Ensemble (MME) of CMIP6 GCMs (Seker and Gumus 2022). This approach helps reduce
biases and uncertainties in climate projections. Additionally, a Taylor diagram was used to compare and visualize the
performance of both CMIP6 precipitation GCMs and the


MME GCM with respect to observed rainfall data across the ARB region.

4.2.1 Taylor diagram

The Taylor diagram (Taylor 2001) provides a way to visualize the closeness between the baseline-period observed and
GCM precipitation values. In the Taylor diagram, three key metrics were considered: the correlation coefficient, standard
deviation, and Root Mean Square Error (RMSE). The optimal model was identified based on these performance metrics.
A higher correlation coefficient indicates a stronger relationship between the observed and simulated data, suggesting a
better model performance. The standard deviation reflects the average degree of variation between the observed and
simulated values. A higher standard deviation indicates a greater deviation from the ideal model, while a lower
standard deviation indicates a closer fit to the observed data. The RMSE quantifies the magnitude of the error between
the observed and predicted values, with a lower RMSE signifying a model with higher accuracy.

4.3 Bias correction techniques

Bias correction is the process of adjusting the climate model output using a transformation function with
respect to the observed climate variables. Different bias correction techniques for precipitation were utilized in the
study of Jaiswal et al. (2022): (1) Linear Scaling (SCL), (2) Local Intensity Scaling (LOCI) and (3) Empirical
Quantile Mapping (EQM). Among these, scaling followed by EQM performed best in the study area.
Similarly, the EQM method was utilized for bias correction of precipitation and temperature projections in Indian
subcontinent river basins (Mishra et al. 2020). Therefore, the current study adopted the EQM method employing different
machine learning algorithms.

4.3.1 EQM as machine learning algorithms

The EQM technique is widely used for bias correction in climate science because it corrects the entire distribution
of the data by fitting relations between the regional observed rainfall and GCM precipitation datasets. It reduces
the biases and uncertainties compared to all other statistical downscaling techniques. Bias-correction techniques
were developed for correcting the globally simulated climate variables with respect to observed climate data. They were
further developed as statistical downscaling, dynamical downscaling and ensemble multi-model machine learning
techniques linking the global climate model and the regional climate model (Bhattacharjee et al. 2023; Jebeile et al.
2021; Jose et al. 2022; Schoof and Robeson 2016; Sutanudjaja et al. 2018). The EQM approach relies on frequency
distribution functions that relate the observed climate data to the simulated climate data. This method is considered
robust and consistent for correcting historical simulated GCM data and projecting future GCM data. Frequency
distribution functions are typically established in three ways: (1) distribution-derived function transformations,
(2) parametric distribution transformations and (3) non-parametric distribution transformations. Among these,
non-parametric distribution transformations, which are used to reduce the error and uncertainty in GCM climate
data, are particularly aligned with the empirical quantile mapping technique. The EQM can be mathematically
expressed as Eq. (1).

$$X_{\mathrm{corrected}} = F_{0}^{-1}\left[F_{m}\left(X_{\mathrm{uncorrected}}\right)\right] \tag{1}$$

where
$X_{\mathrm{uncorrected}}$ = Uncorrected GCM climate data in the historical period.
$X_{\mathrm{corrected}}$ = Corrected GCM climate data in the historical period with respect to regional observed data.
$F_{m}$ = Non-parametric transformation function.
$F_{0}$ = Distribution function of the station-wise observed data ($F_{0}^{-1}$ is its inverse).

According to the EQM transformation, the non-parametric distribution derived from the transformation is
used to project new data. The quantile-based non-parametric transformation functions establish the relationship
between the observed rainfall data and the GCM-simulated precipitation data. Each input variable in the training
dataset undergoes the transformation fitting, which is then applied to predict both the new historical variables and
future projections. These projections are generated with the assistance of multiple machine learning algorithms,
enhancing the accuracy and robustness of the projections.

4.3.2 Data preprocessing

Data preprocessing, the process of cleaning the data and bringing it into a correct format, was important. In the
station-wise observed data, outliers were detected in the precipitation dataset; abnormal precipitation values for the
study area were treated as outliers. The outliers were removed and replaced using imputation techniques. The extracted
GCM data were then first converted to the units of the observed rainfall data. Because the simulated precipitation
still had high biases, it was corrected using a looping linear scaling method with respect to the station-wise
observed rainfall data (Teutschbein and Seibert 2012).
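
To make Eq. (1) concrete, the mapping can be sketched with NumPy alone: empirical quantiles of the simulated series stand in for F_m, empirical quantiles of the observed series stand in for the inverse of F_0, and linear interpolation connects the two. This is a plain quantile-mapping illustration and not the ML-assisted EQM the authors fitted with scikit-learn; the function names, the number of quantiles and the synthetic gamma-distributed data are assumptions made for the example.

```python
import numpy as np


def fit_eqm(observed, simulated, n_quantiles=100):
    """Return paired empirical quantiles of the simulated and observed data."""
    probs = np.linspace(0.0, 1.0, n_quantiles)
    sim_q = np.quantile(simulated, probs)  # empirical stand-in for F_m
    obs_q = np.quantile(observed, probs)   # empirical stand-in for F_0^{-1}
    return sim_q, obs_q


def apply_eqm(values, sim_q, obs_q):
    """Map values through F_m and then through the inverse of F_0 (Eq. 1)."""
    # np.interp expects increasing sim_q; values outside the fitted range
    # are clamped to the first or last observed quantile.
    return np.interp(values, sim_q, obs_q)


# Hypothetical monthly series (mm): the "GCM" is biased low relative to the station.
rng = np.random.default_rng(0)
observed = rng.gamma(shape=2.0, scale=40.0, size=360)
simulated = 0.6 * rng.gamma(shape=2.0, scale=40.0, size=360) + 5.0

sim_q, obs_q = fit_eqm(observed, simulated)
corrected = apply_eqm(simulated, sim_q, obs_q)
```

The same `sim_q` and `obs_q` pair would be reused on future GCM values. Real precipitation series contain many zeros, which produce tied quantiles, so a production version would need explicit handling of dry months before interpolating.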


4.3.3 Base machine learning algorithms

Multiple machine learning algorithms, namely LR, DT regression, RF regression, SVM regression and MLP regression,
were used in this study. These machine learning based regression techniques are effective in reducing biases and
uncertainties in both historical and future climate projections. All algorithms were implemented in Google Colab,
a cloud-based Jupyter Notebook service that allows users to execute code on an online platform with free access to
Google Drive. The implementation of these machine learning algorithms was facilitated by the
scikit-learn library. For each precipitation station, individual calculations were performed. The station-wise observed
data and GCM simulated data for precipitation were divided into a training period (75%) and a testing period (25%).
The training fit was used to evaluate the model performance of the individual machine learning models and the SR-based
algorithm. An overview of the individual machine learning algorithms used in this study is provided below:

1. Linear Regression

LR here is simple linear regression analysis, as used in this climate projection study. Simple linear
regression establishes the relation between one dependent variable and one independent variable by fitting
the linear relation given in Eq. (2).

$$Y = \beta_{1} X + \varepsilon \tag{2}$$

where
Y = Dependent variable, i.e. station-wise observed rainfall.
X = Independent variable, i.e. GCM simulated precipitation variable.
β1 = Coefficient of the linear relation fitted between X and Y.
ε = Error value.

This kind of linear relation has been used for climate downscaling projection with its impact analysis and
hydrological applications (Gebrechorkos et al. 2019; Gurara et al. 2022).

2. Decision Tree regression

DT regression is a non-parametric supervised learning method used for regression analysis in this study. The main
goal is to create a fit that predicts the new set of projected precipitation values by learning simple
decision rules from the input data features, here the simulated GCM values (Xiang et al. 2020). DT represents a
flowchart-based structure that includes a root node, leaf nodes and decision nodes. The root node represents the
specific target variable of the dataset, followed by the branch nodes, which decide the transformation, and the tree
finally ends with the outcomes of the target variable, which are held in the leaf (terminal) nodes. It is used to
predict the new set of observed values based on the training dataset fit. It reduces the underfitting and overfitting
errors in the new projection dataset.

3. Random Forest regression

RF regression was developed by Breiman (2001) (Guven 2023). RF regression is a meta-regressor that fits a number
of decision-tree regressions on a number of sub-samples of the data to predict the new projection variable. The new set
of projection variables is developed based on the non-parametric transformation fit relation between
the independent variable (GCM precipitation data) and the dependent variable (station-wise observed data).
Merging the decision trees creates the random forest (Jose et al. 2022). The output of the RF prediction is
estimated by averaging the predictions of all the decision trees.

4. Support Vector Machine regression

SVM (Hernanz et al. 2022) is classified under the supervised machine learning algorithms designed for
classification and regression purposes. It combines minimizing errors with maximizing the distance from
the data points to the marginal boundary. A support vector machine searches an n-dimensional space for a hyperplane
that distinctly classifies the data points. Support vectors are the data points closest to the hyperplane on each
side, which help to build the support vector machine fit. Using this non-linear transformation fit, SVM estimates the
new observed value from the GCM simulated value (Jose et al. 2022). In support vector regression, the best-fit line is
the hyperplane that contains the maximum number of data points. The hyperplane equation is given in Eq. (3).

$$y = \omega_{1}\,\varphi(x) + \omega_{0} \tag{3}$$

where
φ = non-parametric transformation.
x = simulated climate data (GCM data).
y = station-wise observed data.
ω1 and ω0 = parameters that define the optimum hyperplane.

5. Multi-Layer Perceptron regression

ANN (Nourani et al. 2019) is a supervised learning algorithm based on the behavior of neurons, whose
nodes are called perceptrons. The implementation

Content courtesy of Springer Nature, terms of use apply. Rights reserved.



of neural networks called MLP regression. The nodes of an MLP are separated into the input layer, a number of hidden layers and the output layer, each connected to the adjacent layers. The equation of the ANN algorithm is given in Eq. (4).

\[ z = \sum_{j=1}^{m} w_j x_j + b \tag{4} \]

where
x_j = input signals entering each node from the m nodes.
w_j = weight parameter.
z = result of the input function that is fed into the activation function.
b = bias.

MLPs (Reddy et al. 2024) are trained with the help of the backpropagation algorithm, which calculates the gradient of the loss function with respect to the parameters and updates them iteratively to reduce the error. In MLP regression, the precipitation variables are fed into the neurons. The neurons predict a new set of precipitation values through a weighted-average parameter that is passed through a non-linear transformation function into the hidden layers. The weight values are determined so as to improve the output through the activation function. In this study, MLP regression is used to predict the new set of projection values with the help of the weighted-average parameter with respect to the GCM simulated precipitation value.

4.3.4 HPT with optimization techniques

In machine learning, the prediction of target variables depends on both parameters and hyperparameters. Parameters are variables used by learning algorithms to predict the target variable based on independent variables, the process typically being handled by the algorithm itself. In contrast, hyperparameters are predefined by the user and specify certain characteristics of the learning model that influence its performance, particularly regarding prediction accuracy. Hyperparameters vary across different machine learning algorithms and play a critical role in improving model performance. This study examined five machine learning algorithms: LR, DT, RF, SVM and MLP Regression. Each algorithm has distinct hyperparameters designed to optimize the performance of target variable predictions.

• Linear Regression does not have any hyperparameters, as it is a relatively simple algorithm.
• Decision Tree Regression involves hyperparameters such as:
  – max_depth: The maximum depth of the tree,
  – min_samples_split: The minimum number of samples required to split a node,
  – min_samples_leaf: The minimum number of samples required at a leaf node,
  – max_features: The maximum number of features considered at each split.
• Random Forest Regression, which aggregates multiple decision trees, shares similar hyperparameters but also includes additional parameters like:
  – n_estimators: The number of trees in the forest.
• SVM Regression has hyperparameters such as:
  – C: The regularization parameter,
  – kernel: The type of kernel used to transform the input data,
  – gamma: The parameter that controls the influence of individual training samples,
  – degree: The degree of the polynomial kernel function.
• MLP Regression includes hyperparameters such as:
  – hidden_layer_sizes: The size of the hidden layers,
  – activation: The activation function used in the hidden layers,
  – alpha: The L2 regularization parameter,
  – batch_size: The number of samples per gradient update.

The selected hyperparameters in this study include max_depth, min_samples_split, and min_samples_leaf for DT; n_estimators, max_depth, min_samples_split, and min_samples_leaf for RF; C, kernel, and degree for SVM; and hidden_layer_sizes, activation, and alpha for MLP regression. HPT is essential for selecting the optimal hyperparameters, which reduce underfitting or overfitting. While manual search methods for hyperparameter optimization can be labor-intensive and often ineffective, automated techniques have been developed to identify the most effective hyperparameters. Automated optimization methods include Random Search, Grid Search, and Bayesian Optimization. According to Belete and Huchaiah (2022) and Elgeldawi et al. (2021), the GridSearchCV technique is widely used for hyperparameter optimization. GridSearchCV systematically evaluates all combinations of the hyperparameters through cross-validation to determine the optimal set for a given model. This method is crucial for improving the model accuracy, and the workflow for GridSearchCV optimization is shown in Fig. 4.

The GridSearchCV working process is outlined in the following steps.

STEP 1: Assigning the hyperparameter value - This is the initial step of the optimization process, where the user specifies the respective values for the hyperparameters in a grid. This grid contains all the possible combinations of hyperparameters that the model will explore during optimization.


STEP 2: Extensive Search - GridSearchCV performs an exhaustive search across all possible combinations of hyperparameters. For example, if a Decision Tree model has three hyperparameters, each with three candidate values, GridSearchCV will evaluate 27 (3 × 3 × 3) different combinations of those hyperparameters. This exhaustive search ensures that all possible combinations are tested.

STEP 3: Cross-Validation - Each combination of hyperparameters is evaluated with k-fold cross-validation. In the k-fold technique, the dataset is split into k subsamples. The model is trained on k−1 subsamples and validated on the remaining subsample, and the procedure is repeated k times so that each subsample serves once as the validation set when identifying the best model performance (Elkiran et al. 2021; Okkan and Kirdemir 2016; Sulaiman et al. 2022). In this study, k-fold random splitting of the dataset is done with 5 subsamples, i.e. k = 5 (five-fold cross-validation).

4.4 Performance evaluation

The performance of the regression-based machine learning models was evaluated by comparing the predicted target variables with the actual observed values. Three key performance metrics were used to assess model accuracy: R² (Coefficient of Determination), Mean Absolute Error (MAE), and Root Mean Square Error (RMSE).

1. R² (Coefficient of Determination).
R² represents how well the regression model fits, i.e. the agreement between the actual observed values and the predicted observed rainfall values in the dataset (Jose et al. 2022; Prathom and Champrasert 2023), as given in Eq. (5). A low R² value indicates that the model’s predicted values are far from the actual observed values, while a high R² value indicates a better fit.

\[ R^2 = 1 - \frac{\sum_{i=1}^{n} \left(\text{Actual observed value}_i - \text{Predicted observed value}_i\right)^2}{\sum_{i=1}^{n} \left(\text{Actual observed value}_i - \text{Mean value of actual observed values}\right)^2} \tag{5} \]

2. MAE (Mean Absolute Error).
MAE is defined as the average absolute difference between the actual and predicted observed rainfall values, providing insight into the magnitude of prediction errors (Mesgari et al. 2022; Munawar et al. 2022), as given in Eq. (6). An MAE of zero indicates no difference between the actual observed values and the predicted observed values, implying perfect model accuracy.

\[ \text{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| \text{Actual observed value}_i - \text{Predicted observed value}_i \right| \tag{6} \]

3. RMSE (Root Mean Square Error).
RMSE is quite similar to MAE in that it also measures the error between the predicted observed value and the actual observed rainfall value (Prathom and Champrasert 2023; Rhymee et al. 2022). MAE measures the magnitude of the model error, whereas RMSE gives more weight to large errors, which makes it sensitive to outliers or extreme values. The formula for RMSE is given in Eq. (7).


\[ \text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(\text{Actual observed value}_i - \text{Predicted observed value}_i\right)^2} \tag{7} \]

4.5 Stacking regression

To enhance the performance of the model, aggregation-based Stacking Regression (SR) was employed in the present study. In machine learning, SR refers to the technique of combining multiple base models to improve predictive accuracy. It is a specific ensemble method that aggregates the predictions of base models using a meta-learner to generate a more optimized and robust final prediction (Lu et al. 2023; Shahhosseini et al. 2022). This aggregation is achieved using model weightage averaging and the bias-variance tradeoff to reduce the bias and variance errors. The working principle of SR is shown in Fig. 5. The advantage of SR is the ability to integrate the strengths of multiple base models, resulting in better predictive performance compared to individual machine learning models. By leveraging the diversity of the base models, SR reduces the likelihood of overfitting or underfitting, leading to more accurate and reliable projections.

4.5.1 Steps involved in SR

The key steps involved in SR are outlined as follows:

1. Data preparation: The first step involves acquiring both the features (simulated data) and the target variable (observed data). This stage includes:


• Splitting the dataset into training and testing sets.
• Eliminating outliers from the dataset to ensure data quality.
• Minimizing redundancy to reduce overfitting and improve model efficiency.

2. Choosing the base models

The next step is selecting the base models that will be used in the stacking ensemble. In this study, the following machine learning models were selected as base models for the Stacking Regression: LR, DT, RF, SVM and MLP regression.

Fig. 4  Workflow for grid search cross validation optimization

Fig. 5  Working principle of the SR

3. Training base models.

Once the base models are selected, each model is trained using the training dataset. During this stage, HPT is performed to identify the optimal settings for each base model. Hyperparameters are fine-tuned to ensure the best possible performance for each model.

4. Prediction and performance assessment.

After training, each base model generates predictions from its trained fit. These predictions are evaluated using performance metrics such as R², MAE, and RMSE to assess the accuracy of each model. The base models’ performance is carefully monitored to ensure the quality of predictions.

5. Framing a Meta Model.

At this stage, a meta-model (or meta-learner) is created to combine the predictions of the base models and generate the final output. The meta-model takes the predictions from the base models as input and refines them to produce a more accurate prediction. The meta-model can be built using various methods, such as neural networks, logistic regression and linear regression. In this study, a linear regression approach is used to aggregate the predictions.

6. Training the Meta Model.

The meta-model is trained using the predictions from the base models as input features. The meta-regression model learns to map these base model predictions to the actual target variable, refining the predictions to minimize errors and bias.

7. Generating new predictions and model evaluation.

Once the meta-model is trained, it generates the final predictions on the entire dataset. The final projection is


produced based on the aggregations from the base models. The SR performance is evaluated using the same performance metrics (e.g., R², MAE, RMSE) as used for the base models. This step helps to ensure that the ensemble model effectively improves predictive accuracy compared to individual base models. This SR fit helped to project the monthly precipitation up to 2100.

4.6 Projected precipitation change over the entire ARB

The projected historical and future precipitation data were obtained through the SR-based bias-correction method. The projections were evaluated under four different emission scenarios across the ARB. The spatio-temporal variations of future projected annual precipitation were analyzed in three periods: 2025–2050 (2040), 2051–2075 (2065) and 2076–2100 (2090), which were named the Near-Future year, Mid-Future year and Far-Future year. The percentage change in precipitation during these periods was calculated relative to the baseline period (1985–2014). The formula for percentage change is given as Eq. (8).

\[ \text{Percentage change} = \frac{P_{new} - P_{observed}}{P_{observed}} \times 100 \tag{8} \]

where P_new = projected annual precipitation under different scenarios.
P_observed = baseline period average annual precipitation.

5 Results

5.1 Evaluation of GCMs

Initially, the monthly observed rainfall data were compared with the 33 simulated GCMs presented in Table 3, using the correlation coefficient. Individual GCMs or combinations of multiple GCMs can help to reduce the bias and the uncertainty in projections. Although there is no established procedure for selecting the number of GCMs for developing a Multi-Model Ensemble (MME), it is common practice to select between 3 and 10 of the highest-performing models (Seker and Gumus 2022). The five highest-performing GCMs identified were AWI-CM-1-1-MR, CAMS-CSM1-0, CAS-ESM2-0, IITM-ESM, and NESM3. A Multi-Model Ensemble (MME) was constructed by averaging the outputs of these five GCMs across all observed stations. Subsequently, a Taylor diagram was employed to visualize the performance of the observed rainfall data in comparison to the 33 CMIP6-simulated values and to assess the performance of the MME.

Figure 6 presents the Taylor diagram between GCM precipitation and observed rainfall at the Adalur station. The relative position of the Multi-Model Ensemble (MME) GCM indicates higher correlation and lower root mean square error (RMSE) and standard deviation values compared to individual CMIP6 GCMs when evaluated with respect to the observed data. Taylor diagrams for the remaining stations are provided in Figs. 7, 8, 9, 10 and 11. Across all stations, the relative position of the MME GCM (represented in violet) showed a higher correlation coefficient and lower error than the individual GCMs. The MME GCM performed better than the individual GCMs on the performance metrics. Based on the interpretation of all the Taylor diagrams, the MME GCM was selected as the preferred simulated GCM over the entire ARB region.

5.2 Assigning the hyperparameter value for projecting the precipitation

The station-wise observed rainfall data were prepared along with interpolated CMIP6 precipitation data. Machine learning algorithms were applied, considering the non-parametric relationships between the observed and simulated variables. Various regression models, including LR, DT, RF, SVM and MLP, were employed to correct and project both the historical and future MME CMIP6 projections with respect to observed rainfall data. The performance of the regression-based models was enhanced through hyperparameter tuning. Notably, Linear Regression does not have any hyperparameters.

For DT, the following hyperparameters were selected: ‘max_depth’ [None, 5, 10, 15] (where None indicates unlimited depth or a range from 1 to 32), ‘min_samples_split’ [2, 5, 10] (representing the minimum number of samples required to split a node, with a range from 2 to 20), and ‘min_samples_leaf’ [1, 2, 4] (denoting the minimum number of samples required at a leaf node, with a range from 1 to 20). These hyperparameters were chosen to optimize model performance. In RF, which is a combination of multiple decision trees, the hyperparameter ‘n_estimators’ (the number of trees in the forest) varied between 100 and 1000. Additionally, ‘max_depth’ [None, 1 to 32], ‘min_samples_split’ [2, 5, 10], and ‘min_samples_leaf’ [1, 2, 4] were selected based on their influence on model performance.

For SVM, the hyperparameters included the regularization parameter ‘C’ [0.1, 1, 10], used to control the trade-off between margin maximization and classification error minimization, and the ‘kernel’ parameter [linear, poly, rbf, sigmoid], used to define the transformation function into higher-dimensional spaces. The ‘degree’ parameter had a range of [2, 5] for polynomial kernels. For the MLP, the hyperparameters selected were the size of the hidden layers, such as [(50, 50), (100,), (100, 50, 25)], which defines the number of neurons in each layer, and the activation function (activation), like ReLU (Rectified Linear Unit), Tanh (Hyperbolic



Table 3  Correlation coefficient between observed and GCM rainfall
CMIP6 Precipitation GCM Observed Rainfall Stations
Adalur Amaravathi Nagar Anaipalayam Aravakurichi Chatrapatti Dharapuram Dindigul K. Paramathi Kamatchipuram Kaniyur Kankeyam

ACCESS-CM2 −0.10 −0.17 −0.18 −0.18 −0.20 −0.16 −0.13 −0.16 −0.14 −0.20 −0.11
ACCESS-ESM1-5 0.20 0.14 0.10 0.11 0.18 0.11 0.22 0.15 0.20 0.18 0.16
AWI-CM-1-1-MR 0.37 0.31 0.35 0.32 0.36 0.32 0.40 0.45 0.41 0.36 0.39
AWI-ESM-1-1-LR −0.02 −0.06 0.03 0.05 −0.02 −0.01 −0.04 0.00 −0.02 −0.03 −0.04
BCC-CSM2-MR 0.21 0.13 0.22 0.22 0.18 0.18 0.21 0.23 0.18 0.17 0.23
BCC-ESM1 0.26 0.22 0.21 0.23 0.21 0.25 0.27 0.24 0.27 0.21 0.27
CAMS-CSM1-0 0.40 0.44 0.35 0.36 0.40 0.36 0.42 0.39 0.41 0.40 0.43
CAS-ESM2-0 0.31 0.27 0.26 0.27 0.30 0.22 0.35 0.28 0.30 0.29 0.31
CESM2 0.15 0.11 0.15 0.16 0.13 0.08 0.19 0.15 0.14 0.13 0.14
CESM2-WACCM 0.16 0.09 0.14 0.14 0.12 0.08 0.18 0.14 0.13 0.11 0.17
CMCC-CM2-HR4 0.25 0.16 0.23 0.22 0.22 0.23 0.28 0.25 0.25 0.22 0.28
CMCC-CM2-SR5 0.22 0.12 0.24 0.21 0.17 0.18 0.22 0.27 0.24 0.17 0.28
CMCC-ESM2 0.20 0.07 0.16 0.12 0.14 0.16 0.15 0.20 0.19 0.14 0.20
EC-EARTH3 0.29 0.07 0.25 0.24 0.10 0.24 0.12 0.17 0.14 0.10 0.16
EC-EARTH3-VEG 0.16 0.12 0.19 0.17 0.15 0.14 0.14 0.15 0.12 0.16 0.16
FGOALS-F3-L 0.13 0.14 0.16 0.15 0.13 0.17 0.15 0.17 0.17 0.14 0.24
FIO-ESM-2-0 0.20 0.06 0.18 0.15 0.12 0.10 0.20 0.20 0.20 0.12 0.20
GISS-E2-1-G 0.20 0.24 0.21 0.23 0.23 0.18 0.21 0.22 0.17 0.23 0.21
GISS-E2-1-H 0.01 −0.03 −0.01 −0.02 0.03 −0.03 −0.02 −0.05 −0.01 0.03 −0.04

GISS-E2-2-G 0.08 −0.02 0.06 0.02 0.02 0.02 0.10 0.09 0.08 0.02 0.07
GISS-E2-2-H −0.02 −0.06 0.03 0.05 −0.02 −0.01 −0.04 0.00 −0.02 −0.03 −0.04
IITM-ESM 0.28 0.30 0.29 0.30 0.29 0.23 0.26 0.28 0.25 0.29 0.31
INM-CM4-8 −0.10 −0.02 −0.11 −0.11 0.06 −0.94 −0.11 −0.14 −0.11 −0.06 −0.14
IPSL-CM5A2-INCA 0.25 0.09 0.22 0.21 0.18 0.18 0.24 0.31 0.26 0.18 0.28
IPSL-CM6A-LR-INCA 0.32 0.21 0.27 0.24 0.26 0.24 0.24 0.21 0.23 0.25 0.26
KACE-1-0-G −0.01 −0.05 0.01 −0.03 −0.04 −0.02 −0.07 −0.03 −0.02 −0.04 −0.02
MIROC6 −0.03 −0.13 −0.03 −0.06 −0.09 −0.06 −0.03 −0.02 −0.04 −0.09 0.00
MPI-ESM-1-2-HAM 0.06 0.10 0.12 0.11 0.04 0.04 0.04 0.03 −0.01 0.04 0.02
MPI-ESM1-2-HR 0.23 0.24 0.25 0.26 0.24 0.22 0.27 0.25 0.22 0.24 0.25



MPI-ESM1-2-LR 0.14 0.15 0.10 0.14 0.13 0.08 0.09 0.05 0.07 0.13 0.09
MRI-ESM2-0 −0.06 0.04 −0.07 −0.06 −0.03 −0.04 −0.05 −0.10 −0.05 −0.03 −0.08
NESM3 0.38 0.35 0.36 0.36 0.34 0.32 0.40 0.37 0.35 0.33 0.44
TaiESM1 −0.30 −0.35 −0.28 −0.32 −0.36 −0.27 −0.34 −0.32 −0.33 −0.36 −0.31
Table 3  (continued)
CMIP6 Precipitation GCM Observed Rainfall Stations

Karur Kodaganur dam Moolanur Palani Palladam Rudhravathi Udumalaipettai Uthamapalayam Vedasandur Virupatchi
ACCESS-CM2 −0.11 −0.22 −0.13 −0.14 −0.14 −0.16 −0.12 −0.14 −0.16 −0.17
ACCESS-ESM1-5 0.13 0.17 0.14 0.10 0.13 0.12 0.12 0.15 0.18 0.20
AWI-CM-1-1-MR 0.37 0.39 0.34 0.31 0.41 0.34 0.34 0.41 0.45 0.42
AWI-ESM-1-1-LR 0.01 0.00 −0.02 −0.04 −0.09 −0.05 −0.07 −0.05 −0.05 −0.03
BCC-CSM2-MR 0.20 0.16 0.17 0.10 0.16 0.16 0.13 0.24 0.20 0.22
BCC-ESM1 0.28 0.21 0.27 0.22 0.23 0.24 0.21 0.28 0.25 0.26
CAMS-CSM1-0 0.31 0.39 0.38 0.38 0.40 0.37 0.40 0.42 0.40 0.44
CAS-ESM2-0 0.32 0.31 0.24 0.25 0.23 0.22 0.24 0.29 0.31 0.31
CESM2 0.20 0.11 0.10 0.10 0.10 0.06 0.10 0.14 0.14 0.13
CESM2-WACCM 0.24 0.16 0.09 0.05 0.08 0.06 0.10 0.16 0.15 0.17
CMCC-CM2-HR4 0.25 0.25 0.19 0.13 0.22 0.16 0.20 0.26 0.24 0.27
CMCC-CM2-SR5 0.27 0.27 0.23 0.11 0.23 0.19 0.18 0.26 0.29 0.21
CMCC-ESM2 0.13 0.16 0.13 0.08 0.19 0.14 0.07 0.18 0.18 0.15
EC-EARTH3 0.28 0.14 0.20 0.12 0.12 0.18 0.08 0.08 0.10 0.14
EC-EARTH3-VEG 0.14 0.14 0.21 0.13 0.17 0.18 0.16 0.15 0.17 0.16
FGOALS-F3-L 0.14 0.15 0.18 0.13 0.15 0.17 0.16 0.18 0.13 0.14
FIO-ESM-2-0 0.22 0.17 0.12 0.05 0.13 0.12 0.01 0.18 0.19 0.20
GISS-E2-1-G 0.18 0.23 0.22 0.27 0.19 0.21 0.22 0.21 0.19 0.18
GISS-E2-1-H −0.03 0.01 −0.04 −0.04 0.00 −0.05 −0.01 −0.03 −0.05 0.01
GISS-E2-2-G 0.17 0.06 0.06 0.01 0.00 0.03 0.00 0.08 0.08 0.07
GISS-E2-2-H 0.01 0.00 −0.02 −0.04 −0.09 −0.05 −0.07 −0.05 −0.05 −0.03
IITM-ESM 0.26 0.27 0.26 0.27 0.23 0.24 0.27 0.29 0.30 0.27
INM-CM4-8 −0.15 −0.11 −0.06 −0.14 −0.15 −0.11 −0.06 −0.04 −0.08 0.07
IPSL-CM5A2-INCA 0.30 0.27 0.23 0.13 0.21 0.20 0.15 0.29 0.25 0.24
IPSL-CM6A-LR-INCA 0.29 0.26 0.21 0.17 0.24 0.27 0.25 0.27 0.26 0.30
KACE-1-0-G −0.02 −0.03 −0.01 −0.05 −0.03 −0.02 −0.03 −0.03 −0.05 0.03
MIROC6 0.04 −0.04 −0.08 −0.12 −0.06 −0.06 −0.08 −0.02 −0.06 −0.03



MPI-ESM-1-2-HAM 0.07 0.03 0.08 0.05 0.02 0.06 0.03 0.04 0.04 0.01
MPI-ESM1-2-HR 0.25 0.23 0.17 0.21 0.25 0.21 0.19 0.27 0.20 0.22
MPI-ESM1-2-LR 0.09 0.07 0.10 0.12 0.06 0.11 0.12 0.13 0.05 0.06
MRI-ESM2-0 −0.08 −0.03 −0.05 −0.01 −0.07 −0.04 0.00 −0.08 −0.07 −0.20
NESM3 0.44 0.35 0.36 0.32 0.36 0.36 0.33 0.41 0.37 0.39
TaiESM1 −0.31 −0.35 −0.32 −0.33 −0.36 −0.31 −0.31 −0.34 −0.33 −0.36
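The screening behind Table 3 (correlate each GCM with the observations, keep the best performers, average them into an MME) can be sketched as follows. This is a minimal sketch under stated assumptions: it uses a single station, synthetic series and hypothetical model names (`GCM_A` to `GCM_D`), and the function names are illustrative. How the study aggregated the correlations across the 21 stations before choosing five GCMs is not stated here, so that step is not reproduced.

```python
import numpy as np


def rank_gcms_by_correlation(observed, gcm_series):
    """Return (name, Pearson r) pairs sorted from highest to lowest r."""
    scores = {
        name: float(np.corrcoef(observed, series)[0, 1])
        for name, series in gcm_series.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def build_mme(gcm_series, selected_names):
    """Multi-model ensemble as the simple average of the selected GCM series."""
    return np.mean([gcm_series[name] for name in selected_names], axis=0)


# Synthetic monthly rainfall for one station (30 years); illustration only.
rng = np.random.default_rng(0)
n_months = 360
observed = rng.gamma(shape=2.0, scale=40.0, size=n_months)

# Hypothetical GCMs: the observation plus increasing amounts of noise.
noise_sd = {"GCM_A": 20.0, "GCM_B": 60.0, "GCM_C": 150.0, "GCM_D": 400.0}
gcm_series = {
    name: observed + rng.normal(0.0, sd, size=n_months)
    for name, sd in noise_sd.items()
}

ranking = rank_gcms_by_correlation(observed, gcm_series)
top_names = [name for name, _ in ranking[:2]]  # the study keeps the top five
mme = build_mme(gcm_series, top_names)
```

In this toy setup the least noisy series ranks first. With real data, `gcm_series` would hold the interpolated CMIP6 precipitation at the station and `ranking[:5]` would give the five candidates for the ensemble.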

Fig. 6  Taylor diagram for Adalur station

Fig. 7  Taylor diagram for a) Amaravathi Nagar b) Anaipalayam c) Aravakurichi & d) Chatrapatti

Tangent), and Identity (no activation for linear layers). The ‘alpha’ parameter controls the strength of L2 regularization and was varied over the commonly used values [0.0001, 0.001, 0.01]. Generally, alpha ranges from 1e-5 to 1e-1.

The hyperparameters were selected through extensive trial and error, as model performance and prediction accuracy are highly sensitive to some of them. GridSearchCV was employed to optimize hyperparameter selection for


Fig. 8  Taylor diagram for a) Dharapuram b) Dindigul c) K.Paramathi & d) Kamatchipuram

Fig. 9  Taylor diagram for a) Kaniyur b) Kankeyam c) Karur & d) Kodaganar Dam


Fig. 10  Taylor diagram for a) Moolanur b) Palani c) Palladam & d) Rudhravathi

Fig. 11  Taylor diagram for a) Udumalaipettai b) Uthamapalayam c) Vedasandur & d) Virupatchi


precipitation projections by iterating over all possible combinations. The optimized hyperparameters were identified through 5-fold cross-validation (k = 5) as those which provided the highest accuracy for each machine learning algorithm. A summary of the hyperparameters used in this study, as well as the optimum hyperparameters for all stations in the ARB region, is presented in Table 4 (column 3).

5.3 Performance evaluation

The model performance evaluation between the historical observed and SR-based bias-corrected simulated rainfall is presented in Table 5, comparing the performance of individual ML regression techniques and the Stacking Regression (SR) technique. At the Adalur station, the DT and RF models performed better than the LR, SVM and MLP models. The primary objective of the SR technique is to enhance model performance by model averaging and to improve the bias-variance tradeoff. At the Adalur station, stacking regression selected the model performance of RF and reduced bias-related errors by incorporating the second-weighted model. Similarly, at the stations of Amaravathi Nagar, Anaipalayam, Aravakuruchi, Chatrapatti, Dharapuram, Dindigul, K. Paramathi, Kamatchipuram, Kaniyur, Kodaganur Dam, Moolanur, Palani, Palladam, Vedasandur, and Virupatchi, the DT and RF models performed well among the individual machine learning models. At these stations, SR gave the higher weightage to the DT model performance and reduced variance-related errors using the second-weighted model performance. In contrast, at the Kankeyam, Karur, Rudhravathi, Udumalaipettai and Uthamapalayam stations, the DT and RF models performed better than the other individual machine learning models. There, the SR technique gave the higher weightage to the Random Forest model performance and reduced bias-related errors using the Decision Tree model performance. The SR technique combined multiple base models, including LR, DT, RF, SVM and MLP regression, with the aim of improving precipitation projections. SR reduced both bias and variance error and improved the interpretability of the precipitation projection. While the coefficient of determination (R²) and Mean Absolute Error (MAE) showed improvements due to reduced bias, the Root Mean Square Error (RMSE) remained sensitive to variance-related errors. The model performance of the stacking regression showed R² values ranging from 0.60 to 0.82, MAE values ranging from 11.70 to 23.19 and RMSE values between 37.14 and 66.28. Overall, the performance of the SR model slightly improved compared to the individual DT and RF models. Stacking helps to minimize both bias and variance errors. These results indicate that the stacking algorithm enhanced model performance, minimized errors, and increased the robustness of precipitation projections. Commonly, when improving any simulation model, the R² value is often prioritized. However, R² alone does not necessarily lead to better results. It is more beneficial to consider all performance evaluation metrics (Meharie et al. 2022). Figure 12 compares the model performance before and after bias correction. Before bias correction, the Taylor diagram indicated a correlation coefficient of approximately 0.5, an RMSE of approximately 60 and a standard deviation of 45. After bias correction, performance improved significantly, with the correlation coefficient increasing to approximately 0.9, RMSE decreasing to 30 and standard deviation reducing to 30 across all the stations in the ARB region.

5.4 Future projection: spatio temporal variation

Figure 13 presents the projected precipitation and its variation from 2015 to 2100 under different SSP scenarios. The projected precipitation over the upcoming decades exhibits a fluctuating trend. Under the SSP1-2.6 scenario, projected precipitation is expected to fluctuate in the range from −6.95% to 63.35% in the near-future years. Similarly, the precipitation variation is projected to range from 0.81% to 67.33% in the mid-future years. The change in precipitation is expected to range from 10.97% to 45.36% in the far-future years.

The SSP2-4.5 scenario indicates that precipitation is projected to vary between 1.45% and 72.13% in the near-future years. In the mid-future years, precipitation variation is expected to range from 0.78% to 54.08%, while in the far-future years, precipitation change is projected to range from −4.51% to 70.48%. The SSP3-7.0 scenario predicted that precipitation would fluctuate between a decrease of 1.90% and an increase of 53.99% in the near-future years. In the mid-future years, the variation is projected to range from −1.38% to 60.84%. In the far-future years, precipitation change is expected to range from −4.51% to 70.48%. The SSP5-8.5 scenario forecasts that precipitation in the near future will vary between −0.71% and 65.74%. Similar variations are projected for the mid-future years, with precipitation ranging from 5.11% to 56.39%, and changes in precipitation expected to range from 11.43% to 60.66% by the end of the century.

A box plot is used to visualize the distribution of projected precipitation changes in terms of the mean and percentiles, as presented in Fig. 14. The projection results suggest that the mean annual rainfall is expected to increase under all SSP scenarios relative to the baseline period. In the near future, the SSP2-4.5 scenario predicts a higher magnitude of rainfall compared to the baseline period. In the mid-future years, the magnitude of rainfall distribution remains consistent across all SSP scenarios, except for SSP3-7.0. In the far future, the SSP2-4.5 scenario shows the highest magnitude of rainfall compared to the baseline period.

The spatio-temporal variations of projected precipitation changes in annual rainfall (sum of the projected monthly rainfall in a particular period) were analyzed over the study area during the three different periods: (a) Near-future year


Table 4  Base machine learning models and their hyperparameters

S. No | Model | Hyperparameter with search space | Best parameter for precipitation in all the stations
1 | Linear Regression | Regular parameterization | −
2 | Decision Tree Regression | ‘max_depth’: [None, 5, 10, 15]; ‘min_samples_split’: [2, 5, 10]; ‘min_samples_leaf’: [1, 2, 4] | ‘max_depth’: None; ‘min_samples_split’: 2; ‘min_samples_leaf’: 1
3 | Random Forest Regression | ‘n_estimators’: [100, 200, 500]; ‘max_depth’: [None, 5, 10, 15]; ‘min_samples_split’: [2, 5, 10]; ‘min_samples_leaf’: [1, 2, 4] | ‘n_estimators’: 100; ‘max_depth’: None; ‘min_samples_split’: 2; ‘min_samples_leaf’: 1
4 | Support Vector Machine Regression | ‘C’: [0.1, 1, 10]; ‘kernel’: [‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’]; ‘degree’: [2, 3, 4] | ‘C’: 0.1; ‘kernel’: ‘linear’; ‘degree’: 2
5 | Neural network Regression | ‘hidden_layer_sizes’: [(100,), (50, 50), (100, 50, 25)]; ‘activation’: [‘relu’, ‘logistic’, ‘tanh’]; ‘alpha’: [0.0001, 0.001, 0.01] | ‘hidden_layer_sizes’: (100,); ‘activation’: ‘relu’; ‘alpha’: 0.0001
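The tuning and stacking workflow summarized in Table 4 and Sect. 4.5.1 can be sketched with scikit-learn. This is not the authors' code but a minimal illustration under stated assumptions: the data are synthetic, the RF grid is reduced for speed, the SVM and MLP use the best values from Table 4 directly, the `r2` scoring choice is assumed, and the `StandardScaler` step for SVM and MLP is added here for numerical stability rather than taken from the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in: X = MME GCM precipitation, y = observed station rainfall.
rng = np.random.default_rng(42)
gcm = rng.gamma(shape=2.0, scale=40.0, size=300)
obs = 0.8 * gcm + 15.0 * np.sin(gcm / 30.0) + rng.normal(0.0, 8.0, size=300)
X = gcm.reshape(-1, 1)
X_train, X_test, y_train, y_test = train_test_split(
    X, obs, test_size=0.2, random_state=0
)

# STEP 1-3: exhaustive grid search with 5-fold CV, DT grid as in Table 4.
dt_grid = {
    "max_depth": [None, 5, 10, 15],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
}
dt_search = GridSearchCV(DecisionTreeRegressor(random_state=0), dt_grid, cv=5, scoring="r2")
dt_search.fit(X_train, y_train)

# Reduced RF grid (assumption, to keep the sketch fast).
rf_grid = {"n_estimators": [100], "max_depth": [None, 5], "min_samples_leaf": [1, 4]}
rf_search = GridSearchCV(RandomForestRegressor(random_state=0), rf_grid, cv=5, scoring="r2")
rf_search.fit(X_train, y_train)

# Five base models; a linear regression meta-learner combines their predictions.
base_models = [
    ("lr", LinearRegression()),
    ("dt", DecisionTreeRegressor(random_state=0, **dt_search.best_params_)),
    ("rf", RandomForestRegressor(random_state=0, **rf_search.best_params_)),
    ("svm", make_pipeline(StandardScaler(), SVR(C=0.1, kernel="linear"))),
    ("mlp", make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(100,), activation="relu",
                     alpha=0.0001, max_iter=2000, random_state=0),
    )),
]
stack = StackingRegressor(estimators=base_models, final_estimator=LinearRegression(), cv=5)
stack.fit(X_train, y_train)

y_pred = stack.predict(X_test)
stack_r2 = r2_score(y_test, y_pred)
stack_mae = mean_absolute_error(y_test, y_pred)
```

With the Table 4 grid, the DT search evaluates 4 × 3 × 3 = 36 combinations, each with five folds. `StackingRegressor` trains the meta-learner on out-of-fold base-model predictions, which is one common way to keep the meta-model from simply memorizing the training fit of the base models.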

(2040), (b) Mid-future year (2065) and (c) Far-future year (2090), relative to the baseline period average annual precipitation (1985–2014) under the four SSP emission scenarios SSP1-2.6, SSP2-4.5, SSP3-7.0, and SSP5-8.5, as shown in Figs. 15, 16 and 17. Overall, the north-western parts of the region experience low to medium annual precipitation (550 mm to 1050 mm), and the southern part of the region consistently receives higher annual rainfall (> 1250 mm) across all four emission scenarios.

In the near-future year (2040), station-wise projected annual precipitation percentage changes range from −23.48% to 104.8% relative to the baseline annual average precipitation under the SSP1-2.6 emission scenario. Similarly, the annual precipitation percentage changes range from −59.95% to 67.122%, −8.58% to 190.97%, and −35.06% to 118.16% under the SSP2-4.5, SSP3-7.0, and SSP5-8.5 emission scenarios, respectively. Figure 15 illustrates the spatio-temporal variation of the projected annual precipitation change in 2040. The findings show that Virupatchi, Kaniyur and Udumalaipettai are projected to receive the highest annual rainfall (1250 mm to > 1350 mm) under all SSP emission scenarios in 2040, except for the SSP2-4.5 scenario. The Adalur station also exhibits high annual rainfall (> 1250 mm) in all emission scenarios. The spatial distribution diagram reveals that Dindigul is projected to receive a high amount of rainfall (> 1350 mm) under the SSP1-2.6 emission scenario. The north-eastern parts of the study area are projected to receive low to medium annual rainfall. In 2040, the south-western region of the study area experiences the highest annual rainfall under the SSP3-7.0 emission scenario. One of the key findings is that the northern portion of the study area is projected to receive relatively low rainfall (≤ 500 mm) under both the SSP1-2.6 and SSP2-4.5 emission scenarios.

Under the SSP1-2.6 emission scenario, the projected annual precipitation percentage changes for the mid-future year (2065) range from 20.52% to 171.24% relative to the baseline annual average precipitation across all the stations in the study area. Similarly, under the SSP2-4.5, SSP3-7.0, and SSP5-8.5 emission scenarios, the annual precipitation percentage changes vary from −27.69% to 142%, −23.19% to 147.71%, and −28.18% to 124.75%, respectively, with respect to the baseline annual average precipitation. The spatio-temporal variation of projected annual precipitation changes in the year 2065 is shown in Fig. 16. In the mid-future year (2065), the results indicate that the southern portions of the study area receive the highest magnitude of rainfall, exceeding 1350 mm under the SSP1-2.6 emission scenario. The Adalur station shows high annual rainfall (> 1250 mm) across all emission scenarios. In contrast, Dharapuram receives relatively low annual rainfall, approximately 550 mm, under the SSP3-7.0 emission scenario. The eastern part of the study area experiences high rainfall under the SSP2-4.5 emission scenario. The north-eastern region, except under the SSP1-2.6 scenario, is projected to receive low to medium annual rainfall (550 mm to 750 mm). In the SSP1-2.6 scenario, the southern part of the study area receives more rainfall compared to other scenarios. Under the SSP3-7.0 scenario, the south-eastern part of the study area, particularly Kamatchipuram, is projected to receive higher annual rainfall (1250 mm to 1350 mm). Finally, under the SSP2-4.5 scenario, Virupatchi is expected to receive the maximum amount of rainfall.

With reference to station-wise baseline annual average precipitation, the projected annual precipitation percentage changes for the year 2090 (Far-future year) range from −48.47% to 113.29% under the SSP1-2.6 emission scenario across all the stations in the



129 Page 20 of 28 H. Shanmugam, V. R. Lakshmanan

Table 5  Model performance evaluation between observed and simulated rainfall (After bias correction)
Rainfall Stations   LR (R², MAE, RMSE)   DT (R², MAE, RMSE)   RF (R², MAE, RMSE)

Adalur 0.32 65.07 91.17 0.62 23.19 91.17 0.64 40.26 66.18
Amaravathi Nagar 0.41 43.29 67.98 0.78 12.74 67.87 0.75 25.19 44.14
Anaipalayam 0.31 39.09 58.67 0.71 12.32 58.68 0.70 22.68 38.74
Aravakuruchi 0.30 37.47 54.48 0.60 12.94 54.53 0.58 24.52 42.20
Chatrapatti 0.34 46.38 70.33 0.82 11.79 70.33 0.75 25.87 43.42
Dharapuram 0.29 45.26 64.94 0.60 15.08 64.94 0.61 27.91 48.39
Dindigul 0.37 51.95 72.63 0.59 18.72 72.63 0.60 34.35 58.22
K. Paramathi 0.39 37.09 55.37 0.67 13.38 55.37 0.69 23.80 40.11
Kamatchipuram 0.35 48.10 69.88 0.72 15.02 69.88 0.66 30.13 50.62
Kaniyur 0.33 47.98 71.26 0.65 16.50 71.26 0.66 30.06 50.54
Kankeyam 0.41 36.33 52.00 0.61 13.92 52.03 0.69 22.75 37.71
Karur 0.27 43.77 61.60 0.57 15.67 61.60 0.61 27.82 44.79
Kodaganur dam 0.37 42.81 61.09 0.75 12.77 61.09 0.71 24.89 40.93
Moolanur 0.32 36.36 54.65 0.61 14.34 54.65 0.65 22.73 39.23
Palani 0.33 46.79 79.75 0.76 14.21 79.75 0.72 28.37 51.39
Palladam 0.39 35.67 52.80 0.57 13.10 52.80 0.63 22.30 40.99
Rudhravathi 0.32 37.44 56.88 0.56 12.70 56.88 0.60 23.52 44.23
Udumalaipettai 0.35 43.39 64.93 0.60 14.18 64.93 0.61 28.19 50.41
Uthamapalayam 0.41 35.05 50.48 0.68 11.61 50.48 0.71 20.21 35.08
Vedasandur 0.39 43.61 60.55 0.76 12.64 60.55 0.72 25.57 41.22
Virupatchi 0.38 44.39 63.43 0.60 15.63 63.43 0.63 27.73 48.88
Rainfall Stations   SVM (R², MAE, RMSE)   MLP (R², MAE, RMSE)   SR (R², MAE, RMSE)

Adalur 0.309 59.73 91.89 0.317 62.83 91.38 0.64 23.19 66.28
Amaravathi Nagar 0.42 39.70 67.28 0.41 42.83 68.07 0.78 12.74 41.33
Anaipalayam 0.28 35.53 59.54 0.30 37.84 58.86 0.71 12.32 38.18
Aravakuruchi 0.28 33.77 55.17 0.30 35.92 54.51 0.60 12.94 41.54
Chatrapatti 0.32 41.53 71.44 0.34 45.45 70.52 0.82 11.70 37.46
Dharapuram 0.24 40.56 66.86 0.28 43.64 65.15 0.60 15.06 50.88
Dindigul 0.35 47.80 73.37 0.36 50.5 72.97 0.59 18.72 63.25
K. Paramathi 0.38 34.88 55.60 0.40 36.16 55.53 0.67 13.39 40.90
Kamatchipuram 0.34 44.62 70.57 0.35 46.91 70.25 0.72 15.01 45.66
Kaniyur 0.30 42.17 72.99 0.32 46.53 71.86 0.66 16.54 51.51
Kankeyam 0.40 33.50 52.67 0.41 35.70 52.08 0.67 13.92 42.13
Karur 0.23 39.90 63.23 0.26 42.37 61.82 0.60 16.04 46.38
Kodaganur dam 0.35 39.56 61.90 0.37 41.26 61.24 0.75 13.16 38.35
Moolanur 0.29 33.52 55.56 0.31 35.63 54.72 0.61 14.61 41.03
Palani 0.34 42.71 79.68 0.33 46.75 80.04 0.76 15.20 47.85
Palladam 0.37 32.27 53.82 0.39 34.42 52.90 0.63 13.45 44.51
Rudhravathi 0.31 33.94 57.21 0.32 36.15 56.91 0.61 13.03 45.40
Udumalaipettai 0.34 39.79 65.19 0.35 42.35 64.87 0.62 14.42 49.58
Uthamapalayam 0.39 32.14 51.32 0.40 34.08 50.58 0.70 11.98 37.14
Vedasandur 0.37 40.12 61.41 0.39 42.32 60.70 0.76 13.08 37.71
Virupatchi 0.38 42.06 63.50 0.37 44.47 63.79 0.61 15.94 49.88
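
As a rough illustration of how the SR column of Table 5 could be produced, the sketch below stacks the five base learners with the best parameters from Table 4 and reports R², MAE and RMSE. It assumes scikit-learn's `StackingRegressor`; the linear meta-learner, the 5-fold internal cross-validation, the feature scaling for SVM and MLP, the train/test split and the synthetic data are assumptions for illustration, since the paper's exact configuration is not given in this section.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption): 5 GCM predictors -> observed monthly rainfall.
rng = np.random.default_rng(42)
X = rng.gamma(2.0, 40.0, size=(240, 5))
y = 0.8 * X.mean(axis=1) + rng.normal(0.0, 15.0, size=240)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Base learners with the best parameters reported in Table 4.
base_models = [
    ("lr", LinearRegression()),
    ("dt", DecisionTreeRegressor(max_depth=None, min_samples_split=2,
                                 min_samples_leaf=1, random_state=0)),
    ("rf", RandomForestRegressor(n_estimators=100, max_depth=None,
                                 min_samples_split=2, min_samples_leaf=1,
                                 random_state=0)),
    # Scaling before SVM and MLP is an assumption, not stated in the paper.
    ("svm", make_pipeline(StandardScaler(), SVR(C=0.1, kernel="linear"))),
    ("mlp", make_pipeline(StandardScaler(),
                          MLPRegressor(hidden_layer_sizes=(100,),
                                       activation="relu", alpha=0.0001,
                                       max_iter=1000, random_state=0))),
]

# Meta-learner (assumption): linear regression on out-of-fold base predictions.
stack = StackingRegressor(estimators=base_models,
                          final_estimator=LinearRegression(), cv=5)
stack.fit(X_train, y_train)
pred = stack.predict(X_test)

metrics = {
    "R2": r2_score(y_test, pred),
    "MAE": mean_absolute_error(y_test, pred),
    "RMSE": float(np.sqrt(mean_squared_error(y_test, pred))),
}
print(metrics)
```

The numbers printed here describe the synthetic data only and are not comparable with the station values in Table 5.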


ARB region. Similarly, under the SSP2-4.5, SSP3-7.0, and SSP5-8.5 emission scenarios, the percentage changes in annual precipitation range from −30.85% to 105.08%, −18.26% to 83.24%, and −10.11% to 113.98%, respectively, relative to the baseline annual average precipitation (Fig. 17).

In the far-future year (2090), the Virupatchi station is projected to receive high annual rainfall (> 1350 mm) under the SSP1-2.6 scenario. The southern and south-eastern parts of the study area are expected to receive high annual rainfall under all scenarios, except for the SSP2-4.5 scenario. The eastern part of the study area experiences moderate annual rainfall (ranging from 550 mm to 650 mm) across all emission scenarios. In the north-eastern part of the study area, low annual rainfall (ranging from 450 mm to 550 mm) is predicted across all scenarios. Stations such as Adalur, Chatrapatti, Kaniyur, Udumalaipettai, and Virupatchi exhibit high rainfall projections, particularly under the more optimistic emission scenarios.

Fig. 12  Taylor diagram for precipitation of the MME GCM (before bias correction) and after bias correction

5.5 Percentage change variations in future annual precipitation

Figure 18 presents box plots that indicate the ranges of percentage change in future annual precipitation during the years 2025–2100 with respect to the baseline period (1985–2014) of annual precipitation over the entire ARB region. The box plots graphically display the distribution of the projected percentage changes, including the maximum, minimum, median, and the upper and lower quartiles for each emission scenario. The distribution of projected precipitation indicates the following variations in annual precipitation percentage change: 0.81% to 67.33% under the SSP1-2.6 emission scenario, −4.51% to 72.14% under SSP2-4.5, −1.63% to 60.84% under SSP3-7.0, and −0.72% to 65.75% under the SSP5-8.5 emission scenario across the entire ARB region.

5.6 Validation

The observed data for the period 1985–2020 were obtained from the data center. However, for the purpose of analysis, monthly data from 1985 to 2014 were considered as the baseline period. Projections for future monthly precipitation from the years 2015 to 2100 were made using four different emission scenarios, based on the respective baseline period

Fig. 13  Projected annual precipitation over ARB under SSP scenarios


Fig. 14  Box plot for future precipitation distribution
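
The quantities summarized by such box plots can be sketched as follows. Annual rainfall is taken as the sum of monthly rainfall and the percentage change is computed relative to the mean annual total of the baseline period, as described in the text. The function names and the synthetic monthly series are hypothetical and only serve the illustration.

```python
import numpy as np

def annual_totals(monthly):
    """Sum monthly precipitation (length = years * 12) into annual totals."""
    return np.asarray(monthly, dtype=float).reshape(-1, 12).sum(axis=1)

def percent_change(future_annual, baseline_annual):
    """Percentage change of each future year relative to the baseline mean."""
    baseline_mean = np.mean(baseline_annual)
    return 100.0 * (np.asarray(future_annual, dtype=float) - baseline_mean) / baseline_mean

def five_number_summary(values):
    """Minimum, lower quartile, median, upper quartile and maximum."""
    q = np.percentile(values, [0, 25, 50, 75, 100])
    return dict(zip(["min", "q1", "median", "q3", "max"], q.tolist()))

# Synthetic stand-in series (assumption): 30 baseline years, 86 projected years.
rng = np.random.default_rng(1)
baseline_monthly = rng.gamma(2.0, 35.0, size=30 * 12)  # 1985-2014
future_monthly = rng.gamma(2.0, 40.0, size=86 * 12)    # 2015-2100

change = percent_change(annual_totals(future_monthly),
                        annual_totals(baseline_monthly))
summary = five_number_summary(change)
print(summary)
```

Note that plotting libraries often draw whiskers at 1.5 times the interquartile range rather than at the minimum and maximum, so a plotted box may differ from this summary.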

Fig. 15  Spatio-temporal variation of projected annual precipitation change at the year 2040 using four different emission scenarios over the
study area a SSP1-2.6, b SSP2-4.5, c SSP3-7.0 and d SSP5-8.5

(1985–2014). The validation dataset, comprising the remaining six years of observed data (2015–2020), was converted into annual average rainfall values for the study area. These annual averages from the validation dataset were then compared with the projected annual average rainfall for the years 2015–2020 under the four different emission scenarios. The annual rainfall projections for the SSP3-7.0 emission scenario showed the highest correlation with the observed rainfall across the entire study area, although spatial and temporal variations in emissions may lead to discrepancies. The projected results were validated against the observed rainfall for the years 2015–2020 under different SSP scenarios. Among the rainfall stations, 20% (i.e., Chatrapatti, Palladam, Uthamapalayam, and Virupatchi) exhibited a high correlation with the SSP1-2.6 projected rainfall scenario, indicating that these stations and their respective regions


Fig. 16  Spatio-temporal variation of projected annual precipitation change at the year 2065 under four different emission scenarios a SSP1-2.6,
b SSP2-4.5, c SSP3-7.0 and d SSP5-8.5

require relatively low levels of mitigation and adaptation. In contrast, 9% of the stations (i.e., Kaniyur and Kankeyam) were more closely aligned with the SSP2-4.5 scenario, suggesting that these areas require moderate attention to both adaptation and mitigation efforts. Most of the stations (62%), including Amaravathi Nagar, Adalur, Anaipalayam, Dharapuram, Dindigul, K. Paramathi, Kamatchipuram, Kodaganur Dam, Moolanur, Palani, Rudhravathi, Udumalaipettai, and Vedasandur, showed a strong correlation with the SSP3-7.0 emission scenario projections for the years 2015–2020. These stations and their surrounding areas need substantial adaptation and mitigation measures. Finally, another 9% of stations (i.e., Aravakuruchi and Karur) demonstrated a higher correlation with the SSP5-8.5 scenario projections, indicating that these areas require high levels of mitigation and adaptation.

Overall, the validation process revealed a significant degree of agreement between observed rainfall and the projected rainfall under different emission scenarios for the years 2015–2020. Temporal variations in emissions were found to influence the results, with the SSP3-7.0 scenario showing the most consistent correlation across the entire study area, highlighting the challenges for both adaptation and mitigation in the region.

6 Discussion

The systematic biases and uncertainties in the CMIP6 projections render them unusable for impact studies. To mitigate these issues, the proposed SR algorithm, which integrates multiple machine learning algorithms, was applied to reduce these biases and uncertainties. The bias-corrected projections generated using this approach were subsequently utilized for hydrological and agricultural impact studies (Kumari et al. 2024). The SR-based precipitation projections showed a correlation of approximately 70% with the observed rainfall data at the station level; however, some biases and uncertainties remained in the projections. GCM precipitation is inherently complex, which presents challenges for accurate projection when compared to other climate variables (Wilby et al. 2002). Therefore, the projection of future rainfall and impact studies should be done cautiously. Statistical downscaling studies on precipitation projections (Alsalal et al. 2024; Birara et al. 2020; Shahriar et al. 2021) have shown that the observed and simulated datasets generally show a 50–60% similarity during the model performance. The machine learning-based precipitation projection in this study also performed better during the


Fig. 17  Spatio-temporal variation of projected annual precipitation change at the year 2090 over the study area a SSP1-2.6, b SSP2-4.5, c SSP3-
7.0 and d SSP5-8.5

model performance without inclusion of NCEP (National Centers for Environmental Prediction) data.

Fig. 18  Percentage change variations of projected precipitation over the ARB

Most of the CMIP6 GCMs are not perfectly correlated with the observed climate values, so not all GCM-based projections are useful for climate impact studies. To address this, the present study utilized 33 CMIP6 GCM models and developed an MME CMIP6 GCM based on a combination of the 5 best GCMs in the ARB region. The MME GCM performed better than the individual GCMs in terms of performance metrics. Previous studies have conducted limited analyses using a single suitable GCM-based projection (Nourani et al. 2019; Peng et al. 2023) as well as multi-model GCM-based evaluation (Okkan and Kirdemir 2016; Seker and Gumus 2022). The stacking regression (SR) algorithm employed in this study corrected the biases and weaknesses of the simulated data from the reference period and projected future precipitation in alignment with observed data using the MME CMIP6 GCMs. A few studies that have utilized machine learning for climate projection with CMIP6 (Gumus et al. 2023; Jose et al. 2022; Sulaiman et al. 2022; Yılmaz et al. 2024) reported a correlation coefficient ranging from 50% to 70% between the observed and simulated datasets. However, there is a gap in studies that utilize machine learning with optimized HPT for precipitation projections. In contrast, the proposed stacking regression-based projection, integrating individual machine learning models with optimized hyperparameters, provided satisfactory performance metrics. This methodology effectively minimized errors in precipitation projections under the SSP1, SSP2, SSP3, and SSP5 emission scenarios. The ARB region station-wise precipitation projection study will be useful for better


understanding future rainfall variability over the study area. This type of finer resolution climate projection study will be more suitable for impact assessment studies than projections over a larger influence area.

The projected rainfall variations are influenced by the level of emissions, which is determined by the combined effects of carbon emissions, methane emissions, and nitrous oxide emissions. In recent decades, there has been a notable increase in nitrous oxide emissions (https://www.downtoearth.org.in/news/climate-change/carbon-dioxide-in-2023-comparable-to-4-3-billion-years-ago-as-global-greenhouse-gas-levels-hit-all-time-high-noaa-95466). In this regard, the Amaravathi basin primarily encompasses agricultural areas (Narmada et al. 2015), where nitrogen fertilizer and manure contributed to emissions due to the intensification of agricultural activities, resulting in a 25% increase in emissions compared to pre-industrial levels (District 2017). The ARB region has faced many impacts from global warming, such as shifts in mean annual rainfall and extreme variation in precipitation levels. Furthermore, the study identified a few limitations. (1) Many GCMs were developed under the CMIP6 scenarios, but only a few were found to be suitable for local studies. (2) The GCMs identified as suitable are still not perfectly correlated with observations at the Indian regional level, so the models need to be updated with respect to Indian scenarios. (3) Most of the historical and future CMIP6 GCM precipitation data, as well as predictor data, have not been updated on the ESGF website. (4) The regional-level downscaling caused many flaws and uncertainties, primarily due to data deficiency. The proposed methodology identified the GCM performance for each individual station. This was followed by SR with optimized HPT, which resulted in high performance metrics, reduced flaws and uncertainties, and a more accurate projection of future rainfall. The proposed methodology will be useful for climate projection studies at the global level to estimate future climate parameters under different emission scenarios.

Significant fluctuations in precipitation were observed in the ARB region as a consequence of climate change. Precipitation patterns exhibit spatial variability depending on the specific location within the region. Anthropogenic activities, such as increasing the concentration of greenhouse gases, have affected the precipitation trends (Trenberth 2011). Rainfall varies depending on land-use and land-cover changes, as well as topographic conditions at the regional level (Sy and Quesada 2020). In the current study, projected rainfall trends showed increasing or decreasing magnitude over the years, which will aid policy makers in monitoring droughts and managing water resources effectively.

7 Conclusion

The current study examined future precipitation using a stacking learning algorithm in the ARB region. The study area falls under the semi-arid climate classification. Regional-level projection studies are important for reducing agricultural impacts. The conclusions are summarized as follows. The MME CMIP6 GCM performed better than the individual GCMs, so it is recommended for global precipitation projection studies. The Taylor diagram visualized the performance evaluation between observed and simulated rainfall data, so it is suggested for selecting suitable GCMs in global-level climate projection studies. Among the individual machine learning models with optimized HPT, DT and RF provided better performance than the LR, SVM, and MLP models; they are preferred for monthly climate projection studies over semi-arid areas. The SR algorithm achieved better performance than the individual machine learning algorithms. Therefore, this improved methodology is recommended for global-level climate projection studies. Through the stacking algorithm, the RMSE range (37.14 to 66.28) was improved in the study. While stacking may not improve all performance evaluation metrics, it may often enhance specific metrics based on model averaging and the bias-variance trade-off, depending on the data set and the types of machine learning algorithms used. Future projected annual precipitation variations were analyzed under different SSP scenarios. The future rainfall percentage changes ranged as follows: SSP1-2.6 (0.81% to 67.33%), SSP2-4.5 (−4.51% to 72.14%), SSP3-7.0 (−1.63% to 60.84%), and the highest emission scenario SSP5-8.5 (−0.714% to 65.75%) during the years 2015–2100 with respect to the baseline period over the entire ARB region. The projection results contain some uncertainties even in the corrected MME CMIP6 GCM. The stacking algorithm reduced the biases and uncertainties to the maximum extent possible. Further advancement of CMIP6 GCMs with respect to Indian scenarios would yield better results in future. Fine resolution climate projection studies will be more precise for impact assessment studies. The results of the projected future precipitation suggest significant variation in rainfall trends in the ARB region, causing a severe impact on hydrological components. The variation in hydrological components has an impact on agricultural productivity. Based on the projection results, policymakers can make decisions such as adjusting the sowing period, proposing advanced irrigation planning, and introducing new crop varieties, which will help to enhance the agricultural yield and improve farmers' livelihoods. These strategies aim to reduce poverty and boost the agricultural GDP in the ARB region. The findings would address climate change


with improved agricultural productivity and achieve the Sustainable Development Goals (SDG 2: Zero Hunger and SDG 13: Climate Action).

Acknowledgements This study was supported by VIT (Vellore Institute of Technology, Vellore, Tamil Nadu, India) management by offering institutional support to carry out this research. The authors are thankful to the colleagues Kiruthika K.M, Loganathan K and Senthamizhselvan T. We extend our special thanks to the editorial team and reviewers for providing their valuable time and effort to review the manuscript. We appreciate all the comments and suggestions which helped us to improve the manuscript.

Author contributions Vignesh Rajkumar Lakshmanan designed the methodological framework. Hemanandhini Shanmugam processed the model simulations and projections. Hemanandhini Shanmugam and Vignesh Rajkumar Lakshmanan analyzed the climate variability, interpreted the results and led the manuscript writing. Both authors contributed equally to editing and revising the manuscript.

Data availability No datasets were generated or analysed during the current study.

Code availability The code used to generate the figures and machine learning projections in this study is available upon request from the corresponding author.

Declarations

Competing interests The authors declare no competing interests.

References

Acharya N, Shrivastava NA, Panigrahi BK, Mohanty UC (2014) Development of an artificial neural network based multi-model ensemble to estimate the northeast monsoon rainfall over south peninsular India: an application of extreme learning machine. Clim Dyn 43(5–6):1303–1310. https://doi.org/10.1007/s00382-013-1942-2
Alsalal S, Tan ML, Samat N, Al-Bakri JT, Zhang F (2024) Temperature and precipitation changes under CMIP6 projections in the Mujib Basin, Jordan. Theoret Appl Climatol. https://doi.org/10.1007/s00704-024-05087-2
Ashfaq M, Rastogi D, Kitson J, Abid MA, Kao SC (2022) Evaluation of CMIP6 GCMs over the CONUS for Downscaling studies. J Geophys Research: Atmos 127(21). https://doi.org/10.1029/2022JD036659
Ballarin AS, Sone JS, Gesualdo GC, Schwamback D, Reis A, Almagro A, Wendland EC (2023) CLIMBra - climate change dataset for Brazil. Sci Data 10(1):1–16. https://doi.org/10.1038/s41597-023-01956-z
Belete DM, Huchaiah MD (2022) Grid search in hyperparameter optimization of machine learning models for prediction of HIV/AIDS test results. Int J Comput Appl 44(9):875–886. https://doi.org/10.1080/1206212X.2021.1974663
Bhattacharjee A, Hassan SMQ, Hazra P, Kormoker T, Islam S, Alam E, Islam MK, Islam T, A. R. M (2023) Future changes of summer monsoon rainfall and temperature over Bangladesh using 27 CMIP6 models. Geocarto Int 38(1). https://doi.org/10.1080/10106049.2023.2285342
Birara H, Pandey RP, Mishra SK (2020) Projections of future rainfall and temperature using statistical downscaling techniques in Tana Basin, Ethiopia. Sustainable Water Resour Manage 6(5). https://doi.org/10.1007/s40899-020-00436-1
Buhay Bucton BG, Shrestha S, KC S, Mohanasundaram S, Virdis SGP, Chaowiwat W (2022) Impacts of climate and land use change on groundwater recharge under shared socioeconomic pathways: a case of Siem Reap, Cambodia. Environ Res 211:113070. https://doi.org/10.1016/J.ENVRES.2022.113070
Cook BI, Mankin JS, Marvel K, Williams AP, Smerdon JE, Anchukaitis KJ (2020) Twenty-First Century Drought projections in the CMIP6 forcing scenarios. Earth’s Future 8(6):1–20. https://doi.org/10.1029/2019EF001461
Daniel H (2023) Performance assessment of bias correction methods using observed and regional climate model data in different watersheds, Ethiopia. J Water Clim Change 14(6):2007–2028. https://doi.org/10.2166/wcc.2023.115
Deepthi B, Sivakumar B (2023) Shortest path length for evaluating general circulation models for rainfall simulation. Clim Dyn 61(5–6):3009–3028. https://doi.org/10.1007/s00382-023-06713-x
District S (2017) Central Ground Water Board, Ministry of Water Resources, River Development and Ganga Rejuvenation, Government of India. Aquifer mapping and ground water management
Elgeldawi E, Sayed A, Galal AR, Zaki AM (2021) Hyperparameter tuning for machine learning algorithms used for arabic sentiment analysis. Informatics 8(4):1–21. https://doi.org/10.3390/informatics8040079
Elkiran G, Nourani V, Elvis O, Abdullahi J (2021) Impact of climate change on hydro-climatological parameters in North Cyprus: application of artificial intelligence-based statistical downscaling models. J Hydroinformatics 23(6):1395–1415. https://doi.org/10.2166/hydro.2021.091
Eyring V, Bony S, Meehl GA, Senior CA, Stevens B, Stouffer RJ, Taylor KE (2016) Overview of the coupled model Intercomparison Project Phase 6 (CMIP6) experimental design and organization. Geosci Model Dev 9(5):1937–1958. https://doi.org/10.5194/gmd-9-1937-2016
Fan X, Jiang L, Gou J (2021) Statistical downscaling and projection of future temperatures across the Loess Plateau, China. Weather Clim Extremes 32:100328. https://doi.org/10.1016/j.wace.2021.100328
Gebrechorkos SH, Hülsmann S, Bernhofer C (2019) Statistically downscaled climate dataset for East Africa. Sci Data 6(1):2–9. https://doi.org/10.1038/s41597-019-0038-1
Gumus V, Moçayd E, Seker N, M., Seaid M (2023) Evaluation of future temperature and precipitation projections in Morocco using the ANN-based multi-model ensemble from CMIP6. Atmos Res 292:106880. https://doi.org/10.1016/J.ATMOSRES.2023.106880
Gurara MA, Jilo NB, Tolche AD, Kassa AK (2022) Climate change projection using the statistical downscaling model in Modjo watershed, upper Awash River Basin, Ethiopia. Int J Environ Sci Technol 19(9):8885–8898. https://doi.org/10.1007/s13762-021-03752-x
Guven D (2023) Development of multi-model ensembles using tree-based machine learning methods to assess the future renewable energy potential: case of the East Thrace, Turkey. Environ Sci Pollut Res 30(37):87314–87329. https://doi.org/10.1007/s11356-023-28649-9
Hemanandhini S, Vignesh Rajkumar L (2023) Performance evaluation of CMIP6 climate models for selecting a suitable GCM for future precipitation at different places of Tamil Nadu. In Environmental Monitoring and Assessment (Vol. 195, Issue 8). Springer International Publishing. https://doi.org/10.1007/s10661-023-11454-9
Hernanz A, García-Valero JA, Domínguez M, Rodríguez-Camino E (2022) A critical view on the suitability of machine learning techniques to downscale climate change projections: illustration for temperature with a toy experiment. Atmospheric Sci Lett 23(6):1–9. https://doi.org/10.1002/asl.1087


Jain S, Salunke P, Mishra SK, Sahany S (2019) Performance of CMIP5 scenarios: a case study of Iraq’s Middle and West. Atmos Res
models in the simulation of Indian summer monsoon. Theoret 306:107470. https://doi.org/10.1016/J.ATMOSRES.2024.107470
Appl Climatol 137(1–2):1429–1447. https://​doi.​org/​10.​1007/​ Munawar S, Rahman G, Moazzam MFU, Miandad M, Ullah K, Al-Ansari
s00704-​018-​2674-3 N, Linh NTT (2022) Future climate projections using SDSM and
Jaiswal R, Mall RK, Singh N, Kumar L, T. V., Niyogi D (2022) LARS-WG Downscaling methods for CMIP5 GCMs over the Trans-
Evaluation of Bias correction methods for Regional Climate boundary Jhelum River Basin of the Himalayas Region. Atmosphere
models: Downscaled Rainfall Analysis over Diverse Agrocli- 13(6). https://​doi.​org/​10.​3390/​atmos​13060​898
matic zones of India. Earth Space Sci 9(2). https://​doi.​org/​10.​ Nagaraja B (2021) Historical precipitation and variability analysis in
1029/​2021E​A0019​81 Cauvery River Basin. Poll Res 40(1):281–289
Jebeile J, Lam V, Räz T (2021) Understanding climate change with Narmada K, Bhaskaran G, Gobinath K (2015) Assessment of Ground-
statistical downscaling and machine learning. Synthese 199(1– water Quality in the Amaravathi River Basin, South India 549–573.
2):1877–1897. https://​doi.​org/​10.​1007/​s11229-​020-​02865-z https://​doi.​org/​10.​1007/​978-3-​319-​13425-3_​26
Jose DM, Vincent AM, Dwarakish GS (2022) Improving multiple Niazkar M, Goodarzi MR, Fatehifar A, Abedi MJ (2023) Machine learn-
model ensemble predictions of daily precipitation and tempera- ing-based downscaling: application of multi-gene genetic program-
ture through machine learning techniques. Sci Rep 12(1):1–25. ming for downscaling daily temperature at Dogonbadan, Iran, under
https://​doi.​org/​10.​1038/​s41598-​022-​08786-w CMIP6 scenarios. Theoret Appl Climatol 151(1–2):153–168. https://​
Kim Y, Rocheta E, Evans JP, Sharma A (2020) Impact of bias correc- doi.​org/​10.​1007/​s00704-​022-​04274-3
tion of regional climate model boundary conditions on the simu- Nourani V, Razzaghzadeh Z, Baghanam AH, Molajou A (2019) ANN-
lation of precipitation extremes. Clim Dyn 55(11–12):3507– based statistical downscaling of climatic parameters using decision
3526. https://​doi.​org/​10.​1007/​S00382-​020-​05462-5/​METRI​CS tree predictor screening method. Theoret Appl Climatol 137(3–
Kokilavani S, Selvi R, Panneerselvam S, Dheebakaran G (2017) 4):1729–1746. https://​doi.​org/​10.​1007/​s00704-​018-​2686-z
Trend Analysis of Rainfall Variability in Western Agro Cli- Okkan U, Kirdemir U (2016) Downscaling of monthly precipitation
matic Zone of Tamil Nadu. Curr World Environ 12(1):181–187. using CMIP5 climate models operated under RCPs. Meteorol Appl
https://​doi.​org/​10.​12944/​CWE.​12.1.​22 23(3):514–528. https://​doi.​org/​10.​1002/​met.​1575
Kolluru V, Kolluru S (2021) Development and evaluation of pre and Peng S, Wang C, Li Z, Mihara K, Kuramochi K, Toma Y, Hatano R
post integration techniques for enhancing drought predictions (2023) Climate change multi-model projections in CMIP6 scenarios
over India. Int J Climatol 41(10):4804–4824. https://​doi.​org/​ in Central Hokkaido, Japan. Sci Rep 13(1):0–28. https://​doi.​org/​10.​
10.​1002/​joc.​7100 1038/​s41598-​022-​27357-7
Kolluru V, Kolluru S, Wagle N, Dev T (2020) Secondary precipita- Prathom C, Champrasert P (2023) General circulation model downscaling
tion estimate merging using machine learning: development and using interpolation—machine learning model combination—case
evaluation over krishna river basin, India. Remote Sens 12(18). study: Thailand. Sustain (Switzerland) 15(12). https://​doi.​org/​10.​
https://​doi.​org/​10.​3390/​RS121​83013 3390/​su151​29668
Kossieris P, Tsoukalas I, Brocca L, Mosaffa H, Makropoulos C, Rahimi R, Tavakol-Davani H, Nasseri M (2021) An uncertainty-based
Anghelea A (2024) Precipitation data merging via machine Regional comparative analysis on the performance of different
learning: revisiting conceptual and technical aspects. J Hydrol Bias correction methods in statistical downscaling of precipitation.
637(May):131424. https://​doi.​org/​10.​1016/j.​jhydr​ol.​2024.​131424 Water Resour Manage 35(8):2503–2518. https://​doi.​org/​10.​1007/​
Kumari P, Jaiswal RK, Singh HP (2024) Assessing climate change s11269-​021-​02844-0
impacts on irrigation water requirements in the Lower Mahanadi Raju KS, Kumar DN (2020) Review of approaches for selection and
Basin: a CMIP6-based spatiotemporal analysis and future projec- ensembling of GCMS. J Water Clim Change 11(3):577–599. https://​
tions. J Water Clim Change 00(0):1–18. https://​doi.​org/​10.​2166/​ doi.​org/​10.​2166/​wcc.​2020.​128
wcc.​2024.​152 Reddy NM, Saravanan S, Paneerselvam B (2024) Integrating conceptual
Lu M, Hou Q, Qin S, Zhou L, Hua D, Wang X, Cheng L (2023) A and machine learning models to enhance daily-scale streamflow
stacking ensemble model of various machine learning models for simulation and assessing climate change impact in the watersheds
daily runoff forecasting. Water (Switzerland) 15(7). https://​doi.​ of the Godavari basin, India. Environ Res 250:118403. https://​doi.​
org/​10.​3390/​w1507​1265 org/​10.​1016/J.​ENVRES.​2024.​118403
Me O, Balmaceda-Huarte R, Bettolli M (2022) Multi-model ensemble Rettie FM, Gayler S, Weber TKD, Tesfaye K, Streck T (2023) High-
of statistically downscaled GCMs over southeastern South Amer- resolution CMIP6 climate projections for Ethiopia using the gridded
ica: historical evaluation and future projections of daily precipi- statistical downscaling method. Sci Data 10(1):1–18. https://​doi.​org/​
tation with focus on extremes. Clim Dyn 59(9–10):3051–3068. 10.​1038/​s41597-​023-​02337-2
https://​doi.​org/​10.​1007/​s00382-​022-​06236-x Rhymee H, Shams S, Ratnayake U, Rahman EKA (2022) Comparing
Meharie MG, Mengesha WJ, Gariy ZA, Mutuku RNN (2022) Appli- statistical downscaling and arithmetic Mean in simulating CMIP6
cation of stacking ensemble machine learning algorithm in pre- Multi-model Ensemble over Brunei. Hydrology 9(9). https://​doi.​org/​
dicting the cost of highway construction projects. Eng Constr 10.​3390/​hydro​logy9​090161
Architectural Manage 29(7):2836–2853. https://​doi.​org/​10.​1108/​ Schilling J, Hertig E, Tramblay Y, Scheffran J (2020) Climate change
ECAM-​02-​2020-​0128 vulnerability, water resources and social implications in North
Mesgari E, Hosseini SA, Partoo LG, Hemmesy MS, Houshyar M Africa. Reg Envriron Chang 20(1). https://​doi.​org/​10.​1007/​
(2022) Assessment of CMIP6 models’ performances and projec- s10113-​020-​01597-7
tion of precipitation based on SSP scenarios over the MENAP Schoof JT, Robeson SM (2016) Projecting changes in regional tempera-
region. J Water Clim Change 13(10):3607–3619. https://​doi.​org/​ ture and precipitation extremes in the United States. Weather Clim
10.​2166/​wcc.​2022.​195 Extremes 11:28–40. https://​doi.​org/​10.​1016/j.​wace.​2015.​09.​004
Mishra V, Bhatia U, Tiwari AD (2020) Bias-corrected climate pro- Seker M, Gumus V (2022) Projection of temperature and precipitation
jections for South Asia from coupled Model Intercompari- in the Mediterranean region through multi-model ensemble from
son Project-6. Sci Data 7(1):1–13. https://​d oi.​o rg/​1 0.​1 038/​ CMIP6. In Atmospheric Research (Vol. 280, Issue December).
s41597-​020-​00681-1 https://​doi.​org/​10.​1016/j.​atmos​res.​2022.​106440
Mukheef RAH, Hassan WH, Alquzweeni S (2024) Projections of Shahhosseini M, Hu G, Pham H (2022) Optimizing ensemble weights
temperature and precipitation trends using CMhyd under CMIP6 and hyperparameters of machine learning models for regression


Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
