Groundwater Level Prediction With Machine Learning To Support Sustainable Irrigation in Water Scarcity Regions
Wanru Li , Mekuanent Muluneh Finsa , Kathryn Blackmond Laskey , Paul Houser , Rupert Douglas-Bate
doi: 10.20944/preprints202309.1165.v1
Keywords: machine learning; groundwater table; ground water level; sustainable irrigation; drinking water;
Copyright: This is an open access article distributed under the Creative Commons
Attribution License which permits unrestricted use, distribution, and reproduction in any
Disclaimer/Publisher’s Note: The statements, opinions, and data contained in all publications are solely those of the individual author(s) and
contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting
from any ideas, methods, instructions, or products referred to in the content.
Article
Groundwater Level Prediction with Machine
Learning to Support Sustainable Irrigation in Water
Scarcity Regions
Wanru Li 1, *, Mekuanent Muluneh Finsa 2, 5, Kathryn B. Laskey 1, Paul Houser 3
and Rupert Douglas-Bate 4
1 Department of System Engineering and Operational Research, George Mason University, Fairfax, VA
22030, USA; [email protected]
2 Institute of Hydrogeology, Engineering Geology and Applied Geophysics, Charles University, Czechia;
[email protected]
3 Department of Geography and Geoinformation Science, George Mason University, Fairfax, VA 22030,
USA; [email protected]
4 Global MapAid, United Kingdom; [email protected]
5 Water Resource Research Center, Arba Minch University, Arba Minch, Ethiopia;
[email protected]
* Correspondence: [email protected].
Abstract: In water scarcity regions, using data-driven approaches to predict groundwater level is challenging
due to limited data availability. However, these regions have substantial water needs and require cost-effective
groundwater utilization strategies. In this study, we use artificial intelligence to predict groundwater levels to
provide guidance for drilling shallow boreholes for subsistence irrigation. The Bilate watershed, which is
located in southern Ethiopia, was selected as the study area. This is typical of areas in Africa with high demand
for water and limited availability of well data. Using a non-time-series database of 75 boreholes, machine
learning models including multiple linear regression, multivariate adaptive regression spline, artificial neural
networks, random forest regression, and gradient boosting regression (GBR) were constructed to predict the
depth to the water table. Twenty independent variables were considered in the models. GBR performed the best of
the approaches, with an average R-squared value of 0.77 on testing data. Finally, a map of predicted water levels
in the Bilate watershed was created based on the best model with water levels ranging from 1.6 to 245.9 meters.
With the limited set of borehole data, the results show a clear signal that can provide guidance for borehole
drilling decisions for sustainable irrigation with additional implications for drinking water.
Keywords: machine learning; groundwater table; ground water level; sustainable irrigation;
drinking water; water-scarcity regions; AI; gradient boosting regression
1. Introduction
Ethiopia is one of the countries in East Africa that is threatened by water scarcity. In Ethiopia,
rainfed agriculture and small farming families predominate. A recent study shows that around 95%
of the agricultural areas in Ethiopia are rainfed areas [1]. A Food and Agriculture Organization (FAO)
study [2] shows that small farming families make up 72% of the total population. 74% of Ethiopia’s
farmers come from small farming families and 67% of these live below the national poverty line.
About 75% of farmland is devoted to cereals. Maize and wheat dominate, complemented by teff,
barley, sorghum, and rice. Drought is a major stressor that reduces cereal yields. Moreover, climate
change has induced significant and erratic deviations in rainfall patterns over the year and across the
country, significantly reducing crop yields overall, and especially cereals [3]. To mitigate the impact
of water stress and reduce hunger, utilizing groundwater by drilling wells for irrigation is a potential
solution to address the problem of increasingly erratic rainfall.
Predictions of groundwater level, or the depth-to-water table, could support decisions on where
to drill wells to extract groundwater. Artificial intelligence (AI) has been widely used to predict both
surface water [4-10] and groundwater [11-21] levels globally. Regarding surface water level prediction,
Khan and Coulibaly [4] used a Support Vector Machine (SVM) to examine long-term water level in
Lake Erie in North America based on mean monthly water level data from 1918 to 2001. The authors
compared SVM with a multilayer perceptron (MLP) and with a conventional multiplicative seasonal
autoregressive model. They found that the SVM outperformed the other two models with an overall
RMSE less than 0.25 m. Liang et al. [5] applied SVM and a deep learning model based on a Long
Short-Term Memory (LSTM) network to predict daily surface water levels in Dongting Lake in China.
They found that the LSTM has better accuracy than the SVM model with less than 0.1 m RMSE. A
river water level study performed by Chen and Qiao in 2021 [6] also confirms that LSTM has good
performance in predicting surface water levels. Choi et al. [7] used four machine learning algorithms
including artificial neural networks (ANN), decision tree, random forest (RF), and SVM based on the
daily water level from 2009 to 2013 to predict water levels from 2013 to 2015 in Upo wetlands, South
Korea. They found that random forest outperforms the other three algorithms with a 0.09 RMSE.
Regarding groundwater level prediction, in 2013, Sahoo and Jha [11] constructed seventeen site-
specific AI models to predict groundwater levels in Japan. Compared to multiple linear regression
(MLR) models, ANN-predicted groundwater levels have a better agreement with RMSE values
ranging from 0.04 to 0.4 m for 17 sites. Sahoo et al. [12] developed a modeling framework using
Multilayer Perceptron (MLP) network architecture to simulate groundwater level changes in two
agricultural regions in the US. They found ANN performed better than the MLR and multivariate
nonlinear regression model with RMSE less than 2 m for both the agricultural regions. Zhang et al.
[13] developed a new model based on LSTM to predict groundwater levels, which outperforms feed-
forward neural networks and double LSTM with a 0.14 m RMSE. In 2021, Liu et al. [14] applied SVM
combined with the data assimilation (DA) technique for predicting changes in groundwater level.
The researchers predicted the change in groundwater levels at 1 to 3-month time scales for 46 wells
located in the northeast United States and found that both the SVM and SVM with DA can adequately
predict groundwater levels with RMSE less than 4 meters. Hikouei et al. [15] demonstrated that
Extreme Gradient Boosting exhibits superior performance over the RF model for predicting
groundwater levels in the tropical peatlands of Indonesia. Many recent studies employed wavelet-
transformed analysis [16-17] and hybrid AI techniques [18-19], which have a good performance in
groundwater level prediction using time-series data.
Most of these studies that use machine learning algorithms employed a large amount of time
series data to predict future water levels for lakes [4-5, 8], rivers [6, 9-10], wetlands [7], basins [11, 17,
19], regions [13, 16, 18, 20], aquifers [12], and watersheds [14, 21]. These models generally have good
performance with low mean square error in predictions of water levels. These studies were conducted
in regions with high data availability on both the time series of water level data and climate variables.
However, in many regions of interest, a large amount of time series of water level data is unlikely to
be available due to a high cost of data collection for local government or organizations. Many water
scarcity areas have great need for groundwater development for sustainable irrigation although the
lack of good data can make models perform poorly. To the best of our knowledge, there is no research
on groundwater level prediction using machine learning based on non-time-series data in rainfed
agricultural regions. It is much more difficult to predict water levels accurately in the absence of time
series data, because previous water level is a strong predictor of current water level.
The objective of this study is to use AI to identify suitable drilling locations for sustainable
irrigation for subsistence agriculture in water scarcity regions using sparse non-time-series data on
existing wells. To achieve this objective, five machine learning models were constructed to predict
groundwater levels including MLR, multivariate adaptive regression spline (MARS), ANN, random
forest regression (RFR), and gradient boosting regression (GBR). The models were developed using
data from 75 existing boreholes in the Bilate watershed in southern Ethiopia. The best-performing
model was used to predict the groundwater level for hundreds of thousands of grid points covering
the Bilate region. Finally, a map of predicted water levels was created to provide guidance for
decision making on drilling locations for local individuals and organizations. Figure 1 shows the
workflow of this study.
Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 18 September 2023 doi:10.20944/preprints202309.1165.v1
The rest of the paper is organized as follows. Section 2 describes the study area, data source and
the methodology used in this study; Section 3 shows the main results from each of the machine
learning models; Section 4 provides a discussion of the results; Section 5 concludes the paper.
In this study, the dependent variable is the static water level. Field data on 75 boreholes were
collected by Arbaminch Water Technology Institute (AWTI) in 2007 [27]. We fully acknowledge the
limitations associated with our dataset, which was collected fifteen years ago. However, a change in
static water level within the last fifteen years, even if it occurred, would have very little impact on
our analysis. Any change would likely be within a few centimeters, which is small in relation to the
accuracy of our analysis. Therefore, the existing historical dataset is useful to demonstrate the value
of machine learning for predicting water levels.
Figure 2. Bilate watershed (pink region on the lower left plot) belongs to a basin called Rift Valley
(light green basin on the upper left plot). The map of Bilate watershed with boreholes and elevation
[28] is shown on the right.
out for each training dataset. Performing a variety of experiments is important to verify the model's
ability to generalize effectively to unseen data. This paper mainly presents and discusses models with
median performance, which are based on consistent training and testing data separations across all
algorithms. The average performance scores from the fifteen experiments are also reported,
providing a comprehensive understanding of each model's effectiveness.
To predict groundwater levels for a larger area in the Bilate region, we generated a grid with a
resolution of 100 m × 100 m that covered the Bilate region. The data for twenty independent variables for
each grid point were prepared and processed in the same manner as for the original dataset of 75
boreholes. Because the resolution of the grid points is relatively high, grid points that fall within the
same spatial resolution unit of a variable will have identical extracted values. The software tools used
for data preparation and visualization were mainly QGIS 3.24.1 [29] and R 4.1.3 [30].
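As a rough illustration of the gridding step, the following Python sketch builds a regular 100 m grid and looks up, for each grid point, the value of the coarser raster cell that contains it. It is not the QGIS/R workflow used in this study: the function names make_grid and extract, the 1 km cell size, and the raster values are hypothetical. It only shows why neighbouring grid points share identical extracted values when a predictor's native resolution is coarser than the grid.

```python
# Illustrative sketch only: extent, cell sizes, and raster values are made up.

def make_grid(x_min, y_min, x_max, y_max, step=100.0):
    """Return (x, y) points spaced `step` metres apart (a projected CRS is assumed)."""
    nx = int((x_max - x_min) // step) + 1
    ny = int((y_max - y_min) // step) + 1
    return [(x_min + i * step, y_min + j * step) for j in range(ny) for i in range(nx)]


def extract(raster, origin, cell_size, point):
    """Return the value of the raster cell containing `point`.

    `raster[row][col]` is indexed from `origin` (lower-left corner) in this sketch.
    """
    col = int((point[0] - origin[0]) // cell_size)
    row = int((point[1] - origin[1]) // cell_size)
    return raster[row][col]


# Hypothetical 2 x 2 raster with 1 km cells, e.g. a coarse climate variable.
coarse = [[10.0, 11.0],
          [12.0, 13.0]]

grid = make_grid(0.0, 0.0, 1900.0, 1900.0, step=100.0)
values = [extract(coarse, (0.0, 0.0), 1000.0, p) for p in grid]

# Every 100 m point inside the same 1 km cell receives the same value.
print(len(grid), values.count(10.0))
```

In this toy setting each 1 km cell contains one hundred 100 m grid points, so the predictor takes only four distinct values across the whole grid.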
2.3.2. Bootstrapping
Bootstrapping is a resampling method that randomly samples values with replacement. It is
mainly employed in the construction of RFR models. In the context of RFR, each decision tree within
the forest is trained on a distinct dataset, generated by bootstrapping the original training set. This
ensures each dataset is of the same size as the original, but composed of a subset of the original data,
with some samples likely repeated. This process introduces randomness into the model-building
phase, which aids in preventing overfitting and improves model robustness by ensuring that each
individual tree within the forest learns from a slightly different sample of the data.
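A minimal Python sketch of bootstrap resampling is given below. It is an illustration rather than the code used in this study; the helper name bootstrap_sample, the seed, and the use of integer IDs as stand-ins for training records are assumptions.

```python
import random


def bootstrap_sample(data, rng):
    """Draw len(data) items from `data` uniformly at random, with replacement."""
    n = len(data)
    return [data[rng.randrange(n)] for _ in range(n)]


rng = random.Random(0)            # fixed seed so the sketch is repeatable
records = list(range(75))         # stand-in IDs for the training records
sample = bootstrap_sample(records, rng)

# The resample has the original size, but some records repeat and others are absent.
unique_fraction = len(set(sample)) / len(records)
print(len(sample), round(unique_fraction, 2))
```

For large samples, the expected share of distinct records in a bootstrap sample approaches 1 − 1/e, or roughly 63%, which is why each tree sees a slightly different view of the data.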
when a termination criterion is met. To make a prediction at a point, the tree is traversed to find the
leaf node corresponding to the independent variables corresponding to the point, and the dependent
variables for the data points at the leaf nodes are averaged [49]. This procedure, creating decision
trees based on different bootstrap samples and then averaging the predictions from all the trees, is
called bootstrap aggregation, or bagging.
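The bagging procedure can be sketched in a few lines of Python. To keep the example short, each "tree" here is a one-split regression stump on a single predictor, and the random selection of predictors at each split that a random forest also performs is omitted. The data values and the function names fit_stump, predict_stump, and bagged_predict are hypothetical.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split regression tree on one predictor by least squares."""
    best = None
    for t in sorted(set(xs))[:-1]:                       # candidate thresholds
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:                                     # all x equal: predict the mean
        m = sum(ys) / len(ys)
        return (float("inf"), m, m)
    return best[1:]


def predict_stump(stump, x):
    """Traverse the stump: return the mean response of the leaf that x falls into."""
    t, ml, mr = stump
    return ml if x <= t else mr


def bagged_predict(xs, ys, x_new, n_trees=50, seed=0):
    """Average the predictions of stumps fitted to bootstrap samples."""
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]       # bootstrap sample of rows
        stump = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
        preds.append(predict_stump(stump, x_new))
    return sum(preds) / n_trees


# Made-up data: one predictor, response in metres.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [5.0, 6.0, 5.5, 6.5, 40.0, 42.0, 41.0, 43.0]
low = bagged_predict(xs, ys, 2)
high = bagged_predict(xs, ys, 7)
print(low, high)
```

Because every leaf value is a mean of observed responses, the averaged prediction can never fall outside the observed range, which is one way to see why small values tend to be overestimated and large values underestimated.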
The bagging technique in the RFR model tends to reduce the variance of predictions, but a bias
still exists. Specifically, since the prediction is the average of the output from all leaf nodes,
observations with small values tend to be overestimated and those with large values tend to be
underestimated. This tendency to bias has been identified in previous studies [50, 51]. Our results
showed a bias in the initial RFR model. To correct the bias, a post-processing bias-correcting
transformation can be applied to the RFR predictions [51]. For the linear transformation, a linear
regression model was fitted to find the intercept (β0) and slope (β1) of the transformed prediction:
$$f(\hat{y}) = \beta_0 + \beta_1 \hat{y} \tag{4}$$
where ŷ is the predicted response value by the initial random forest model on training data; β0 is the
coefficient for the intercept, and β1 is the coefficient for ŷ. The objective is to find the parameters that
minimize the mean square error:
$$\min_{\beta_0,\,\beta_1} \; \frac{1}{n} \sum_{i=1}^{n} \left( f(\hat{y}_i) - y_i \right)^2 \tag{5}$$
where yi is the ith observed value; f(xi ) is the predicted response value; n is the number of
observations. Next, the predictions from the decision tree are combined with the current ensemble's
predictions to obtain an updated prediction. This updated prediction is added to the ensemble. Then,
the residuals are recalculated using the updated predictions. The new residuals represent the errors
that were not captured by the current ensemble. The process continues for a specified number of
iterations or until a certain stopping criterion is met. The final prediction is obtained by summing the
predictions from the entire ensemble. By iteratively correcting the errors of the previous models,
gradient boosting regression is able to learn complex relationships and improve predictive accuracy.
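The residual-fitting loop described above can be sketched as follows. This is a simplified illustration and not the GBR implementation used in this study: the weak learner is a one-split stump on a single predictor, the loss is squared error, and the learning rate of 0.1, the number of rounds, and the data are assumptions chosen for the example.

```python
def fit_stump(xs, targets):
    """One-split regression tree on a single predictor (least squares)."""
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - ml) ** 2 for r in left) + sum((r - mr) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:                          # all x equal: predict the mean
        m = sum(targets) / len(targets)
        return (float("inf"), m, m)
    return best[1:]


def predict_stump(stump, x):
    t, ml, mr = stump
    return ml if x <= t else mr


def boost(xs, ys, n_rounds=100, learning_rate=0.1):
    """Fit stumps to the current residuals and add them to the ensemble."""
    base = sum(ys) / len(ys)                  # start from the mean response
    pred = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, pred)]   # errors not yet captured
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        pred = [p + learning_rate * predict_stump(stump, x) for p, x in zip(pred, xs)]
    return base, learning_rate, stumps


def boosted_predict(model, x):
    """Final prediction: the base value plus the (shrunken) sum over all stumps."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Made-up data: one predictor, response in metres.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [5.0, 6.0, 5.5, 6.5, 40.0, 42.0, 41.0, 43.0]
model = boost(xs, ys)
before = mse(ys, [sum(ys) / len(ys)] * len(ys))
after = mse(ys, [boosted_predict(model, x) for x in xs])
print(before, after)
```

With squared-error loss, each round can only lower the training error, so the number of rounds and the learning rate act as the main controls against overfitting.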
where p(x, y) represents the joint probability function of x and y; p(x) and p(y) are the marginal
probability functions of x and y, respectively.
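For readers who want to see the quantities p(x, y), p(x), and p(y) in use, the sketch below computes a plug-in estimate of mutual information for two discrete variables. It assumes the standard discrete definition, I(X; Y) = Σ p(x, y) log[p(x, y) / (p(x) p(y))], and assumes that a continuous variable such as water level has first been binned; neither the binning nor the function name mutual_information comes from this study.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in nats for two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))      # counts for p(x, y)
    px = Counter(xs)                  # counts for p(x)
    py = Counter(ys)                  # counts for p(y)
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        mi += p_xy * log(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


# Hypothetical example: a soil class that fully determines a binned depth class.
soil = ["a", "a", "b", "b"]
depth = ["shallow", "shallow", "deep", "deep"]
dependent = mutual_information(soil, depth)

# Two variables that are independent in the sample.
independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
print(dependent, independent)
```

Mutual information is zero when the two variables are independent in the sample and grows as one variable becomes more informative about the other.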
RMSE describes how far the predictions deviate from the actual values. A small RMSE
represents a good performance of the model.
$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n}} \tag{8}$$
MAE measures the median of the absolute errors between the predicted and observed values.
Similar to RMSE, it describes how well the model fits the data.
$$\mathrm{MAE} = \mathrm{Median}\left( \left| \hat{y}_i - y_i \right| \right) \tag{9}$$
R-squared represents the proportion of the variation in the dependent variable that is explained by the
independent variables.
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}, \tag{10}$$
where y denotes the observed values; ŷ denotes the predicted values; y̅ denotes the mean of the
observed values.
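The three metrics in Equations (8)-(10) can be computed directly, as in the Python sketch below. The function names and the four observed/predicted values are hypothetical; note that MAE follows the definition given above, i.e., the median (not the mean) of the absolute errors.

```python
from math import sqrt
from statistics import mean, median


def rmse(y, y_hat):
    """Root mean square error, Equation (8)."""
    return sqrt(mean([(a - b) ** 2 for a, b in zip(y, y_hat)]))


def median_abs_error(y, y_hat):
    """MAE as defined in Equation (9): the median of the absolute errors."""
    return median([abs(b - a) for a, b in zip(y, y_hat)])


def r_squared(y, y_hat):
    """Coefficient of determination, Equation (10)."""
    y_bar = mean(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    return 1 - ss_res / ss_tot


# Made-up observed and predicted static water levels (m).
y = [10.0, 20.0, 30.0, 40.0]
y_hat = [12.0, 18.0, 33.0, 39.0]
print(rmse(y, y_hat), median_abs_error(y, y_hat), r_squared(y, y_hat))
```

Because the median is insensitive to a few large errors, MAE as defined here can remain small even when RMSE is inflated by one or two poorly predicted boreholes.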
3. Results
In this section, we detail the mutual information analysis and outcomes from all the
implemented methods. We primarily focus on results from one training and testing data partition
used consistently across all methods. This training and testing split was chosen because it yielded
close to median performance across all models. For the ANN, RFR, and GBR methods, three
experiments were conducted for each data partition, and the model with median performance was
selected. Primary results associated with all algorithms include the residuals versus predicted values
shown in Figure 7 and the observed versus predicted values for both training and testing data
shown in Figure 8. Furthermore, Tables 4 and 5 summarize the performance metrics of the models
based on the single experiment that has a median performance and the average performance score based
on the fifteen experiments, respectively.
Figure 4. Mutual information between independent variables and the dependent variable.
Before building an MLR model, highly correlated predictors were removed. If a pair of predictors has
a correlation equal to or greater than 0.85, the R function findCorrelation() flags one predictor of the
pair for removal (the one with the larger mean absolute correlation with the other predictors). Table 3
shows the predictors remaining after removal along with their
coefficients. We found that the eutric vertisols soil type (X2) and NDVI from Jun to
Sep (X20) have a significant relationship with the static water level at the 0.05 significance level.
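The correlation filter can be illustrated with a small Python sketch. It is a simplified stand-in for findCorrelation(), not a port of it: the rule used here for choosing which member of a highly correlated pair to drop (the one with the larger mean absolute correlation with the other retained predictors) is an assumption of the sketch, and the predictor names x1 to x3 and their values are made up.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)


def mean_abs_corr(name, keep, columns):
    """Mean absolute correlation of `name` with the other retained predictors."""
    others = [o for o in keep if o != name]
    if not others:
        return 0.0
    return sum(abs(pearson(columns[name], columns[o])) for o in others) / len(others)


def drop_correlated(columns, cutoff=0.85):
    """Remove predictors until no pair has |r| >= cutoff; return the names kept."""
    keep = list(columns)
    while True:
        worst = None
        for i, a in enumerate(keep):
            for b in keep[i + 1:]:
                r = abs(pearson(columns[a], columns[b]))
                if r >= cutoff and (worst is None or r > worst[0]):
                    worst = (r, a, b)
        if worst is None:
            return keep
        _, a, b = worst
        # Assumed rule: drop the member that is more correlated with everything else.
        drop = a if mean_abs_corr(a, keep, columns) >= mean_abs_corr(b, keep, columns) else b
        keep.remove(drop)


# Hypothetical predictors: x1 and x2 are nearly collinear, x3 is not.
columns = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "x2": [2.0, 4.0, 6.0, 8.0, 10.5],
    "x3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
kept = drop_correlated(columns, cutoff=0.85)
print(kept)
```

Only one of the two nearly collinear predictors survives, which keeps the MLR coefficients interpretable and avoids inflated standard errors.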
To examine the normality assumption of a linear regression model, we created a Quantile-
Quantile (Q-Q) plot and a residual plot. From the Q-Q plot shown in Figure 5, we see that the points
are approximately distributed along the line with light upper and lower tails. No obvious pattern
was found on the residual plot (Figure 7a). We also performed a Shapiro-Wilk normality test on the
residuals, with the null hypothesis that the residuals are normally distributed. The p-value is
approximately 0.51; therefore, the null hypothesis of normality is not rejected at the 0.05 significance
level. The model performance results are shown in Tables 4 and 5 and will be discussed in Section 4.2.
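The coordinates behind a Q-Q plot can be computed as in the Python sketch below, which pairs each sorted residual with the normal quantile expected at its rank. The residual values, the function name qq_points, and the plotting position (i + 0.5)/n are assumptions made for illustration; the Shapiro-Wilk test itself is not implemented here.

```python
from statistics import NormalDist, mean, stdev


def qq_points(residuals):
    """Return (theoretical quantile, observed residual) pairs for a normal Q-Q plot."""
    n = len(residuals)
    fitted = NormalDist(mean(residuals), stdev(residuals))
    ordered = sorted(residuals)
    # (i + 0.5) / n is one common choice of plotting position.
    return [(fitted.inv_cdf((i + 0.5) / n), r) for i, r in enumerate(ordered)]


# Made-up residuals (m).
residuals = [-3.0, -1.0, 0.0, 1.0, 3.0]
points = qq_points(residuals)
for theoretical, observed in points:
    print(round(theoretical, 2), observed)
```

If the residuals are close to normal, the pairs fall near the line where the theoretical and observed values are equal; systematic curvature at either end indicates heavier or lighter tails than the normal distribution.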
for the model. The generalized cross-validation (GCV) criterion was used to evaluate each subset
and was also used as the variable importance measure. From Figure 6, the three most
important variables are LST daytime from Feb to May (X13), precipitation from Feb to May (X4),
and wind speed from Oct to Jan (X9). The eutric vertisols soil type (X2) has an importance value of zero,
indicating that this predictor did not contribute to the predictive power of the model and was never
used in any of the MARS basis functions in the pruned final model.
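For reference, the GCV criterion is commonly written in the following form, after Friedman [46]. The symbols N, \hat{f}_M, and C(M) are introduced here for illustration only, and the exact complexity penalty used by the software in this study is not stated in the text:

$$\mathrm{GCV}(M) = \frac{\frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{f}_M(x_i) \right)^2}{\left( 1 - \frac{C(M)}{N} \right)^2}$$

where N is the number of training observations, \hat{f}_M is the MARS model with M basis functions, and C(M) is an effective number of parameters that increases with M. The numerator is the training mean squared error, and the denominator inflates it as the model grows, so a basis function is retained only if it reduces the error by more than its added complexity costs.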
Even though MARS, being a nonparametric technique, does not assume linearity and
homoscedasticity, the residual plot can still provide valuable diagnostic information about how well
the model fits the data. Figure 7b shows that the residuals are randomly scattered around zero,
indicating that the variance of the error is approximately constant across all levels of the independent
variables, which is a desirable property.
Figure 6. Variable importance plots: (a) MARS; (b) RFR; (c) GBR.
Figure 7. Plot of residuals (m) versus predicted values (m) for (a) MLR; (b) MARS; (c) ANN; (d)
Original RFR; (e) RFR with linear transformation; (f) GBR.
Figure 8. Observed (m) versus predicted (m) plots for (a) MLR; (b) MARS; (c) ANN; (d) Original RFR;
(e) RFR with linear transformation; (f) GBR.
Table 4. Model performance evaluation based on one experiment that has a median performance
Table 5. Average model performance score based on multiple experiments conducted for each model
4. Discussion
Table 5 presents the average performance score based on multiple experiments conducted for
each model. Consistent with the findings from the single experiment (Table 4), GBR has the best
performance among the models. The RFR model's performance is slightly lower than that of GBR.
MARS, on the other hand, exhibits the weakest average performance among the five models.
Collectively, these results suggest that GBR and RFR are most suitable for predicting depth to water
table for our data set.
Figure 9. (a) Residual plot for the predicted water level for the nearest grid point; (b) Actual static
water level versus predicted water level for the nearest grid point.
5. Conclusions
To conclude, this research emphasizes the application of AI in pinpointing viable drilling sites
for sustainable irrigation, and even drinking water, in water-deficient areas. Prior studies have
typically utilized time-series data for groundwater level prediction. However, the challenge in water
scarcity regions lies in the lack of data due to formidable data collection constraints. These regions,
marked by higher water demand, necessitate effective strategies for groundwater exploitation.
Addressing this gap, we have utilized the available non-time-series data to devise five machine
learning models for groundwater level prediction. Of these, Gradient Boosting Regression
consistently demonstrated superior performance, with an average R-squared value of 0.77 across
numerous experiments. The highest-performing model was subsequently employed to predict
groundwater levels across the entire Bilate region. This process resulted in the development of a high-
resolution map, anticipated to guide local communities and organizations in pinpointing the most
suitable locations for sustainable irrigation drilling.
Investigating variable importance revealed that Land Surface Temperature during daytime from
February to May, NDVI from June to September, and precipitation from February to May consistently
demonstrated significance across models. We observed inconsistencies between the variable
importance from the MI method and machine learning methods. The results from both should be
considered complementary rather than contradictory. Using a combination of methods allows for a
more robust and comprehensive understanding of variable importance, leading to a more reliable
model. In case of substantial discrepancies, deeper investigations can be conducted to reconcile the
findings.
There are some limitations of this study. Firstly, a potential concern is the relatively small dataset
of 75 boreholes, which are not evenly distributed throughout the Bilate watershed. This may present
a limitation for making region-wide predictions. Secondly, the predictor variables we considered
were limited to ones that could be readily computed from publicly available data. Thirdly, the ANN
model showed a tendency to overfit the training data, indicating the need for more extensive
hyperparameter tuning and model simplification. Lastly, our data, having been collected in 2007,
may be somewhat dated. Efforts are underway to acquire more recent data to verify prediction
accuracy. Future research in this region should aim to improve the predictive power of groundwater
levels by considering additional predictor variables (e.g., distance to water and elevation above
permanent streams), forecast groundwater recharge, and analyze the impacts of climate change. This
will provide comprehensive guidance for decision-making related to borehole drilling.
6. Patents
The findings in this paper are being incorporated into a system called WellMapr©, which is designed
to support drilling decisions for shallow groundwater and drinking water wells in Ethiopia.
Author Contributions: All authors contributed to the conceptualization, design of the study, and reading and
revising the manuscript. Methodology, W.L. and K.L.; software, W.L.; validation, W.L. and K.L.; formal analysis,
W.L.; investigation, W.L. and K.L.; resources, M.F. and R.D.; data collection, M.F.; data curation, W.L.; writing—
original draft preparation, W.L.; writing—review and editing, K.L., R.D., M.F., P.H., and W.L.; visualization,
W.L. and M.F.; supervision, M.F., P.H., R.D., and K.L.; project administration, K.L. and R.D. All authors have
read and agreed to the published version of the manuscript.
Funding: This research was partially supported by a graduate research fellowship to W.L. from George Mason
University’s Center for Resilient and Sustainable Communities.
Acknowledgments: This work represents a collaboration among George Mason University, Arba Minch
University and Global MapAid with support from the Czech Geological Survey. We would like to express our
gratitude to Dr. Jiří Bruthans for the valuable comments and feedback on the manuscript.
References
1. Chandrasekharan, K. M.; Subasinghe, C.; Haileslassie, A. Mapping irrigated and rainfed agriculture in Ethiopia
(2015-2016) using remote sensing methods (Vol. 196). 2021, International Water Management Institute (IWMI).
2. FAO. Small Family Farms Country Factsheet Ethiopia - food and agriculture. Available online:
https://www.fao.org/3/i8911en/I8911EN.pdf (accessed on August 9, 2022)
3. Haileslassie, A. On-Farm Smallholder Irrigation Performance in Ethiopia: From Water Use Efficiency to
Equity and Sustainability. 2016, ISBN 978-92-9146-468-5.
4. Khan, M.S.; Coulibaly, P. Application of Support Vector Machine in Lake Water Level Prediction. J. Hydrol.
Eng. 2006, 11, 199–205, doi:10.1061/(ASCE)1084-0699(2006)11:3(199).
5. Liang, C.; Li, H.; Lei, M.; Du, Q. Dongting Lake Water Level Forecast and Its Relationship with the Three
Gorges Dam Based on a Long Short-Term Memory Network. Water 2018, 10, 1389, doi:10.3390/w10101389.
6. Chen, S.; Qiao, Y. Short-Term Forecast of Yangtze River Water Level Based on Long Short-Term Memory
Neural Network. IOP Conf. Ser.: Earth Environ. Sci. 2021, 831, 012051, doi:10.1088/1755-1315/831/1/012051.
7. Choi, C.; Kim, J.; Han, H.; Han, D.; Kim, H. S. Development of water level prediction models using machine
learning in wetlands: A case study of Upo wetland in South Korea. Water 2019, 12(1), 93.
8. Wang, Q.; Wang, S. Machine learning-based water level prediction in Lake Erie. Water 2020, 12(10), 2654.
9. Assem, H.; Ghariba, S.; Makrai, G.; Johnston, P.; Gill, L.; Pilla, F. Urban Water Flow and Water Level
Prediction Based on Deep Learning. In Machine Learning and Knowledge Discovery in Databases: European
Conference, ECML PKDD 2017, Skopje, Macedonia, September 18–22, 2017, Proceedings, Part III 10 (pp. 317-329).
Springer International Publishing.
10. Kim, D.; Han, H.; Wang, W.; Kim, H. S. Improvement of Deep Learning Models for River Water Level
Prediction Using Complex Network Method. Water 2022, 14(3), 466.
11. Sahoo, S.; Jha, M.K. Groundwater-Level Prediction Using Multiple Linear Regression and Artificial Neural
Network Techniques: A Comparative Assessment. Hydrogeol J 2013, 21, 1865–1887, doi:10.1007/s10040-013-
1029-5.
12. Sahoo, S.; Russo, T.A.; Elliott, J.; Foster, I. Machine Learning Algorithms for Modeling Groundwater Level
Changes in Agricultural Regions of the U.S. Water Resources Research 2017, 53, 3878–3895,
doi:10.1002/2016WR019933.
13. Zhang, J.; Zhu, Y.; Zhang, X.; Ye, M.; Yang, J. Developing a Long Short-Term Memory (LSTM) based model
for predicting water table depth in agricultural areas. Journal of hydrology 2018, 561, 918-929.
14. Liu, D.; Mishra, A. K.; Yu, Z.; Lü, H.; Li, Y. Support vector machine and data assimilation framework for
Groundwater Level Forecasting using GRACE satellite data. Journal of Hydrology 2021, 603, 126929.
15. Hikouei, I. S.; Eshleman, K. N.; Saharjo, B. H.; Graham, L. L.; Applegate, G.; Cochrane, M. A. Using machine
learning algorithms to predict groundwater levels in Indonesian tropical peatlands. Science of the Total
Environment, 2023, 857, 159701.
16. Rahman, A. S.; Hosono, T.; Quilty, J. M.; Das, J.; Basak, A. Multiscale groundwater level forecasting:
Coupling new machine learning approaches with wavelet transforms. Advances in Water Resources 2020,
141, 103595.
17. Wen, X.; Feng, Q.; Deo, R. C.; Wu, M.; Si, J. Wavelet analysis–artificial neural network conjunction models
for multi-scale monthly groundwater level predicting in an arid inland river basin, northwestern China.
Hydrology Research 2017, 48(6), 1710-1729.
18. Bahmani, R.; Ouarda, T. B. Groundwater level modeling with hybrid artificial intelligence techniques.
Journal of Hydrology 2021, 595, 125659.
19. Liu, W.; Yu, H.; Yang, L.; Yin, Z.; Zhu, M.; Wen, X. Deep Learning-Based Predictive Framework for
Groundwater Level Forecast in Arid Irrigated Areas. Water 2021, 13(18), 2558.
20. Wu, Z.; Lu, C.; Sun, Q.; Lu, W.; He, X.; Qin, T.; Yan, L.; Wu, C. Predicting Groundwater Level Based on
Machine Learning: A Case Study of the Hebei Plain. Water 2023, 15(4), 823.
21. Kochhar, A.; Singh, H.; Sahoo, S.; Litoria, P. K.; Pateriya, B. Prediction and forecast of pre-monsoon and
post-monsoon groundwater level: using deep learning and statistical modelling. Modeling Earth Systems
and Environment 2022, 8(2), 2317-2329.
22. Orke, Y.A.; Li, M. H. Hydroclimatic Variability in the Bilate Watershed, Ethiopia. Climate 2021, 9, 98,
doi:10.3390/cli9060098.
23. Tekle, A. Assessment of Climate Change Impact on Water Availability of Bilate Watershed, Ethiopian Rift
Valley Basin. In Proceedings of the AFRICON 2015; September 2015; pp. 1–5.
24. Tsegay Wolde-Georgis; Aweke, D.; Hagos, Y. The Case of Ethiopia Reducing the Impacts of Environmental
Emergencies through Early Warning and Preparedness: The Case of the 1997–98 El Niño. National
Meteorological Service Agency (NMSA): Addis Ababa, Ethiopia, 2000, 1-73.
25. Legese, W.; Koricha, D.; Ture, K. Characteristics of Seasonal Rainfall and Its Distribution Over Bale
Highland, Southeastern Ethiopia. J Earth Sci Clim Change 2018, 09, doi:10.4172/2157-7617.1000443.
26. Czech Geological Survey and Geological Survey of Ethiopia. Explanatory notes to the thematic geoscientific
maps of Ethiopia at a scale of 1 : 50,000. 2018. Available online: http://www.geology.cz/etiopie-
2018/outputs/dila/explanatory-notes-0638-c2-dila.pdf
27. Muluneh, M. Web-Based Decision Support Systems for Managing Water Resources of Abaya Chamo Basin
Project; 2018.
28. Alaska Satellite Facility. Available online: https://asf.alaska.edu/ (accessed on August 1, 2022)
29. QGIS Development Team. QGIS Geographic Information System. Open Source Geospatial Foundation Project,
2022. http://qgis.osgeo.org
30. R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical
Computing, 2022, Vienna, Austria. URL https://www.R-project.org/.
31. U.S. Geological Survey. USGS EROS Archive - Digital Elevation - Shuttle Radar Topography Mission
(SRTM) 1 Arc-Second Global. Available online: https://www.usgs.gov/centers/eros/science/usgs-eros-
archive-digital-elevation-shuttle-radar-topography-mission-srtm-1#overview (accessed on August 1,
2022).
32. Food and Agriculture Organization of the United Nations. Harmonized world soil database. Available online:
https://www.fao.org/soils-portal/data-hub/soil-maps-and-databases/harmonized-world-soil-database-
v12/en/ (accessed on November 16, 2022)
33. Huffman, G.J.; Stocker, E.F.; Bolvin, D.T.; Nelkin, E.J.; Tan, J. GPM IMERG Final Precipitation L3 1 month
0.1 degree x 0.1 degree V06. Goddard Earth Sciences Data and Information Services Center (GES DISC).
Greenbelt, MD, 2019. Available online: https://disc.gsfc.nasa.gov/datasets/GPM_3IMERGM_06/summary
(accessed on July 31, 2022)
34. Amy McNally NASA/GSFC/HSL. FLDAS Noah Land Surface Model L4 Global Monthly 0.1 x 0.1 degree
(MERRA-2 and CHIRPS). Goddard Earth Sciences Data and Information Services Center (GES DISC).
Greenbelt, MD, USA, 2018. Available online:
https://disc.gsfc.nasa.gov/datasets/FLDAS_NOAH01_C_GL_M_001/summary (accessed on July 31, 2022)
35. Wan, Z.; Hook, S.; Hulley, G. MODIS/Terra Land Surface Temperature/Emissivity Daily L3 Global 1km
SIN Grid V006. NASA EOSDIS Land Processes DAAC 2015. Available online:
https://doi.org/10.5067/MODIS/MOD11A1.006 (accessed on July 31, 2022)
36. Didan, K. MODIS/Terra Vegetation Indices 16-Day L3 Global 250m SIN Grid V006. NASA EOSDIS Land
Processes DAAC 2015. Available online: https://doi.org/10.5067/MODIS/MOD13Q1.006 (accessed on
August 1, 2022)
37. Hastie, T.; Tibshirani, R.; Friedman, J. H. The elements of statistical learning: data mining,
inference, and prediction, New York: Springer, 2009. Vol. 2, pp. 1-758.
38. Greitzer, F. L.; Li, W.; Laskey, K. B.; Lee, J.; Purl, J. Experimental investigation of technical and human
factors related to phishing susceptibility. ACM Transactions on Social Computing, 2021, 4(2), 1-48.
39. Tang, L.; Mahmoud, Q. H. A survey of machine learning-based solutions for phishing website detection.
Machine Learning and Knowledge Extraction, 2021, 3(3), 672-694.
40. Zhou, W. Condition State-Based Decision Making in Evolving Systems: Applications in Asset Management
and Delivery, Doctoral dissertation, George Mason University, Fairfax, VA, 2023.
41. Zantalis, F.; Koulouras, G.; Karabetsos, S.; Kandris, D. A review of machine learning and IoT in smart
transportation. Future Internet, 2019, 11(4), 94.
42. Harvey, A.; Laskey, K.; Chang, K. C. Machine learning applications for sensor tasking with non-linear
filtering. Sensors, 2022, 22(6), 2229.
43. Fan, Z. Models and Algorithms for Data-Driven Scheduling, Doctoral dissertation, George Mason
University, Fairfax, VA 2023.
44. Fan, Z.; Chang, K. C.; Raz, A. K.; Harvey, A.; Chen, G. Sensor Tasking for Space Situation Awareness:
Combining Reinforcement Learning and Causality. In 2023 IEEE Aerospace Conference, 2023, pp. 1-9.
45. Freedman, D. A. Statistical models: theory and practice. Cambridge university press, 2009.
46. Friedman, J. H. Multivariate adaptive regression splines. The annals of statistics 1991, 19(1), 1-67.
47. Murphy, K. P. Machine learning: a probabilistic perspective. MIT press, 2012.
48. Breiman, L. Random forests. Machine learning 2001, 45(1), 5-32.
49. Liaw, A.; Wiener, M. Classification and regression by randomForest. R news 2002, 2(3), 18-22.
50. Zhang, G.; Lu, Y. Bias-corrected random forests in regression. Journal of Applied Statistics 2012, 39(1), 151-
160.
51. Malhotra, S.; Karanicolas, J. A Numerical Transform of Random Forest Regressors corrects Systematically-
Biased Predictions. 2020, arXiv preprint arXiv:2003.07445.
52. Ross, B.C. Mutual information between discrete and continuous data sets. PLoS ONE 2014, 9, e87357.