Co-author by Tran Van Phong

The sustainability of water resource management remains challenging in many regions around the world. Yet while the significance of groundwater potential maps in water resource management is well known, no agreed-upon approach has been suggested for producing reliable, accurate maps of groundwater potential. In this study, we evaluated the Partial Decision Tree (PART), Fuzzy Unordered Rule Induction Algorithm (FURIA), Multilayer Perceptron Network (MLP), Forest by Penalizing Attributes (FPA), and an ensemble of the FPA method built with the Decorate ensemble learning technique (DFPA). We assessed how well each method captures the associations between the locations of groundwater wells and a set of geo-environmental variables when predicting the potential for groundwater occurrence. We applied the methods to a spatially explicit dataset from five provinces of the Central Highlands, Vietnam. The results revealed that rainfall, land use/cover, elevation, and river density contributed most to groundwater potential in the study area. The ensemble model, DFPA, achieved greater goodness-of-fit and predictive ability than the single models. The DFPA model (accuracy = 70%, ROC-AUC = 0.77, RMSE = 0.44) provided the most accurate prediction of groundwater potential in the study area, followed by the FPA (ROC-AUC = 0.76), PART (ROC-AUC = 0.72), FURIA (ROC-AUC = 0.70), and MLP (ROC-AUC = 0.69) models. The DFPA model classified 34.7%, 44.1%, and 21.2% of the Central Highlands into low, moderate, and high potential categories, respectively. Our experiments showed that ensemble modeling is a promising supporting tool for helping decision-makers, stakeholders, and researchers promote strategies for sustainable water resources management.
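
The ROC-AUC, accuracy, and RMSE figures used above to rank the models can be illustrated with a short Python sketch. This is not the study's code, and the labels and scores below are hypothetical. The sketch computes AUC as the probability that a randomly chosen positive sample outscores a randomly chosen negative one, with ties counting one half. It applies an assumed 0.5 threshold for accuracy.

```python
def roc_auc(labels, scores):
    """ROC-AUC as P(random positive scores higher than random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rmse(labels, probs):
    """Root mean square error between 0/1 labels and predicted probabilities."""
    return (sum((y - p) ** 2 for y, p in zip(labels, probs)) / len(labels)) ** 0.5


def accuracy(labels, probs, threshold=0.5):
    """Share of samples whose thresholded probability matches the label (threshold assumed)."""
    hits = sum((p >= threshold) == (y == 1) for y, p in zip(labels, probs))
    return hits / len(labels)


# Hypothetical labels (1 = productive well, 0 = non-productive) and model scores.
y = [1, 1, 1, 0, 0, 0]
p = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
print(round(roc_auc(y, p), 3), round(rmse(y, p), 3), round(accuracy(y, p), 3))
```

The pairwise loop costs O(n²). It is used here because it makes the meaning of AUC explicit. A rank-based formula gives the same value faster on large validation sets.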

Landslides are among the most devastating natural hazards. They cause heavy loss of life, damage property and infrastructure, and adversely affect a country's socioeconomy. Landslides occur in hilly and mountainous areas all over the world. Single, ensemble, and hybrid machine learning (ML) models have been used in landslide studies for better landslide susceptibility mapping and risk management. In the present study, we used three single ML models for landslide susceptibility mapping in Pithoragarh district: linear discriminant analysis (LDA), logistic regression (LR), and radial basis function network (RBFN). We chose them because these models are easy to apply and had not yet been used for landslide studies in this area. The main objective of this study is to evaluate how well these single models identify landslide-susceptible zones, with a view to applying them in other areas. For this, ten important landslide-affecting factors were considered for the modeling, selected on the basis of local geo-environmental conditions: slope, aspect, curvature, elevation, land cover, lithology, geomorphology, distance to rivers, distance to roads, and overburden depth. An inventory of 398 past landslide events was used to develop the models. The landslide locations were randomly divided in a 70/30 ratio for training (70%) and validation (30%) of the models. Standard statistical measures were used to evaluate the performance of the models: accuracy (ACC), specificity (SPF), sensitivity (SST), positive predictive value (PPV), negative predictive value (NPV), Kappa, root mean square error (RMSE), and area under the receiver operating characteristic curve (AUC). Results indicated that all the models performed very well (AUC > 0.90) and that the LR model performed best (AUC = 0.926). Therefore, these single ML models can be used to develop accurate landslide susceptibility maps. Our study demonstrated that single models are easy to use and can compete with complex ensemble/hybrid models. They can therefore be applied for landslide susceptibility mapping in landslide-prone areas.
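
The threshold-based statistics listed above (ACC, SST, SPF, PPV, NPV, and Kappa) all derive from a 2×2 confusion matrix of validation predictions. The Python sketch below uses hypothetical counts, not results from the study. Kappa is Cohen's Kappa, which measures observed agreement beyond what chance would give.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Statistics commonly used to compare susceptibility models on a validation set."""
    n = tp + fp + tn + fn
    acc = (tp + tn) / n
    sst = tp / (tp + fn)  # sensitivity: share of landslide points correctly flagged
    spf = tn / (tn + fp)  # specificity: share of non-landslide points correctly flagged
    ppv = tp / (tp + fp)  # positive predictive value
    npv = tn / (tn + fn)  # negative predictive value
    # Cohen's Kappa: observed accuracy corrected for chance agreement p_e.
    p_yes = ((tp + fp) / n) * ((tp + fn) / n)
    p_no = ((tn + fn) / n) * ((tn + fp) / n)
    p_e = p_yes + p_no
    kappa = (acc - p_e) / (1 - p_e)
    return {"ACC": acc, "SST": sst, "SPF": spf, "PPV": ppv, "NPV": npv, "Kappa": kappa}


# Hypothetical validation counts (not taken from the study).
m = confusion_metrics(tp=40, fp=10, tn=35, fn=15)
print({k: round(v, 3) for k, v in m.items()})
```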

The main objective of the study was to investigate the performance of three soft computing models in landslide susceptibility mapping of Pithoragarh District, Uttarakhand State, India: Naïve Bayes (NB), the Multilayer Perceptron (MLP) neural network classifier, and the Alternating Decision Tree (ADT). For this purpose, the modeling considered data on 91 past landslide locations and ten landslide influencing factors: slope degree, curvature, aspect, land cover, slope forming materials (SFM), elevation, distance to rivers, geomorphology, overburden depth, and distance to roads. Thematic maps of the Geological Survey of India (GSI), Google Earth images, and the ASTER Digital Elevation Model (DEM) were used to develop the landslide susceptibility maps in a Geographic Information System (GIS) environment. The landslide location data were divided in a 70:30 ratio for the training (70%) and testing/validation (30%) of the three models. The models were evaluated with standard statistical measures: Positive Predictive Value (PPV), Negative Predictive Value (NPV), Sensitivity, Specificity, Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Area Under the ROC Curve (AUC). All three soft computing models performed well in developing accurate landslide susceptibility maps, but ADT and MLP performed better than NB. Therefore, these models can also be used to construct accurate landslide susceptibility maps in other landslide-prone areas.
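
A 70:30 division of landslide locations into training and validation sets can be sketched in Python as follows. The sketch assumes a simple seeded random shuffle. The abstract does not say whether the split was stratified or how non-landslide points were sampled, so the function name and seed are illustrative only.

```python
import random


def split_70_30(locations, train_share=0.7, seed=42):
    """Shuffle a copy of the locations and cut it into training and validation parts."""
    items = list(locations)
    rng = random.Random(seed)  # fixed seed (an assumption) so the split is reproducible
    rng.shuffle(items)
    cut = round(len(items) * train_share)
    return items[:cut], items[cut:]


# 91 hypothetical landslide location IDs, matching the count reported above.
train, test = split_70_30(range(91))
print(len(train), len(test))
```

With 91 locations, 70% is 63.7, so rounding puts 64 locations in training and 27 in validation. Other studies may truncate instead and report 63/28.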

In recent years, floods have been occurring more frequently around the world due to increased anthropogenic activities and climate change. Accurate models for flood susceptibility prediction and mapping are needed to help develop more efficient flood management plans. In this study, we combined the Partial Decision Tree (PART) classifier with the AdaBoost, Bagging, Dagging, and Random Subspace ensemble learning techniques. This produced novel GIS-based ensemble computational models (ABPART, BPART, DPART, and RSSPART) for flood susceptibility mapping in Quang Binh Province, Vietnam. In total, 351 flood locations were used in the modeling. These data were divided in a 70:30 ratio, with about 255 locations (70%) for model training and about 96 locations (30%) for model validation. Ten flood influencing factors were used to develop the models: elevation, slope, curvature, flow direction, flow accumulation, river density, distance from river, rainfall, land use, and geology. The OneR feature selection method was used to select and prioritize important factors for the spatial modeling. The results revealed that land use, geology, and slope are the most important conditioning factors for flood occurrence in the study area. Standard statistical methods, including the area under the ROC curve (AUC), were used to evaluate model performance. Results indicated that all models performed well (AUC > 0.9) and that RSSPART (AUC = 0.959) outperformed the others. Thus, the RSSPART model can be used to accurately predict and map flood susceptibility.
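
The OneR factor ranking mentioned above can be sketched in Python. OneR builds a one-factor rule that maps each factor class to its majority label. Factors are then ranked by how accurate that rule is. This is a simplified illustration on hypothetical data. It scores the rule on the training data itself, whereas the tool used in the study may use cross-validation and may bucket numeric factors differently.

```python
from collections import Counter, defaultdict


def one_r_accuracy(values, labels):
    """Training accuracy of a OneR rule: each factor class predicts its majority label."""
    by_value = defaultdict(Counter)
    for v, y in zip(values, labels):
        by_value[v][y] += 1
    correct = sum(counts.most_common(1)[0][1] for counts in by_value.values())
    return correct / len(labels)


def rank_factors(factors, labels):
    """Rank factor names by OneR accuracy, highest first."""
    scores = {name: one_r_accuracy(vals, labels) for name, vals in factors.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical classified factor values at six locations (1 = flood, 0 = non-flood).
labels = [1, 1, 1, 0, 0, 0]
factors = {
    "slope": ["flat", "flat", "steep", "flat", "steep", "steep"],
    "land_use": ["urban", "urban", "paddy", "forest", "forest", "paddy"],
}
print(rank_factors(factors, labels))
```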

Vietnam's central coastal region is the most vulnerable region and is constantly at risk of flooding, which severely affects people's livelihoods and socioeconomic development. In particular, Quang Binh province is frequently affected by floods and storms throughout the year. However, studies on flood hazard estimation and prediction tools for this area are still lacking. This study aims to develop a flood susceptibility assessment tool, using Quang Binh province as the study area. The tool draws on historical flood marks, available topographic, hydrological, geological, and environmental data, and several machine learning (ML) techniques. These are the alternating decision tree (AD Tree), logistic model tree (LM Tree), reduced-error pruning tree (REP Tree), J48 decision tree (J48), and Naïve Bayes tree (NB Tree). We used flood mark locations from the major flooding events of 2007, 2010, and 2016, together with ten flood conditioning factors, to construct and validate the ML models. Various validation methods, including the area under the ROC curve (AUC), were used to validate and compare the models. The validation results suggest that all models perform well: AD Tree (AUC = 0.968), LM Tree (AUC = 0.967), REP Tree (AUC = 0.897), J48 (AUC = 0.953), and NB Tree (AUC = 0.986). Of these, NB Tree achieved the best flood prediction performance, with an accuracy higher than 92%. The final flood susceptibility map classifies the area into five flooding hazard classes:

- very low: 6,265 km² (78.8% of the area)
- low: 391 km² (4.9%)
- moderate: 224 km² (2.8%)
- high: 243 km² (3.1%)
- very high: 829 km² (10.4%)

The final flood susceptibility assessment map could be a valuable resource for flood risk reduction and management activities in Quang Binh province.
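
As a quick consistency check on the class areas reported above, the five classes sum to 7,952 km². Dividing each class area by that total reproduces the reported percentages, as this short Python sketch of the arithmetic shows:

```python
# Class areas (km²) as reported for the final flood susceptibility map.
areas_km2 = {"very low": 6265, "low": 391, "moderate": 224, "high": 243, "very high": 829}
total = sum(areas_km2.values())
shares = {k: round(100 * v / total, 1) for k, v in areas_km2.items()}
print(total, shares)
```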

Groundwater potential maps are important tools for the sustainable management of water resources, especially in agricultural countries like Vietnam. Here, we describe the development and application of an ensemble modeling framework that analyzes spatially explicit data to estimate groundwater potential across Kon Tum Province, Vietnam. Within this framework, we integrated the Naïve Bayes (NB) method with the Bagging (B), AdaBoost (AB), and Rotation Forest (RF) ensemble learning techniques to develop three ensemble models: BNB, ABNB, and RFNB. We incorporated well yield data and thirteen explanatory variables into the independent training and validation stages of the single NB model and its three ensembles. The variables were elevation, aspect, slope, curvature, river density, topographic wetness index, sediment transport index (STI), soil type, geology, land use, rainfall, flow direction, and flow accumulation. Several performance metrics demonstrated that all three ensemble models surpassed the single NB model in groundwater potential mapping. These metrics were the area under the receiver operating characteristic curve (AUC), root mean square error (RMSE), accuracy, sensitivity, specificity, negative predictive value, and positive predictive value. The ensemble RFNB model (AUC = 0.849, accuracy = 83.33%, sensitivity = 100%, specificity = 75%, RMSE = 0.406) performed most accurately for mapping groundwater potential in Kon Tum Province. It was followed by the ABNB (AUC = 0.844), BNB (AUC = 0.815), and single NB (AUC = 0.786) models.
Further, the correlation-based feature selection method identified elevation, slope, land use, rainfall, and STI as the most useful explanatory variables for explaining the distribution of groundwater potential in Kon Tum Province. The proposed methodology and the resulting potential maps enable managers to align water use patterns with the shared benefits and costs of different users. They also support strategies for sustainable groundwater exploitation, preservation, and management.
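
Correlation-based feature selection (CFS) scores a subset of variables by Hall's merit. The merit rewards a high average feature-class correlation and penalizes a high average feature-feature correlation:

Merit = k·r_cf / sqrt(k + k(k−1)·r_ff)

The Python sketch below uses hypothetical correlation values and an exhaustive search over a few variables. The tool used in the study may measure correlation differently (e.g., symmetric uncertainty) and search heuristically (e.g., best-first).

```python
from itertools import combinations
from math import sqrt


def cfs_merit(subset, class_corr, feature_corr):
    """Hall's CFS merit for a tuple of feature names."""
    k = len(subset)
    r_cf = sum(abs(class_corr[f]) for f in subset) / k  # mean feature-class correlation
    pairs = list(combinations(subset, 2))
    # Mean feature-feature correlation (redundancy); zero for a single feature.
    r_ff = sum(abs(feature_corr[frozenset(p)]) for p in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


# Hypothetical correlations, not values from the study.
class_corr = {"elevation": 0.6, "slope": 0.5, "aspect": 0.1}
feature_corr = {
    frozenset(("elevation", "slope")): 0.2,
    frozenset(("elevation", "aspect")): 0.1,
    frozenset(("slope", "aspect")): 0.1,
}
features = list(class_corr)
candidates = (s for r in range(1, len(features) + 1) for s in combinations(features, r))
best = max(candidates, key=lambda s: cfs_merit(s, class_corr, feature_corr))
print(best, round(cfs_merit(best, class_corr, feature_corr), 3))
```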

Fire is among the most dangerous and devastating natural hazards in forest ecosystems around the world. Computational ensemble models that improve the predictive accuracy of forest fire susceptibility could save time and cost in firefighting efforts. Here, we combined a locally weighted learning (LWL) algorithm with the Cascade Generalization (CG), Bagging, Decorate, and Dagging ensemble learning techniques to predict forest fire susceptibility in Pu Mat National Park, Nghe An Province, Vietnam. A geospatial database containing records of 56 historical fires and nine explanatory variables was used to train the standalone LWL model and its derived ensemble models. The models were validated for their goodness-of-fit and predictive capability using the area under the receiver operating characteristic curve (AUC) and several other statistical performance criteria. The CG-LWL and Bagging-LWL models showed the highest training performance (AUC = 0.993). However, the Dagging-LWL ensemble model (AUC = 0.983) performed best in predicting the spatial pattern of fire susceptibility across the study area. It outperformed Decorate-LWL (AUC = 0.976), CG-LWL and Bagging-LWL (AUC = 0.972), and LWL (AUC = 0.965). Our study promotes the application of ensemble models in forest fire prediction and improves researchers' understanding of the model-building process. These four ensemble models were developed for estimating forest fire susceptibility. Nevertheless, when local geo-environmental factors are considered, they are general enough to predict other natural hazards, such as landslides, floods, and dust storms.
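
Locally weighted learning builds a separate local model for each query point. Training instances are weighted by their distance to that point. The Python sketch below simplifies this idea. It assumes a Gaussian kernel and replaces the trainable base learner with a kernel-weighted share of fire labels (a Nadaraya-Watson style estimate). The LWL implementation used in the study lets the user choose the kernel and base classifier, so treat this as an illustration of the weighting only.

```python
from math import dist, exp


def lwl_predict(train_x, train_y, query, bandwidth=1.0):
    """Kernel-weighted share of fire labels around `query` (a simplified LWL)."""
    # Gaussian kernel (an assumption): nearby fire/non-fire records dominate the estimate.
    weights = [exp(-((dist(x, query) / bandwidth) ** 2)) for x in train_x]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("query is too far from all training points for this bandwidth")
    return sum(w * y for w, y in zip(weights, train_y)) / total


# Hypothetical 2-D explanatory values (e.g., scaled elevation and distance to roads).
X = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
y = [1, 1, 0, 0]  # 1 = historical fire, 0 = no fire
print(round(lwl_predict(X, y, (0.0, 0.5)), 3), round(lwl_predict(X, y, (5.0, 5.5)), 3))
```

The bandwidth controls how local the estimate is. A small value relies almost entirely on the nearest records, while a large value approaches the global fire rate.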

This study proposes a new approach in which landslide susceptibility in Quang Nam (Vietnam) is estimated using the best model among the following algorithms: Decision Ensemble. To this end, an inventory map of 1130 landslides was created and partitioned into training (70%) and testing (30%) locations. The correlation-based feature selection (CFS) method was used to select 15 landslide influencing factors. The landslide locations in the training sample and the landslide predictors were used as input data to run the above-mentioned models. The Kappa index, Accuracy (%), and ROC curve were employed to estimate the models' performance and to test their outcomes. Among the eleven machine learning algorithms, Random Subspace Decision Table Naïve Bayes (RSSDTNB) performed best, with AUC = 0.839, Accuracy = 76.55%, and Kappa index = 0.531. Therefore, this algorithm was used to estimate landslide susceptibility. The success rate (AUC = 0.815) and prediction rate (AUC = 0.826) indicate high-quality results.
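
The Random Subspace idea behind RSSDTNB can be sketched in Python as follows. Each base model is trained on a random subset of the influencing factors, and the ensemble predicts by majority vote. The study's base learner is a Decision Table/Naïve Bayes hybrid. This sketch swaps in a hypothetical nearest-centroid learner to stay short. The subspace share of 0.5, the 11 models, and the toy data are assumptions.

```python
import random
from math import dist


def fit_centroids(X, y, feats):
    """Hypothetical base learner: class centroids computed on the chosen feature indices."""
    centroids = {}
    for label in set(y):
        rows = [[x[f] for f in feats] for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return feats, centroids


def predict_centroids(model, x):
    """Assign x to the class whose centroid is nearest in the model's feature subspace."""
    feats, centroids = model
    point = [x[f] for f in feats]
    return min(centroids, key=lambda label: dist(point, centroids[label]))


def random_subspace(X, y, n_models=11, subspace=0.5, seed=0):
    """Train each base model on a random subset of the factors."""
    rng = random.Random(seed)
    n_feats = len(X[0])
    k = max(1, round(n_feats * subspace))
    return [fit_centroids(X, y, sorted(rng.sample(range(n_feats), k))) for _ in range(n_models)]


def predict_vote(models, x):
    """Majority vote over the base models."""
    votes = [predict_centroids(m, x) for m in models]
    return max(set(votes), key=votes.count)


# Hypothetical scaled values of four factors; 1 = landslide, 0 = non-landslide.
X = [(0, 0, 0, 0), (1, 0, 1, 0), (0, 1, 0, 1), (9, 9, 9, 9), (8, 9, 8, 9), (9, 8, 9, 8)]
y = [0, 0, 0, 1, 1, 1]
models = random_subspace(X, y)
print(predict_vote(models, (0.5, 0.5, 0.5, 0.5)), predict_vote(models, (8.5, 8.5, 8.5, 8.5)))
```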

In this paper, we proposed a novel approach for flood risk assessment that combines a deep learning algorithm with Multi-Criteria Decision Analysis (MCDA). The flood risk assessment framework involves three main elements: hazard, exposure, and vulnerability. Quang Nam province, one of the flood-prone areas of Vietnam, was selected as the study area. Data on 847 past flood locations in this area were analyzed to generate training and testing datasets for the models. We used a popular Deep Neural Network (DNN) algorithm to generate the flood susceptibility map. The Analytic Hierarchy Process (AHP), a popular MCDA approach, was used to generate the hazard, exposure, and vulnerability maps. For comparison with the DNN method, we also used two hybrid models, BFPA and DFPA, which are ensembles of Bagging and Decorate, respectively, with the Forest by Penalizing Attributes algorithm. Various standard statistical indices, including Receiver Operating Characteristic (ROC) curves, were used to evaluate and validate the models. Results indicated that integrating DNN and MCDA models is a promising approach for developing accurate flood risk assessment maps for better flood hazard management.
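
AHP turns a reciprocal pairwise-comparison matrix into criterion weights using its principal eigenvector. It then checks the judgments with Saaty's consistency ratio, CR = CI / RI, where CI = (λ_max − n)/(n − 1). The Python sketch below estimates the eigenvector by power iteration. The three-criterion matrix is hypothetical and fully consistent, so the weights come out exactly and CR is zero.

```python
# Saaty's random consistency index (RI) for matrix sizes 3 to 7.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32}


def ahp_weights(matrix, iterations=100):
    """Priority vector (principal eigenvector) of a pairwise matrix via power iteration."""
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        s = sum(v)
        w = [x / s for x in v]  # normalize so the weights sum to 1
    # Estimate lambda_max as the mean of (A w)_i / w_i.
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    return w, lambda_max


def consistency_ratio(lambda_max, n):
    """CR = CI / RI; CR < 0.1 is the usual acceptance rule."""
    ci = (lambda_max - n) / (n - 1)
    return ci / RANDOM_INDEX[n]


# Hypothetical judgments: criterion 1 is twice as important as 2 and 3; 2 and 3 are equal.
A = [[1, 2, 2], [0.5, 1, 1], [0.5, 1, 1]]
w, lam = ahp_weights(A)
print([round(x, 3) for x in w], round(lam, 3))
```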

A groundwater potential map is an important tool for sustainable water management and land use planning, particularly in agricultural countries like Vietnam. In this article, we proposed new machine learning ensemble techniques to build the groundwater potential map of Gia Lai province, Vietnam: the AdaBoost ensemble (ABLWL), Bagging ensemble (BLWL), MultiBoost ensemble (MBLWL), and Rotation Forest ensemble (RFLWL). Each uses the Locally Weighted Learning (LWL) algorithm as the base classifier. For this study, we used eleven conditioning factors and yield data from 134 wells to create training (70%) and testing (30%) datasets for developing and validating the models. The factors were aspect, altitude, curvature, slope, Stream Transport Index (STI), Topographic Wetness Index (TWI), soil, geology, river density, rainfall, and land use. Several statistical indices were used to validate and compare the performance of the models: Positive Predictive Value (PPV), Negative Predictive Value (NPV), Sensitivity (SST), Specificity (SPF), Accuracy (ACC), Kappa, and the Receiver Operating Characteristic (ROC) curve. Results show that the performance of all the models is good to very good (AUC:
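
At the core of the AdaBoost-based ensembles is a reweighting step applied after each base classifier is trained. The Python sketch below follows AdaBoost.M1. It multiplies the weights of correctly classified instances by β = ε/(1 − ε), renormalizes, and gives the classifier a vote of log(1/β). The hit/miss vector is hypothetical. In the study, it would come from an LWL base classifier, and MultiBoost adds further steps not shown here.

```python
from math import log


def adaboost_m1_round(weights, correct):
    """One AdaBoost.M1 reweighting step.

    weights: current instance weights (summing to 1)
    correct: True where the base classifier classified the instance correctly
    Returns (new_weights, vote_weight_of_this_classifier).
    """
    err = sum(w for w, c in zip(weights, correct) if not c)
    if err == 0 or err >= 0.5:
        raise ValueError("this sketch stops when the weighted error is 0 or >= 0.5")
    beta = err / (1 - err)
    # Shrink the weights of correctly classified instances, then renormalize,
    # so the next base classifier focuses on the misclassified ones.
    new = [w * beta if c else w for w, c in zip(weights, correct)]
    total = sum(new)
    return [w / total for w in new], log(1 / beta)


weights = [0.25, 0.25, 0.25, 0.25]
new_weights, vote = adaboost_m1_round(weights, [True, True, True, False])
print([round(w, 3) for w in new_weights], round(vote, 3))
```

After one round, the single misclassified well carries half of the total weight.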

Groundwater is one of the most valuable water resources for communities, agriculture, and industry. In the present study, we developed three novel hybrid artificial intelligence (AI) models for groundwater potential mapping (GPM) in the basaltic terrain of DakLak province, Central Highlands, Vietnam. The models combine the modified RealAdaBoost (MRAB), bagging (BA), and rotation forest (RF) ensembles with a functional tree (FT) base classifier. According to our literature survey, these hybrid AI models are new and have not previously been used for GPM. Using geospatial techniques, we modeled geo-hydrological data from 130 groundwater wells together with 12 topographical and geo-environmental factors. The One-R Attribute Evaluation feature selection method was used to select relevant input parameters for developing the AI models. The performance of these models was evaluated using various statistical measures, including the area under the receiver operating characteristic curve (AUC). All the hybrid models developed in this study enhanced goodness-of-fit and prediction accuracy. The MRAB-FT model (AUC = 0.742) performed best, outperforming the RF-FT (AUC = 0.736), BA-FT (AUC = 0.714), and single FT (AUC = 0.674) models. Therefore, the MRAB-FT model can be considered a promising hybrid AI technique for accurate GPM. Accurate mapping of groundwater potential zones will help recharge the aquifer adequately. It will also support optimal use of groundwater resources by maintaining a balance between consumption and exploitation.

In this study, we investigated rainfall-induced landslide susceptibility in Uttarkashi district, India, by developing novel GIS-based soft computing approaches: Bagging-MLPC, Dagging-MLPC, and Decorate-MLPC. These combine the Multi-layer Perceptron Neural Network Classifier (MLPC) with the Bagging, Dagging, and Decorate ensemble methods, respectively. The proposed models were trained and validated using 12 landslide conditioning factors and 103 historical landslide events, divided into training (70%) and validation (30%) samples. The accuracy of the models was evaluated using various statistical methods, including the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC). The results show that although all the studied models perform well (AUC > 0.80), the hybrid Bagging-MLPC model performs best (AUC = 0.965). Therefore, this new hybrid model (Bagging-MLPC) can be used for accurate landslide susceptibility mapping and for assessing landslide-prone areas, supporting landslide prevention and management.
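
Bagging, the best-performing wrapper above, fits each base model on a bootstrap sample. A bootstrap sample is drawn with replacement and has the same size as the training set. The ensemble then predicts by majority vote. The Python sketch below replaces the MLPC with a hypothetical one-factor threshold learner to keep it short. The number of models, the seed, and the data are assumptions.

```python
import random
from collections import Counter


def fit_stump(sample):
    """Hypothetical base learner: threshold one factor midway between the class means."""
    ones = [x for x, y in sample if y == 1]
    zeros = [x for x, y in sample if y == 0]
    if not ones or not zeros:  # a bootstrap sample can miss a class entirely
        label = 1 if ones else 0
        return lambda x: label
    m1, m0 = sum(ones) / len(ones), sum(zeros) / len(zeros)
    mid = (m1 + m0) / 2
    if m1 >= m0:
        return lambda x: int(x > mid)
    return lambda x: int(x <= mid)


def bagging_fit(data, fit_base, n_models=11, seed=1):
    """Fit each base model on a bootstrap sample (same size, drawn with replacement)."""
    rng = random.Random(seed)
    return [fit_base([rng.choice(data) for _ in data]) for _ in range(n_models)]


def bagging_predict(models, x):
    """Majority vote of the base models."""
    return Counter(model(x) for model in models).most_common(1)[0][0]


# Hypothetical (scaled factor value, label) pairs; 1 = landslide, 0 = non-landslide.
data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.7, 1), (0.8, 1), (0.9, 1)]
models = bagging_fit(data, fit_stump)
print(bagging_predict(models, 0.05), bagging_predict(models, 0.95))
```

An odd number of models avoids tied votes in this two-class setting.
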

The main objective of this research is to provide a new approach for flash flood prediction in Lao Cai, where typhoons are frequent. The method is based on the Random Forest classification algorithm. The researchers combined a GIS database with the construction of a machine learning model and then verified the forecasting model. The data were extracted from a field survey of the flash flood areas of Lao Cai and from GIS (Geographic Information System) layers. The results show that the model can be a useful flash flood forecasting tool. It provides more data for land planning and management to prevent and predict flash floods in the Lao Cai area.

Improving the accuracy of flood prediction and mapping is crucial for reducing the damage caused by flood events. In this study, we proposed and validated three ensemble models for improved, spatially explicit prediction of flood susceptibility. They are based on the Best First Decision Tree (BFT) combined with the Bagging (Bagging-BFT), Decorate (Decorate-BFT), and Random Subspace (RSS-BFT) ensemble learning techniques. To generate the training and validation datasets, a total of 126 historical flood events from Nghe An Province (Vietnam) were linked to 10 flood influencing factors. The factors were slope, elevation, aspect, curvature, river density, distance from rivers, flow direction, geology, soil, and land use. The models were validated with several performance metrics. These demonstrated that all three ensemble models can capture the underlying pattern of flood occurrence within the research area and predict the probability of future flood events. Based on the Area Under the receiver operating characteristic Curve (AUC), the Decorate-BFT model (AUC = 0.989) was identified as superior to the RSS-BFT (AUC = 0.982) and Bagging-BFT (AUC = 0.967) models. A comparison with models previously reported in the literature confirmed that our ensemble models provide reliable estimates of flood susceptibility. The resulting susceptibility maps are therefore trustworthy for flood early warning systems and for developing mitigation plans.

This paper introduces a new deep learning algorithm, named DEBP, for flood susceptibility mapping in the Vu Gia-Thu Bon watershed, central Vietnam. DEBP is a deep belief network (DBN) based on an extreme learning machine (ELM), structured by back propagation (BP), and optimized by the particle swarm optimization (PSO) algorithm. We use 847 locations of floods that occurred in 2007, 2009, and 2013, together with 16 flood conditioning factors evaluated by the information gain ratio (IGR) technique, to construct and validate the proposed model. Several statistical metrics are used to assess the goodness-of-fit and prediction accuracy of the new deep learning model. These are sensitivity, specificity, accuracy, F1-measure, Jaccard coefficient, Matthews correlation coefficient (MCC), root mean square error (RMSE), and area under the receiver operating characteristic curve (AUC). We further compare the proposed model with several well-known machine learning algorithms: the artificial neural network-based radial basis function (ANNRBF), logistic regression (LR), logistic model tree (LMTree), functional tree (FTree), and alternating decision tree (ADTree). DEBP has the highest goodness-of-fit (AUC = 0.970) and prediction accuracy (AUC = 0.967) of all the tested models and thus shows promise as a tool for flood susceptibility modeling. We conclude that novel deep learning algorithms such as the one used in this study can improve the accuracy of flood susceptibility maps. Such maps are required by planners, decision makers, and government agencies to manage areas vulnerable to flood-induced damage.
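
The information gain ratio (IGR) used above to evaluate conditioning factors can be defined for a classified factor in two steps. First compute the information gain, which is the reduction in class entropy after splitting by the factor's classes. Then divide it by the factor's own split information (its entropy). The Python sketch below shows this on two hypothetical factors, one that separates flood from non-flood locations and one that carries no information.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(items):
    """Shannon entropy (bits) of a list of discrete values."""
    n = len(items)
    return -sum((c / n) * log2(c / n) for c in Counter(items).values())


def gain_ratio(values, labels):
    """Information gain of a classified factor divided by its split information."""
    n = len(labels)
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[v].append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


labels = [1, 1, 0, 0]  # hypothetical: 1 = flood, 0 = non-flood
print(gain_ratio(["low", "low", "high", "high"], labels))  # separates the classes
print(gain_ratio(["a", "b", "a", "b"], labels))            # carries no information
```

Dividing by split information penalizes factors with many small classes, which would otherwise look informative by chance.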

Flood risk assessment is an important task for disaster management in flood-prone areas, so it is crucial to develop accurate flood risk assessment maps. In this study, we proposed a flood risk assessment framework that uses a Multi-Criteria Decision Analysis (MCDA) method to produce a final flood risk assessment map. The framework combines flood susceptibility with flood consequences (human health and financial impact). To create a flood susceptibility map, we developed two hybrid Artificial Intelligence (AI) models, ABMDT (AdaBoost-DT) and BDT (Bagging-DT), each using a Decision Table (DT) as the base classifier. We constructed and validated the hybrid AI models using 847 flood locations and 14 flood influencing factors covering topography, geology, hydrology, and the environment. The flood locations came from the major flooding events of 2007, 2009, and 2013 in Quang Nam province, Vietnam. Various statistical measures were used to validate the models, including the Area Under the Receiver Operating Characteristic (ROC) Curve (AUC). Results show that all the proposed models performed well, but the BDT model (AUC = 0.96) performed best compared with ABMDT (AUC = 0.953) and the single DT (AUC = 0.929). Therefore, the flood susceptibility map produced by the BDT model was combined with a flood consequences map to develop a reliable flood risk assessment map for the study area. The final flood risk map can be a useful resource for better flood hazard management in the study area, and the proposed framework and models can be applied to other flood-prone areas.
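
One common way to combine a susceptibility score with a consequence score in an MCDA framework is a weighted linear combination, followed by classification into risk classes. The abstract does not state the aggregation rule, weights, or class breaks. The equal weights and the 0.2-step breaks below are purely hypothetical, and the inputs are assumed to be rescaled to [0, 1]. The following Python sketch illustrates that idea:

```python
def flood_risk(susceptibility, consequence, w_s=0.5, w_c=0.5):
    """Weighted linear combination of two [0, 1] scores (hypothetical equal weights)."""
    if abs(w_s + w_c - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return w_s * susceptibility + w_c * consequence


def risk_class(score, breaks=(0.2, 0.4, 0.6, 0.8)):
    """Map a [0, 1] risk score to one of five classes using hypothetical equal-interval breaks."""
    labels = ["very low", "low", "moderate", "high", "very high"]
    for upper, label in zip(breaks, labels):
        if score < upper:
            return label
    return labels[-1]


# Three hypothetical map cells: (susceptibility, consequence).
for s, c in [(0.9, 0.8), (0.6, 0.4), (0.2, 0.1)]:
    print(s, c, risk_class(flood_risk(s, c)))
```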