
Vaex in Water Quality Analysis

The document discusses the application of soft computing and machine learning techniques for water quality prognostication in India, highlighting the health burden of poor water quality and the reliance on untreated groundwater. It reviews existing literature, identifies research gaps, and presents a methodology involving data preprocessing and machine learning algorithms to improve water quality prediction accuracy. The results indicate varying accuracy levels among different machine learning models, with TABNET achieving the highest accuracy of 87%.

Uploaded by

Mallika Sharma
Copyright © All Rights Reserved

ICICCD-2024

Soft Computing and Machine Learning Application on Water Quality Prognostication

Presented by: Mallika
Prof Nanhay Singh
Paper ID: 260
Institute: Netaji Subhas University of Technology
INTRODUCTION

Motivation
• Poor water quality imposes an enormous health burden in India: around 37.7 million Indians are affected by waterborne diseases.
• Rural areas depend solely on groundwater and surface water bodies, and people consume this water without any purification. This lack of awareness makes it harder for the nation to protect them from disease.
• Urban areas now rely on costly purifiers and packaged water, which increases the economic burden of everyday living.
• Industrial waste is discharged with wastewater, and this untreated wastewater contaminates the groundwater.
• In 2022-2023, the Department of Drinking Water and Sanitation was allocated Rs 77,223 crore in the Union Budget, a 29% increase.
LITERATURE SURVEY & RESEARCH GAPS
• A review of the application of machine learning in water quality evaluation, Science Direct Journal Eco-environment & Health 107-116 (2022)
Scope: application of machine learning models in surface water
Models used: 1) ANN 2) SVM 3) LSTM
Gaps: focus on general water quality parameters; no analysis of water pollutants

• Exploring Artificial Intelligence Techniques for Groundwater Quality Assessment, Journal Water, volume 19
Models used: 1) PSO 2) SVM; accuracy 77%
Gaps: focus on general water quality parameters; no analysis of pollutants

• Water quality assessment of a river using deep learning Bi-LSTM methodology: forecasting and validation
Models used: 1) SVM 2) ANN 3) MLR
Gaps: high computational complexity, leading to longer processing times and potentially less accurate results

• Water Quality Prediction Using KNN Imputer and Multilayer Perceptron
Models used: 1) Logistic Regression 2) SVC 3) DT 4) Random Forest 5) KNN 6) SGDC
Gaps:
- Existing machine learning models show unsatisfactory performance in water quality prediction.
- Current systems lack accuracy in classifying water quality.
- Automated systems for water quality classification with improved accuracy are needed.

• Water-Quality Prediction Based on H2O AutoML and Explainable AI Techniques
Models used: H2O Ensemble
Gaps: small dataset of only 935 instances

• ANN models improve accuracy but have shortcomings: when the input parameters are ambiguous, an ANN struggles to establish non-linear relationships.

• The ANFIS model (Adaptive Neuro-Fuzzy Inference System) integrates linear and non-linear relationships, but it cannot form hidden relationships if the correlation between parameters is weak.
METHODOLOGY
• Incorporation of two datasets: a real-time dataset (published on a government site, 3,000 entries) and a Kaggle dataset (59,34,000 entries) released by Intel
• Data pre-processing: identification of data imputation techniques, and conversion of the large dataset into a manageable, smaller one
• Application of vanilla machine learning algorithms
DATASET INFORMATION
Dataset – Intel
The dataset consists of 59,56,842 data points, of which 41,51,590 (69%) are classified as fit for drinking and 18,05,252 (31%) as unfit.

• Parameters: pH level; Iron, Nitrate, Chloride, Lead, Zinc, Fluoride, Copper, Sulfate, Chlorine and Manganese contents; Turbidity, Odor, Conductivity, Total Dissolved Solids, Source, water and air temperature, along with month, day and time. Categorical features include Color, Source, Month, Day and Time.
PREPROCESSING
Size Reduction
VAEX LIBRARY-
Vaex is a Python library designed for efficient processing, analysis, and visualization of large datasets. It focuses on reducing
memory usage and speeding up computations, especially for datasets that are too large to fit into memory.
Vaex achieves this through lazy evaluation and out-of-core computing, meaning that it operates on chunks of data at a time rather
than loading the entire dataset into memory. We converted the Kaggle dataset's CSV file to HDF5 format.
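The chunked, out-of-core idea described above can be shown without Vaex itself. The sketch below is not Vaex's API. It is a stdlib-only Python example that assumes a CSV file with a header row. It streams the rows in fixed-size chunks and keeps only running statistics in memory, using Welford's online update. `chunked_column_stats` and its `chunk_size` parameter are hypothetical names chosen for illustration. In Vaex, the CSV-to-HDF5 step is typically done with `vaex.from_csv(path, convert=True, chunk_size=...)`, after which `vaex.open` memory-maps the HDF5 file. Check the Vaex documentation for the options available in your version.

```python
import csv
import io
import math
from itertools import islice
from typing import TextIO, Tuple


def chunked_column_stats(stream: TextIO, column: str,
                         chunk_size: int = 100_000) -> Tuple[int, float, float]:
    """Return (count, mean, population std) of one numeric CSV column.

    Rows are pulled from the stream chunk_size at a time, so memory use depends
    on the chunk size, not on the file size. Empty cells are treated as missing
    and skipped. Welford's online update keeps the running mean numerically stable.
    """
    reader = csv.DictReader(stream)
    count, mean, m2 = 0, 0.0, 0.0
    while True:
        chunk = list(islice(reader, chunk_size))  # at most chunk_size rows held
        if not chunk:
            break
        for row in chunk:
            cell = (row.get(column) or "").strip()
            if not cell:
                continue  # missing value: left for the imputation step
            x = float(cell)
            count += 1
            delta = x - mean
            mean += delta / count
            m2 += delta * (x - mean)
    if count == 0:
        return 0, math.nan, math.nan
    return count, mean, math.sqrt(m2 / count)


if __name__ == "__main__":
    # Tiny in-memory stand-in for a large CSV file (hypothetical values).
    sample = "pH,Iron\n7.0,0.1\n8.0,\n9.0,0.3\n"
    print(chunked_column_stats(io.StringIO(sample), "pH", chunk_size=2))
```

For a real file, pass `open(path, newline="")` instead of the `io.StringIO` stand-in. The column's mean and standard deviation then feed the Z-score step without the full column ever being loaded.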

Missing Values
Kernel Density Estimate Plots –
Since the curve is Gaussian-like, the Z score
can be used for outlier detection and
mean imputation
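A KDE plot is a smoothed histogram, which makes it easy to judge whether a parameter's distribution looks Gaussian. The slides do not name the kernel or bandwidth they used. The following is a minimal stdlib sketch that assumes a Gaussian kernel with Silverman's rule-of-thumb bandwidth, the usual default in common plotting libraries. `silverman_bandwidth` and `gaussian_kde` are hypothetical helper names.

```python
import math
import statistics
from typing import List, Optional, Sequence


def silverman_bandwidth(values: Sequence[float]) -> float:
    """Silverman's rule of thumb: h = 0.9 * min(sd, IQR / 1.34) * n ** (-1/5)."""
    n = len(values)
    sd = statistics.stdev(values)
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    spread = min(sd, iqr / 1.34) if iqr > 0 else sd
    return 0.9 * spread * n ** (-0.2)


def gaussian_kde(values: Sequence[float], grid: Sequence[float],
                 bandwidth: Optional[float] = None) -> List[float]:
    """Evaluate a Gaussian kernel density estimate of `values` at each grid point."""
    h = silverman_bandwidth(values) if bandwidth is None else bandwidth
    norm = 1.0 / (len(values) * h * math.sqrt(2.0 * math.pi))
    # Each observation contributes one Gaussian bump of width h; the sum is normalised to integrate to 1.
    return [norm * sum(math.exp(-0.5 * ((g - x) / h) ** 2) for x in values)
            for g in grid]
```

Plotting the returned densities against the grid gives the KDE curve. A single, roughly symmetric bell shape supports using the Z score in the next step.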

Z Score Calculation and Mean Imputation
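The slides name the two steps but do not show the details. Below is a minimal sketch of Z-score outlier detection followed by mean imputation. It assumes three things the slides do not state. First, a threshold of |z| > 3, a common convention. Second, that outliers are replaced rather than dropped. Third, that the fill value is the mean of the remaining inlier values. `zscore_clean` is a hypothetical name.

```python
import math
import statistics
from typing import List, Optional, Sequence, Tuple


def zscore_clean(values: Sequence[Optional[float]],
                 threshold: float = 3.0) -> Tuple[List[float], float, int]:
    """Replace missing values (None/NaN) and Z-score outliers with the inlier mean.

    Returns (cleaned values, fill value, number of outliers replaced).
    Raises statistics.StatisticsError if every value is missing.
    """
    observed = [v for v in values if v is not None and not math.isnan(v)]
    mu = statistics.fmean(observed)
    sigma = statistics.pstdev(observed)

    def is_inlier(v: float) -> bool:
        # A constant column has sigma == 0; treat every value as an inlier.
        return sigma == 0 or abs((v - mu) / sigma) <= threshold

    fill = statistics.fmean(v for v in observed if is_inlier(v))
    cleaned: List[float] = []
    n_outliers = 0
    for v in values:
        if v is None or math.isnan(v):
            cleaned.append(fill)          # mean imputation of a missing value
        elif not is_inlier(v):
            cleaned.append(fill)          # outlier replaced by the inlier mean
            n_outliers += 1
        else:
            cleaned.append(v)
    return cleaned, fill, n_outliers
```

Computing the fill value from inliers only keeps an extreme reading from dragging the imputed mean. Using the plain column mean is an equally valid choice if that is what the pipeline intends.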


DATA PRE-PROCESSING

• Correlation Identification
1) T test – We fail to reject the null
hypothesis that there is no difference
in the mean of the parameters between
group 1 (fit drinking water) and
group 2 (unfit drinking water)
2) Heat Map Results

• Water Quality Index Calculation


WQI is calculated from Biological Oxygen
Demand, Total Suspended Solids, Dissolved
Oxygen, pH, Conductivity, Total Coliform,
Fecal Coliform and Nitrate. WQI is computed
with the following formula:
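The two Correlation Identification steps, the T test and the heat map, can be sketched as follows. This is an illustration under assumptions, not the authors' code. The T test is written as Welch's unequal-variance test, and its two-sided p-value uses a normal approximation. That approximation is reasonable only for very large groups such as these millions of rows; for small samples, `scipy.stats.ttest_ind(a, b, equal_var=False)` uses the exact t distribution. The heat map is assumed to show Pearson correlations, the default of pandas `DataFrame.corr`. `welch_t_test`, `pearson` and `correlation_matrix` are hypothetical names.

```python
import math
import statistics
from typing import Dict, List, Sequence, Tuple


def welch_t_test(group1: Sequence[float], group2: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic and a large-sample (normal approximation) two-sided p-value."""
    m1, m2 = statistics.fmean(group1), statistics.fmean(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)  # sample variances
    se = math.sqrt(v1 / len(group1) + v2 / len(group2))
    t = (m1 - m2) / se
    p = math.erfc(abs(t) / math.sqrt(2.0))  # P(|Z| > |t|) for Z ~ N(0, 1)
    return t, p


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Pairwise Pearson r for every pair of columns: the numbers behind a heat map."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}
```

A p-value above the chosen significance level (commonly 0.05) means the null hypothesis of equal means is not rejected for that parameter. With millions of rows, even tiny mean differences tend to produce small p-values, so the per-parameter effect size is also worth reporting.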
Machine Learning Algorithms Results
ML Algorithm Accuracy
Logistic Regression 79%
Decision Tree Classification 83%
XGBoost 86%
Multilayer Perceptron 84%
TABNET 87%
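Accuracy is easier to interpret next to a majority-class baseline. Take the class counts from the Dataset Information slide (41,51,590 fit, 18,05,252 unfit). A model that always predicts "fit" would score about 69.7%, assuming the evaluation split has the same class balance as the full Intel dataset. The slides do not describe the train/test split. The helper names below are hypothetical.

```python
from typing import Dict, Hashable, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(class_counts: Dict[str, int]) -> float:
    """Accuracy of always predicting the most frequent class."""
    return max(class_counts.values()) / sum(class_counts.values())


# Class counts quoted on the Dataset Information slide (Intel dataset).
counts = {"fit": 4_151_590, "unfit": 1_805_252}
print(f"majority-class baseline: {majority_baseline(counts):.1%}")
```

Against this baseline, the reported accuracies (79% to 87%) correspond to roughly 10 to 17 percentage points of improvement over always predicting the majority class.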
REFERENCES

[1] Mengyuan Zhu, Jiawei Wang, Xiao Yang, Yu Zhang, Linyu Zhang, Hongqiang Ren, Bing Wu, Lin Ye - A review of the
application of machine learning in water quality evaluation, Science Direct Journal Eco-environment & Health 107-116 (2022)

[2] Rafiul Alam, Zia Ahmed, Sirajum Munir Seefat, Khadiza Tul Kobra Nahin - Assessment of surface water quality around landfill
using multistatistical method, Sylhet, Bangladesh, Science Direct Journal Environment Nanotechnology, Monitoring & Management,
100422 (2021)

[3] Y. Wu, X. Zhang, Y. Xiao, J. Feng - Attention neural network for water image classification under IoT environment, Appl. Sci.,
030909 (2020)

[4] Jun Pan, Te Leng, Yang Liu - Shifosi Reservoir Water Environmental Assessment Based on Grey Clustering, Advanced Materials
Research, 610-613 (2013)


[5] Purushottam Agrawal, Alok Sinha, Satish Kumar, Ankit Agarwal, Ashes Banerjee, Vasanta Govind Kumar Villuri,
Chandra Sekhara Rao Annavarapu, Rajesh Dwivedi, Vijaya Vardhan Reddy Dera, Jitendra Sinha, and Srinivas Pasupulet -
Exploring Artificial Intelligence Techniques for Groundwater Quality Assessment, Journal Water, volume 19 (2021)

[6] Mehrdad Jeihouni, Ara Toomanian, & Ali Mansourian - Decision Tree-Based Data Mining and Rule Induction for
Identifying High-Quality Groundwater Zones for Water Supply Management: a Novel Hybrid Use of Data Mining and GIS,
Water Resource Management, Springer Article 34, 139-154 (2020)

[7] Suraj Kumar Bhagat, Tiyasha Tiyasha, Salih Muhammad Awadh, Tran Minh Tung, Ali H. Jawad, Zaher Mundher Yaseen -
Prediction of sediment heavy metal at Australian bays using newly developed hybrid AI models

[8] Sakshi Kullar, Nanhey Singh - Water quality assessment of a river using deep learning BiLSTM methodology:
forecasting and validation, Springer Journal Environmental Science Pollution and Research, Water Quality, 12875-12899
(2022)

[9] Zunyang Zhang, Cheng Yang, Qiao Qiao, Xuesheng Li, Fuping Wang, Chengcheng Li - Application of Improved
Particle Swarm Optimization SVM in Water Quality Evaluation of Ming Cui Lake
Thank you
