0% found this document useful (0 votes)

29 views14 pages

Lab 1 - Python - Excel

Uploaded by

tfs23112003

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

29 views14 pages

Lab 1 - Python - Excel

Uploaded by

tfs23112003

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

TRẦN THỊ LAN ANH

Lớp:CNTTK20P

MãSV: DTC21H4802010537

Access to Safe drinking water is eassently a global issue. The World Health Organization
(WHO) estimates that half of all people in the world are affected by the lack of safe drinking
water. With this assesment, we will explore the data and look for patterns in the data to analyze if
the given data is a good indicator of safe drinking water.

In [20]:
# Import all required libraries
import pandas as pd
import numpy as np
import [Link] as plt
import seaborn as sns
from scipy import stats
[Link]('ignore', category=[Link])
#sns.set_context('notebook')
Data Set

The dataset is downloaded from [Link] and is available for download at:

[Link]
EDA - Exploratory Data Analysis

In this section we will explore the data and look for patterns in the data to analyze if the given
data is a good indicator of safe drinking water.

1) Describe the data

2) Visualize the data

3) Identify the missing values and fill them

4) Identify the outliers and remove them

5) Identify the categorical variables and encode them (if any)

6) Identify the numerical variables and perform basic statistical analysis

In [2]:
#2 Đường dẫn tới file CSV
df = pd.read_csv(r'G:/python excel/water_potability.csv')
In [3]:
# Read the csv file from the url

In [4]:
# Print the first 5 rows of the dataframe
display([Link]())

Chlor Trihalo Tur Pota

Hard Sulfa Condu Organic
ph Solids amine methane bidit bilit
ness te ctivity _carbon
s s y y

204.8 20791. 7.3002 368.5 564.30 10.3797 86.99097 2.96

0 NaN 0
90455 318981 12 16441 8654 83 0 3135

3.71
129.4 18630. 6.6352 592.88 15.1800 56.32907 4.50
1 608 NaN 0
22921 057858 46 5359 13 6 0656
0

8.09
224.2 19909. 9.2758 418.60 16.8686 66.42009 3.05
2 912 NaN 0
36259 541732 84 6213 37 3 5934
4

8.31
214.3 22018. 8.0593 356.8 363.26 18.4365 100.3416 4.62
3 676 0
73394 417441 32 86136 6516 24 74 8771
6

9.09
181.1 17978. 6.5466 310.1 398.41 11.5582 31.99799 4.07
4 222 0
01509 986339 00 35738 0813 79 3 5075
3

More information about the data

ph - PH is an important parameter in evaluating the acid–base balance of water. It is also the

indicator of acidic or alkaline condition of water status. WHO has recommended maximum
permissible limit of pH from 6.5 to 8.5. The current investigation ranges were 6.52–6.83 which
are in the range of WHO standards.

Hardness - Hardness is a measure of the physical properties of the water. It is a measure of the
ability of the water to support the roots and the leaves. The lower the hardness, the more support
the roots and leaves can have.
Solids (Total dissolved solids - TDS) - TDS is a measure of the solids in the water. The water
with high TDS value indicates that water is highly mineralized. Desirable limit for TDS is 500
mg/l and maximum limit is 1000 mg/l which prescribed for drinking purpose.

Chloramines - Chlorine and chloramine are the major disinfectants used in public water systems.
Chloramines are most commonly formed when ammonia is added to chlorine to treat drinking
water. Chlorine levels up to 4 milligrams per liter (mg/L or 4 parts per million (ppm)) are
considered safe in drinking water.

Sulfate - Sulfate is a common disinfectant used in public water systems. Sulfate levels up to 2
milligrams per liter (mg/L or 2 parts per million (ppm)) are considered safe in drinking water.

Conductivity - Pure water is not a good conductor of electric current rather’s a good insulator.
Increase in ions concentration enhances the electrical conductivity of water. Generally, the
amount of dissolved solids in water determines the electrical conductivity. Electrical
conductivity (EC) actually measures the ionic process of a solution that enables it to transmit
current. According to WHO standards, EC value should not exceeded 400 μS/cm.

Organic_carbon - Total Organic Carbon (TOC) in source waters comes from decaying natural
organic matter (NOM) as well as synthetic sources. TOC is a measure of the total amount of
carbon in organic compounds in pure water. According to US EPA < 2 mg/L as TOC in treated /
drinking water, and < 4 mg/Lit in source water which is use for treatment.

Trihalomethanes - THMs are chemicals which may be found in water treated with chlorine. The
concentration of THMs in drinking water varies according to the level of organic material in the
water, the amount of chlorine required to treat the water, and the temperature of the water that is
being treated. THM levels up to 80 ppm is considered safe in drinking water.

Turbidity - Turbidity is a measure of the water’s ability to absorb particulate matter. The lower
the turbidity, the more it can absorb particulate matter.

Potability (Target variable) - Indicates if water is safe for human consumption where 1 means
Potable and 0 means Not potable.

In [5]:
# datatypes of the columns
print([Link]())
ph float64
Hardness float64
Solids float64
Chloramines float64
Sulfate float64
Conductivity float64
Organic_carbon float64
Trihalomethanes float64
Turbidity float64
Potability int64
dtype: object

In [6]:
# Describe the data
print([Link]())
Out[6]:

Chlor Cond Organi Trihalo

Hard Sulfa Turb Pota
ph Solids amin uctivi c_carb methan
ness te idity bility
es ty on es

co 2785. 3276. 3276. 3276. 2495. 3276. 3276. 3276.

3276.00 3114.00
un 0000 0000 00000 00000 0000 00000 0000 0000
0000 0000
t 00 00 0 0 00 0 00 00

m 196.3 22014 333.7

7.080 7.122 426.2 14.2849 66.3962 3.966 0.390
ea 6949 .0925 7577
795 277 05111 70 93 786 110
n 6 26 7

8768.
st 1.594 32.87 1.583 41.41 80.82 3.30816 16.1750 0.780 0.487
57082
d 320 9761 085 6840 4064 2 08 382 849
8

129.0
mi 0.000 47.43 320.9 0.352 181.4 2.20000 0.73800 1.450 0.000
0000
n 000 2000 42611 000 83754 0 0 000 000
0

176.8 15666 307.6

25 6.093 6.127 365.7 12.0658 55.8445 3.439 0.000
5053 .6902 9949
% 092 421 34414 01 36 711 000
8 97 8

196.9 20927 333.0

50 7.036 7.130 421.8 14.2183 66.6224 3.955 0.000
6762 .8336 7354
% 752 299 84968 38 85 028 000
7 07 6

216.6 27332 359.9

75 8.062 8.114 481.7 16.5576 77.3374 4.500 1.000
6745 .7621 5017
% 066 887 92304 52 73 320 000
6 27 0
Chlor Cond Organi Trihalo
Hard Sulfa Turb Pota
ph Solids amin uctivi c_carb methan
ness te idity bility
es ty on es

323.1 61227 481.0

m 14.00 13.12 753.3 28.3000 124.000 6.739 1.000
2400 .1960 3064
ax 0000 7000 42620 00 000 000 000
0 08 2

In [7]:
# Check if there are any null columns
print([Link]().sum())
Out[7]:
ph 491
Hardness 0
Solids 0
Chloramines 0
Sulfate 781
Conductivity 0
Organic_carbon 0
Trihalomethanes 162
Turbidity 0
Potability 0
dtype: int64

In [8]:
# Lets try to plot misisng values
[Link](figsize=(10, 6))
missing_values.plot(kind='bar', color='skyblue')
[Link]('Number of Missing Values in Each Column')
[Link]('Columns')
[Link]('Number of Missing Values')
[Link](rotation=45)
[Link]()

Analyze ph column

In [9]:
# for ph column
# set the histogram, mean and median
#9 # Tính toán trung bình và trung vị của cột 'ph'
mean_ph = df['ph'].mean()
median_ph = df['ph'].median()

# Hiển thị kết quả

print(f"Trung bình của cột 'ph': {mean_ph}")
print(f"Trung vị của cột 'ph': {median_ph}")

# Vẽ biểu đồ cột cho cột 'ph'

[Link](figsize=(10, 6))
[Link](df['ph'], kde=False, bins=20, color='skyblue', edgecolor='black')
[Link](mean_ph, color='r', linestyle='dashed', linewidth=1, label=f'Mean: {mean_ph:.2f}')
[Link](median_ph, color='g', linestyle='dashed', linewidth=1, label=f'Median:
{median_ph:.2f}')
[Link]('Distribution of pH')
[Link]('pH')
[Link]('Frequency')
[Link]()
[Link]()
Based on the above data, we can impute ph with either mean or median. There is no skweness in
the data.
Analyze Sulfate column

In [10]:
#10# Mô tả thống kê cơ bản của cột 'Sulphate'
sulfate_values = df['Sulfate'].dropna() # Drop NaN values for plotting

mean_sulfate = sulfate_values.mean()
median_sulfate = sulfate_values.median()

[Link](figsize=(10, 6))
[Link](sulfate_values, bins=30, kde=False, color='skyblue', edgecolor='black')

# Add mean and median lines

[Link](mean_sulfate, color='red', linestyle='dashed', linewidth=2, label=f'Mean:
{mean_sulfate:.2f}')
[Link](median_sulfate, color='green', linestyle='dashed', linewidth=2, label=f'Median:
{median_sulfate:.2f}')

[Link]('Histogram of Sulfate values')

[Link]('Sulfate')
[Link]('Frequency')
[Link]()
[Link]()
Based on the above data, we can impute Sulphate with either mean or median.

Analyze Trihalomethanes column

In [11]:
#11# Mô tả thống kê cơ bản của cột 'Trihalomethanes'
print("Thống kê cơ bản của cột 'Trihalomethanes':")
print(df['Trihalomethanes'].describe())

# Kiểm tra giá trị thiếu trong cột 'Trihalomethanes'

print("\nSố lượng giá trị thiếu trong cột 'Trihalomethanes':")
print(df['Trihalomethanes'].isnull().sum())

# Tính toán trung bình, trung vị, độ lệch chuẩn

mean_thm = df['Trihalomethanes'].mean()
median_thm = df['Trihalomethanes'].median()
std_thm = df['Trihalomethanes'].std()

# Hiển thị kết quả

print(f"\nTrung bình của cột 'Trihalomethanes': {mean_thm}")
print(f"Trung vị của cột 'Trihalomethanes': {median_thm}")
print(f"Độ lệch chuẩn của cột 'Trihalomethanes': {std_thm}")
Based on the above data, we can impute Trihalomethanes with either mean or median
# Vẽ biểu đồ phân phối của cột 'Trihalomethanes'
[Link](figsize=(10, 6))
[Link](df['Trihalomethanes'], kde=True, bins=20, color='skyblue', edgecolor='black')
[Link](mean_thm, color='r', linestyle='dashed', linewidth=1, label=f'Mean: {mean_thm:.2f}')
[Link](median_thm, color='g', linestyle='dashed', linewidth=1, label=f'Median:
{median_thm:.2f}')
[Link]('Distribution of Trihalomethanes')
[Link]('Trihalomethanes')
[Link]('Frequency')
[Link]()
[Link]().
Missing Value imputation

Missing values in ph column

In [12]:
# impute missing values with mean
# 12# Kiểm tra giá trị thiếu trong cột 'ph'
missing_ph_values = df['ph'].isnull().sum()

# Hiển thị số lượng giá trị thiếu trong cột 'ph'

print(f"Số lượng giá trị thiếu trong cột 'ph': {missing_ph_values}")
Identify outliers in the data

In [13]:
# check outliers
# 13# Đọc dữ liệu vào filr, sau đó tính toán
# Tính toán các phân vị Q1 và Q3
Q1 = df['ph'].quantile(0.25)
Q3 = df['ph'].quantile(0.75)

# Tính toán IQR

IQR = Q3 - Q1

# Xác định các điểm ngoại lai

outliers = df[(df['ph'] < (Q1 - 1.5 * IQR)) | (df['ph'] > (Q3 + 1.5 * IQR))]

# Hiển thị số lượng và các điểm ngoại lai

print(f"Số lượng điểm ngoại lai trong cột 'ph': {len(outliers)}")
print("Các điểm ngoại lai trong cột 'ph':")
print(outliers)

Identify corrleation between variables

In [14]:
# # Tính toán ma trận hệ số tương quan
correlation_matrix = [Link]()

# Hiển thị ma trận hệ số tương quan

print("Ma trận hệ số tương quan:")
print(correlation_matrix)

# Vẽ ma trận hệ số tương quan dưới dạng heatmap

[Link](figsize=(12, 8))
[Link](correlation_matrix, annot=True, cmap='coolwarm', fmt='.2f', linewidths=0.5)
[Link]('Correlation Matrix')
[Link]()

There are no categorical variables in the dataset.

Identify skweness in the data
In [15]:
# identify skewness
# Enter your code here
# Showing the skewed columns
# Enter your code here
Number of skewed columns : 0

Out[15]:
#15# Tính toán độ méo lệch của các biến số liên tục
skewness = [Link]()

# Xác định các cột có độ méo lệch dương (lệch phải)

right_skewed_columns = skewness[skewness > 0].[Link]()

# Hiển thị số lượng và các cột bị lệch phải

print(f"Số lượng cột bị lệch phải: {len(right_skewed_columns)}")
print("Các cột bị lệch phải:")
print(right_skewed_columns)

Skew
There are no skew in our data :)
Lets see the distribution of Potability

In [31]:
#16(31)# Hiển thị phân phối của khả năng uống được
[Link](figsize=(10, 6))
[Link](x='Potability', data=df, palette='viridis')
[Link]('Phân phối của khả năng uống được')
[Link]('Khả năng uống được')
[Link]('Số lượng')
[Link]()

Out[31]:
0 1998
1 1278
Name: Potability, dtype: int64
Feature Transformation

In [16]:
# print the dataframe head
# #16
print([Link](5).to_string(index=False))

Out[16]:

Chlor Trihalo Tur Pota

Hard Sulfa Condu Organic
ph Solids amine methane bidit bilit
ness te ctivity _carbon
s s y y

7.08
204.8 20791. 7.3002 368.5 564.30 10.3797 86.99097 2.96
0 079 0
90455 318981 12 16441 8654 83 0 3135
5

3.71
129.4 18630. 6.6352 333.7 592.88 15.1800 56.32907 4.50
1 608 0
22921 057858 46 75777 5359 13 6 0656
0

8.09
224.2 19909. 9.2758 333.7 418.60 16.8686 66.42009 3.05
2 912 0
36259 541732 84 75777 6213 37 3 5934
4

8.31
214.3 22018. 8.0593 356.8 363.26 18.4365 100.3416 4.62
3 676 0
73394 417441 32 86136 6516 24 74 8771
6

9.09
181.1 17978. 6.5466 310.1 398.41 11.5582 31.99799 4.07
4 222 0
01509 986339 00 35738 0813 79 3 5075
3
In [17]:
# Feature transformation
# scale the numeric columns
# 17
import pandas as pd
from [Link] import StandardScaler

# Đường dẫn tới file CSV

df = pd.read_csv(r'G:/python excel/water_potability.csv')

# Chọn các cột số liên tục

continuous_columns = ['ph', 'Sulfate', 'Trihalomethanes']

# Khởi tạo bộ chuyển đổi chuẩn hóa

scaler = StandardScaler()

# Áp dụng chuẩn hóa

df[continuous_columns] = scaler.fit_transform(df[continuous_columns])

print([Link]())
In [18]:
# After transformation print the dataframe head
# #18
import pandas as pd
from [Link] import StandardScaler, MinMaxScaler

# Đọc dữ liệu từ file CSV

df = pd.read_csv('G:/python excel/water_potability.csv')

# Chọn các cột số liên tục

continuous_columns = ['ph', 'Sulfate', 'Trihalomethanes']

# Khởi tạo bộ chuyển đổi chuẩn hóa

scaler = StandardScaler()

# Áp dụng chuẩn hóa

df[continuous_columns] = scaler.fit_transform(df[continuous_columns])
# Hiển thị phần đầu của DataFrame sau khi chuẩn hóa
print("DataFrame sau khi chuẩn hóa:")
print([Link]())

Out[18]:

Hard Soli Chlora Sulf Condu Organic Trihalom Turb Pota

ph
ness ds mines ate ctivity _carbon ethanes idity bility

- -
0.00 0.198 0.0854 1.04 1.2271 -
0 0.01 1.028759 0.935 0
0000 981 92 3542 78 0.854560
1702 210

- - - -
0.00 1.4734 0.514
1 2.11 1.696 0.19 0.2490 0.214093 -0.502884 0
0000 06 449
3014 382 6962 88

- - -
0.63 0.684 1.0795 0.00
2 0.08 0.0282 0.590024 0.001189 0.847 0
9503 850 58 0000
7287 51 715

-
0.77 0.437 0.09 0.4674 0.69 0.635
3 0.5050 0.939076 1.695662 0
6180 145 3483 46 4190 242
79

- - - - -
1.26 - 0.113
4 0.398 0.25 0.2936 0.71 0.2022 -1.718287 0
3161 0.592197 188
477 2771 90 0100 62

Save the cleaned data

In [30]:
# #19(30)
# Chọn các cột số liên tục
continuous_columns = ['ph', 'Sulfate', 'Trihalomethanes']

Hypothesis Testing
We define a hypothesis to test in our data set

Hypothesis 1:

Null: Increase in pH is associated with increase in Solids

Alternate : No relataion between ph and Solids

In [29]:
# import pandas as pd
from scipy import stats
import seaborn as sns
import [Link] as plt

# Đọc dữ liệu từ file CSV

df = pd.read_csv('G:/python excel/water_potability.csv')

# Giả sử cột 'Solid' là tên cột tương ứng

# Tính toán hệ số tương quan giữa pH và Solid
stat, p_value = stats.ttest_ind(df['ph'], df['Solids'])

print("T-test result: statistic = {}, p-value = {}".format(stat, p_value))

if p_value < 0.05:

print("Giá trị p nhỏ hơn 0,05, vì vậy bác bỏ giả thuyết không ở mức ý nghĩa 5%.")
else:
print("Giá trị p không nhỏ hơn 0,05, vì vậy không thể bác bỏ giả thuyết khống ở mức ý nghĩa
5%.")

Out[29]:
Ttest_indResult(statistic=-4.476932191647608, pvalue=7.705940306619221e-06)
the p value is less than 0.05 , so we are rejecting the null hypothesis at 5% significance level.
Next Step in analyzing the data

Write here

Quality of data

Write here
Key findings
Write here

PRJ
No ratings yet
PRJ
17 pages
CE10001-C3-Water Quality and Pollution-I
No ratings yet
CE10001-C3-Water Quality and Pollution-I
54 pages
Water Quality Analysis and Prediction
No ratings yet
Water Quality Analysis and Prediction
26 pages
Water Quality and Pollution Guide
No ratings yet
Water Quality and Pollution Guide
30 pages
DATA INTRODUCTION - Phong
No ratings yet
DATA INTRODUCTION - Phong
5 pages
Data Introduction
No ratings yet
Data Introduction
10 pages
Water Quality & Engineering Basics
No ratings yet
Water Quality & Engineering Basics
14 pages
Water Quality Parameters and Standards
No ratings yet
Water Quality Parameters and Standards
32 pages
Water Quality Data Analysis
No ratings yet
Water Quality Data Analysis
4 pages
Water Pollution2023
No ratings yet
Water Pollution2023
18 pages
Understanding Water Source and Quality
No ratings yet
Understanding Water Source and Quality
26 pages
By Maryum Atique M. Phill Chemistry University of Agriculture FSD
No ratings yet
By Maryum Atique M. Phill Chemistry University of Agriculture FSD
19 pages
Acaee 90
No ratings yet
Acaee 90
4 pages
Module 1
No ratings yet
Module 1
55 pages
Water Quality Issues & Solutions Guide
No ratings yet
Water Quality Issues & Solutions Guide
93 pages
Module-3 Water Quality Overview
No ratings yet
Module-3 Water Quality Overview
11 pages
14-May - Jupyter Notebook
No ratings yet
14-May - Jupyter Notebook
15 pages
Key Water Quality Parameters Explained
No ratings yet
Key Water Quality Parameters Explained
3 pages
ES 203: Water and Wastewater Engineering: Prof. Anurag Garg Environmental Science & Engineering Department, IIT Bombay
No ratings yet
ES 203: Water and Wastewater Engineering: Prof. Anurag Garg Environmental Science & Engineering Department, IIT Bombay
13 pages
Applied Chemistry Manual
No ratings yet
Applied Chemistry Manual
33 pages
Risk Diagnosis Report of The Drinking Water Treatment and Distribution System
No ratings yet
Risk Diagnosis Report of The Drinking Water Treatment and Distribution System
7 pages
P-2, Chapter-2
No ratings yet
P-2, Chapter-2
26 pages
Chapter 1 Sem 20192020 Updated 3.9.2019
No ratings yet
Chapter 1 Sem 20192020 Updated 3.9.2019
134 pages
New DOC Document
No ratings yet
New DOC Document
11 pages
Chapter 1 Sem 20192020 Updated 24.9.2019
No ratings yet
Chapter 1 Sem 20192020 Updated 24.9.2019
134 pages
Water Quality Standards Guide
No ratings yet
Water Quality Standards Guide
48 pages
Presentation 8
No ratings yet
Presentation 8
25 pages
Research - Potable Drinking Water
No ratings yet
Research - Potable Drinking Water
11 pages
Oxygen, Total Dissolved Solids and Total Hardness, Electrical Conductance Have Been
No ratings yet
Oxygen, Total Dissolved Solids and Total Hardness, Electrical Conductance Have Been
25 pages
Oel 6 & 7 Environment - G4
No ratings yet
Oel 6 & 7 Environment - G4
61 pages
Environmental Engineering 1
No ratings yet
Environmental Engineering 1
2 pages
GS EC Module1 Fallsem2019
No ratings yet
GS EC Module1 Fallsem2019
55 pages
Chapter 2 Water Quality (New)
No ratings yet
Chapter 2 Water Quality (New)
170 pages
Name: Muhamad Aiman Bin Mohd Azman Matrix ID: 2019717423 Class: AS229 3A Date of Submission: 5 April 2020
No ratings yet
Name: Muhamad Aiman Bin Mohd Azman Matrix ID: 2019717423 Class: AS229 3A Date of Submission: 5 April 2020
12 pages
Water Quality Monitoring Approach - BMB
No ratings yet
Water Quality Monitoring Approach - BMB
6 pages
Environmentalengineering
No ratings yet
Environmentalengineering
27 pages
Chapter 1 Sem 2021
No ratings yet
Chapter 1 Sem 2021
136 pages
Matching S
No ratings yet
Matching S
7 pages
Water Quality Standards Guide
No ratings yet
Water Quality Standards Guide
2 pages
Analysis of Water
No ratings yet
Analysis of Water
16 pages
Chapter 2 WATER-QUALITY PDF
No ratings yet
Chapter 2 WATER-QUALITY PDF
136 pages
Wa0000.
No ratings yet
Wa0000.
7 pages
EE Lab Manual CIVIL 2018-19 - Modified
No ratings yet
EE Lab Manual CIVIL 2018-19 - Modified
29 pages
Water Quality Index For Measuring Drinking Water Quality in Rural Bangladesh: A Cross-Sectional Study
No ratings yet
Water Quality Index For Measuring Drinking Water Quality in Rural Bangladesh: A Cross-Sectional Study
12 pages
Key Water Quality Parameters Overview
50% (2)
Key Water Quality Parameters Overview
48 pages
Water Quality Data Analysis
No ratings yet
Water Quality Data Analysis
30 pages
The Physical-Chemical Composition of Natural Waters
No ratings yet
The Physical-Chemical Composition of Natural Waters
22 pages
Water Quality Monitoring in Rivers
100% (1)
Water Quality Monitoring in Rivers
35 pages
Water Quality Characteristics Explained
No ratings yet
Water Quality Characteristics Explained
73 pages
Water Quality Analysis - Physics
No ratings yet
Water Quality Analysis - Physics
10 pages
Module - 2 Water Quality and Parameters
No ratings yet
Module - 2 Water Quality and Parameters
35 pages
Groundwater Quality Tankadhara Behera
No ratings yet
Groundwater Quality Tankadhara Behera
48 pages
Water Quality: Water Quality Parameters: 1) Physical Characteristics
No ratings yet
Water Quality: Water Quality Parameters: 1) Physical Characteristics
12 pages
Water Analysis Report - V2
No ratings yet
Water Analysis Report - V2
18 pages
Chapter 1 Sem 2020
No ratings yet
Chapter 1 Sem 2020
136 pages
Analysis of Water
No ratings yet
Analysis of Water
13 pages
Drinking Water Quality Standards and Objectives
No ratings yet
Drinking Water Quality Standards and Objectives
6 pages
Dong Nai River Water Quality Study
No ratings yet
Dong Nai River Water Quality Study
17 pages
Organic Chemistry for Health Students
No ratings yet
Organic Chemistry for Health Students
321 pages
Occupation Diseases
No ratings yet
Occupation Diseases
26 pages
Fragrance Allergen Report for Graphite
No ratings yet
Fragrance Allergen Report for Graphite
1 page
Alkene Isomerisation - [email protected]
No ratings yet
Alkene Isomerisation - [email protected]
12 pages
Cyclohexane Chlorination Explained
No ratings yet
Cyclohexane Chlorination Explained
1 page
Toaz - Info Chapter 4 PR
No ratings yet
Toaz - Info Chapter 4 PR
6 pages
WAEC Chemistry Past Questions
No ratings yet
WAEC Chemistry Past Questions
17 pages
Post Harvest Management of Fruits & Vegetables
No ratings yet
Post Harvest Management of Fruits & Vegetables
124 pages
Ultimate Organic 51 Imp Named Reactions
No ratings yet
Ultimate Organic 51 Imp Named Reactions
5 pages
Accepted Manuscript: View Article Online
No ratings yet
Accepted Manuscript: View Article Online
21 pages
Water pH and Turbidity Testing Guide
No ratings yet
Water pH and Turbidity Testing Guide
68 pages
Das Et Al 2024 Strategic Synthesis of Sulfinamides As Versatile S (IV) Intermediates
No ratings yet
Das Et Al 2024 Strategic Synthesis of Sulfinamides As Versatile S (IV) Intermediates
12 pages
STPM Trials 2009 Chemistry Paper 2 Malacca PDF
No ratings yet
STPM Trials 2009 Chemistry Paper 2 Malacca PDF
19 pages
Properties of Matter in Chemistry 1
100% (1)
Properties of Matter in Chemistry 1
24 pages
ANSC321 Lab Exam Study Guide
No ratings yet
ANSC321 Lab Exam Study Guide
7 pages
Project On Basic Dye and Dyeing Contents
No ratings yet
Project On Basic Dye and Dyeing Contents
37 pages
Prokaryotic Transcription Steps
No ratings yet
Prokaryotic Transcription Steps
7 pages
Unit III Biomass
No ratings yet
Unit III Biomass
12 pages
Organic Gate Pyq
No ratings yet
Organic Gate Pyq
133 pages
Histochemical Staining of Carbohydrates: Fixation
No ratings yet
Histochemical Staining of Carbohydrates: Fixation
4 pages
J. Soc.: Oxygen Adducts of Schiff's Base Complexes Cobalt Prepared in Solution
No ratings yet
J. Soc.: Oxygen Adducts of Schiff's Base Complexes Cobalt Prepared in Solution
8 pages
Steric and Stereoelectronic Effects in Organic Chemistry 2nd 2nd Edition Veejendra K Yadav Download
100% (2)
Steric and Stereoelectronic Effects in Organic Chemistry 2nd 2nd Edition Veejendra K Yadav Download
89 pages
Exosap-It PCR Cleanup Reagents: One Step To Superior Sequencing Results
No ratings yet
Exosap-It PCR Cleanup Reagents: One Step To Superior Sequencing Results
5 pages
Farmers' Guide to Bioaab Fertilizer
No ratings yet
Farmers' Guide to Bioaab Fertilizer
10 pages
Water Research: Xiaolu Zhang, Daekyun Kim, David L. Freedman, Tanju Karanfil
No ratings yet
Water Research: Xiaolu Zhang, Daekyun Kim, David L. Freedman, Tanju Karanfil
10 pages
Serum Urea Estimation
No ratings yet
Serum Urea Estimation
15 pages
Overview of Vis Breaking in Polypropylene
100% (1)
Overview of Vis Breaking in Polypropylene
25 pages
10th Physics
No ratings yet
10th Physics
34 pages
Project Work - Biology (Class Ix)
100% (1)
Project Work - Biology (Class Ix)
9 pages
Stock 2021
No ratings yet
Stock 2021
1,304 pages

Lab 1 - Python - Excel

Uploaded by

Lab 1 - Python - Excel

Uploaded by

TRẦN THỊ LAN ANH

1) Describe the data

2) Visualize the data

3) Identify the missing values and fill them

4) Identify the outliers and remove them

5) Identify the categorical variables and encode them (if any)

6) Identify the numerical variables and perform basic statistical analysis

Chlor Trihalo Tur Pota

204.8 20791. 7.3002 368.5 564.30 10.3797 86.99097 2.96

More information about the data

ph - PH is an important parameter in evaluating the acid–base balance of water. It is also the

Chlor Cond Organi Trihalo

co 2785. 3276. 3276. 3276. 2495. 3276. 3276. 3276.

m 196.3 22014 333.7

176.8 15666 307.6

196.9 20927 333.0

216.6 27332 359.9

323.1 61227 481.0

# Hiển thị kết quả

# Vẽ biểu đồ cột cho cột 'ph'

# Add mean and median lines

[Link]('Histogram of Sulfate values')

Analyze Trihalomethanes column

# Kiểm tra giá trị thiếu trong cột 'Trihalomethanes'

# Tính toán trung bình, trung vị, độ lệch chuẩn

# Hiển thị kết quả

Missing values in ph column

# Hiển thị số lượng giá trị thiếu trong cột 'ph'

# Tính toán IQR

# Xác định các điểm ngoại lai

# Hiển thị số lượng và các điểm ngoại lai

Identify corrleation between variables

# Hiển thị ma trận hệ số tương quan

# Vẽ ma trận hệ số tương quan dưới dạng heatmap

There are no categorical variables in the dataset.

# Xác định các cột có độ méo lệch dương (lệch phải)

# Hiển thị số lượng và các cột bị lệch phải

Chlor Trihalo Tur Pota

# Đường dẫn tới file CSV

# Chọn các cột số liên tục

# Khởi tạo bộ chuyển đổi chuẩn hóa

# Áp dụng chuẩn hóa

# Đọc dữ liệu từ file CSV

# Chọn các cột số liên tục

# Khởi tạo bộ chuyển đổi chuẩn hóa

# Áp dụng chuẩn hóa

Hard Soli Chlora Sulf Condu Organic Trihalom Turb Pota

Save the cleaned data

Null: Increase in pH is associated with increase in Solids

Alternate : No relataion between ph and Solids

# Đọc dữ liệu từ file CSV

# Giả sử cột 'Solid' là tên cột tương ứng

print("T-test result: statistic = {}, p-value = {}".format(stat, p_value))

if p_value < 0.05:

You might also like