Visualisation of The Data - Jupyter Notebook

Uploaded by

Naineni Shiny
Copyright
© All Rights Reserved

12/20/23, 11:57 PM Visualisation of the Data - Jupyter Notebook

In [1]: import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

In [2]: df = pd.read_excel('2001_final.xlsx')

In [3]: df

Out[3]:
      ad_observation_id    depth  temperature  salinity      density  ao_wmo_number  latitu
0     2900168_29/10/2001     7.1       28.690    36.367  1023.197021        2900168    10.0
1     2900168_29/10/2001     9.4       28.696    36.367  1023.195007        2900168    10.0
2     2900168_29/10/2001    19.2       28.697    36.367  1023.195007        2900168    10.0
3     2900168_29/10/2001    28.9       28.702    36.367  1023.192993        2900168    10.0
4     2900168_29/10/2001    39.8       28.690    36.365  1023.195007        2900168    10.0
...                  ...     ...          ...       ...          ...            ...
2224  2900164_31/12/2001  1699.3        4.101    34.866  1027.668945        2900164     5.9
2225  2900164_31/12/2001  1799.4        3.621    34.840  1027.697998        2900164     5.9
2226  2900164_31/12/2001  1898.9        3.269    34.818  1027.714966        2900164     5.9
2227  2900164_31/12/2001  1999.6        2.875    34.798  1027.734985        2900164     5.9
2228  2900164_31/12/2001  2000.5        2.876    34.798  1027.734985        2900164     5.9

2229 rows × 9 columns


In [4]: df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2229 entries, 0 to 2228
Data columns (total 9 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 ad_observation_id 2229 non-null object
1 depth 2229 non-null float64
2 temperature 2229 non-null float64
3 salinity 2229 non-null float64
4 density 2229 non-null float64
5 ao_wmo_number 2229 non-null int64
6 latitude 2229 non-null float64
7 longitude 2229 non-null float64
8 date 2229 non-null datetime64[ns]
dtypes: datetime64[ns](1), float64(6), int64(1), object(1)
memory usage: 156.9+ KB

In [5]: df1=df
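
`df1=df` binds a second name to the same DataFrame rather than copying it. The notebook later rebinds both names, so this may be harmless here, but a minimal sketch of the difference (with a toy frame whose values are illustrative only) could look like this:

```python
import pandas as pd

# Toy frame (assumption): two salinity readings, not taken from 2001_final.xlsx.
frame = pd.DataFrame({"salinity": [36.367, 36.365]})

alias = frame               # second name for the same object, nothing is copied
independent = frame.copy()  # separate object holding its own data

# Writing through the alias is visible through the original name as well.
alias.loc[0, "salinity"] = 0.0
```

If the intent is to keep an untouched original around, `df.copy()` is the usual choice.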


In [6]: df1

Out[6]:
      ad_observation_id    depth  temperature  salinity      density  ao_wmo_number  latitu
0     2900168_29/10/2001     7.1       28.690    36.367  1023.197021        2900168    10.0
1     2900168_29/10/2001     9.4       28.696    36.367  1023.195007        2900168    10.0
2     2900168_29/10/2001    19.2       28.697    36.367  1023.195007        2900168    10.0
3     2900168_29/10/2001    28.9       28.702    36.367  1023.192993        2900168    10.0
4     2900168_29/10/2001    39.8       28.690    36.365  1023.195007        2900168    10.0
...                  ...     ...          ...       ...          ...            ...
2224  2900164_31/12/2001  1699.3        4.101    34.866  1027.668945        2900164     5.9
2225  2900164_31/12/2001  1799.4        3.621    34.840  1027.697998        2900164     5.9
2226  2900164_31/12/2001  1898.9        3.269    34.818  1027.714966        2900164     5.9
2227  2900164_31/12/2001  1999.6        2.875    34.798  1027.734985        2900164     5.9
2228  2900164_31/12/2001  2000.5        2.876    34.798  1027.734985        2900164     5.9

2229 rows × 9 columns


In [7]: df1=df1.drop_duplicates()
df1

Out[7]:
      ad_observation_id    depth  temperature  salinity      density  ao_wmo_number  latitu
0     2900168_29/10/2001     7.1       28.690    36.367  1023.197021        2900168    10.0
1     2900168_29/10/2001     9.4       28.696    36.367  1023.195007        2900168    10.0
2     2900168_29/10/2001    19.2       28.697    36.367  1023.195007        2900168    10.0
3     2900168_29/10/2001    28.9       28.702    36.367  1023.192993        2900168    10.0
4     2900168_29/10/2001    39.8       28.690    36.365  1023.195007        2900168    10.0
...                  ...     ...          ...       ...          ...            ...
2224  2900164_31/12/2001  1699.3        4.101    34.866  1027.668945        2900164     5.9
2225  2900164_31/12/2001  1799.4        3.621    34.840  1027.697998        2900164     5.9
2226  2900164_31/12/2001  1898.9        3.269    34.818  1027.714966        2900164     5.9
2227  2900164_31/12/2001  1999.6        2.875    34.798  1027.734985        2900164     5.9
2228  2900164_31/12/2001  2000.5        2.876    34.798  1027.734985        2900164     5.9

2229 rows × 9 columns
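
The row count is still 2229 after `drop_duplicates()`, which suggests the frame held no fully identical rows. A minimal sketch of how one might count duplicates before dropping them, using a small made-up frame (the toy values are assumptions, not rows from the dataset):

```python
import pandas as pd

# Toy frame (assumption): row 2 repeats row 1 exactly.
toy = pd.DataFrame({
    "depth": [7.1, 9.4, 9.4, 19.2],
    "temperature": [28.690, 28.696, 28.696, 28.697],
    "salinity": [36.367, 36.367, 36.367, 36.367],
})

# duplicated() flags every repeat of an earlier, fully identical row.
n_duplicates = int(toy.duplicated().sum())

# drop_duplicates() keeps the first occurrence and preserves the original index labels.
deduped = toy.drop_duplicates()
```

Because the original index labels survive, a gap in the index afterwards is a quick hint that rows were removed.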

In [8]: df=df[['date','temperature','salinity']]


In [9]: df

Out[9]: date temperature salinity

0 2001-10-29 [Link] 28.690 36.367

1 2001-10-29 [Link] 28.696 36.367

2 2001-10-29 [Link] 28.697 36.367

3 2001-10-29 [Link] 28.702 36.367

4 2001-10-29 [Link] 28.690 36.365

... ... ... ...

2224 2001-12-31 [Link] 4.101 34.866

2225 2001-12-31 [Link] 3.621 34.840

2226 2001-12-31 [Link] 3.269 34.818

2227 2001-12-31 [Link] 2.875 34.798

2228 2001-12-31 [Link] 2.876 34.798

2229 rows × 3 columns

In [10]: df.index = df.pop('date')
df

Out[10]: temperature salinity

date

2001-10-29 [Link] 28.690 36.367

2001-10-29 [Link] 28.696 36.367

2001-10-29 [Link] 28.697 36.367

2001-10-29 [Link] 28.702 36.367

2001-10-29 [Link] 28.690 36.365

... ... ...

2001-12-31 [Link] 4.101 34.866

2001-12-31 [Link] 3.621 34.840

2001-12-31 [Link] 3.269 34.818

2001-12-31 [Link] 2.875 34.798

2001-12-31 [Link] 2.876 34.798

2229 rows × 2 columns

In [11]: import seaborn as sns


In [12]: sns.pairplot(df1)

Out[12]: <seaborn.axisgrid.PairGrid at 0x2c6cf612280>

In [ ]: ​

In [13]: df1.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2229 entries, 0 to 2228
Data columns (total 9 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 ad_observation_id 2229 non-null object
1 depth 2229 non-null float64
2 temperature 2229 non-null float64
3 salinity 2229 non-null float64
4 density 2229 non-null float64
5 ao_wmo_number 2229 non-null int64
6 latitude 2229 non-null float64
7 longitude 2229 non-null float64
8 date 2229 non-null datetime64[ns]
dtypes: datetime64[ns](1), float64(6), int64(1), object(1)
memory usage: 238.7+ KB


In [14]: df1 = df1.drop(columns=['ad_observation_id'])

Variation of Temperature and Salinity Based on the DEPTH

In [15]: df1 = df1.sort_values(by='depth')

# Line graph
fig, ax1 = plt.subplots(figsize=(10, 6))

# Plotting temperature on the scatter plot
ax1.scatter(df1['temperature'], df1['depth'], marker='o', color='black', label
ax1.set_xlabel('Temperature')
ax1.set_ylabel('Depth', color='blue')
ax1.tick_params('y', colors='blue')

# Creating a secondary x-axis for Salinity (twiny shares the depth axis)
ax2 = ax1.twiny()
ax2.scatter(df1['salinity'], df1['depth'], marker='x', color='red', label='Sal
ax2.set_xlabel('Salinity', color='red')
ax2.tick_params('x', colors='red')

plt.title('Temperature, Depth, and Salinity Relationship (Line Graph)')
[Link]()
plt.show()
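
Several lines of the cell above are cut off at the page edge, so the sketch below is only an illustration of the same idea: temperature and salinity plotted against a shared depth axis, each with its own x-axis. The synthetic profile, the inverted depth axis, the legend placement, and the names `ax_temp` and `ax_sal` are all assumptions rather than the notebook's own code.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np

# Synthetic profile (assumption): warm, salty surface water over cold, fresher deep water.
depth = np.linspace(5, 2000, 50)
temperature = 3 + 26 * np.exp(-depth / 300)
salinity = 34.8 + 1.5 * np.exp(-depth / 400)

fig, ax_temp = plt.subplots(figsize=(10, 6))
ax_temp.plot(temperature, depth, marker="o", color="black", label="Temperature")
ax_temp.set_xlabel("Temperature")
ax_temp.set_ylabel("Depth")
ax_temp.invert_yaxis()  # assumption: draw the surface at the top of the plot

# twiny() adds a second x-axis that shares the depth (y) axis.
ax_sal = ax_temp.twiny()
ax_sal.plot(salinity, depth, marker="x", color="red", label="Salinity")
ax_sal.set_xlabel("Salinity", color="red")
ax_sal.tick_params("x", colors="red")

# Collect the lines of both axes into a single legend box.
handles = ax_temp.get_lines() + ax_sal.get_lines()
ax_temp.legend(handles, [h.get_label() for h in handles], loc="lower right")
fig.tight_layout()
```

`twiny()` is used here because both variables are drawn against depth on the y-axis; `twinx()` would instead share the x-axis and force temperature and salinity onto one scale.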



In [16]: df1

Out[16]:
       depth  temperature  salinity      density  ao_wmo_number  latitude  longitude  da
1214     5.2       27.520    36.029  1023.327026        2900080     5.023     63.625  2001-12-…
1740     5.3       27.904    36.040  1023.210999        2900080     4.910     63.375  2001-12-…
1844     5.4       28.274    36.081  1023.119995        2900080     4.754     63.104  2001-12-…
1110     5.5       27.749    36.145  1023.340027        2900080     5.139     63.844  2001-12-…
333      6.5       28.999    36.061  1022.864014        2900164     5.286     59.843  2001-11-…
...      ...          ...       ...          ...            ...       ...        ...
1038  2007.5        2.735    34.792  1027.743042        2900164     5.685     59.218  2001-12-…
1598  2007.6        2.997    34.800  1027.725952        2900167     6.857     62.363  2001-12-…
686   2008.6        2.829    34.796  1027.738037        2900164     5.556     59.597  2001-11-…
545   2009.9        2.903    34.802  1027.735962        2900168     9.313     60.849  2001-11-…
967   2010.8        2.967    34.799  1027.728027        2900167     7.089     62.598  2001-12-…

2229 rows × 8 columns


In [17]: # Assuming df1 is your DataFrame containing "salinity" and "temperature" co
plt.figure(figsize=(13, 9))

# Scatter plot with colors based on salinity and temperature
scatter = plt.scatter(df1["salinity"], df1["temperature"], s=65, c=df1["sal

plt.xlabel('Salinity', fontsize=25)
plt.ylabel('Temperature', fontsize=25)
plt.title('Salinity vs Temperature', fontsize=25)

# Adding colorbar to show the mapping of colors to salinity values
#cbar = plt.colorbar(scatter)
#cbar.set_label('Salinity', fontsize=20)

plt.show()
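
The colour bar is commented out in the cell above and the `c=` argument is cut off, so the colour mapping cannot be read from the source. As a rough sketch of what an enabled colour bar might look like, assuming the points are coloured by salinity, the `viridis` colour map, and synthetic data:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data (assumption): temperature rising roughly linearly with salinity.
rng = np.random.default_rng(0)
salinity = rng.uniform(34.8, 36.4, size=200)
temperature = 3 + 16 * (salinity - 34.8) + rng.normal(0, 1.0, size=200)

fig, ax = plt.subplots(figsize=(13, 9))
# c= supplies one value per point; cmap maps those values to colours.
points = ax.scatter(salinity, temperature, s=65, c=salinity, cmap="viridis")
ax.set_xlabel("Salinity", fontsize=25)
ax.set_ylabel("Temperature", fontsize=25)
ax.set_title("Salinity vs Temperature", fontsize=25)

# The colour bar is built from the object returned by scatter().
cbar = fig.colorbar(points, ax=ax)
cbar.set_label("Salinity", fontsize=20)
```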


In [18]: df1

Out[18]:
       depth  temperature  salinity      density  ao_wmo_number  latitude  longitude  da
1214     5.2       27.520    36.029  1023.327026        2900080     5.023     63.625  2001-12-…
1740     5.3       27.904    36.040  1023.210999        2900080     4.910     63.375  2001-12-…
1844     5.4       28.274    36.081  1023.119995        2900080     4.754     63.104  2001-12-…
1110     5.5       27.749    36.145  1023.340027        2900080     5.139     63.844  2001-12-…
333      6.5       28.999    36.061  1022.864014        2900164     5.286     59.843  2001-11-…
...      ...          ...       ...          ...            ...       ...        ...
1038  2007.5        2.735    34.792  1027.743042        2900164     5.685     59.218  2001-12-…
1598  2007.6        2.997    34.800  1027.725952        2900167     6.857     62.363  2001-12-…
686   2008.6        2.829    34.796  1027.738037        2900164     5.556     59.597  2001-11-…
545   2009.9        2.903    34.802  1027.735962        2900168     9.313     60.849  2001-11-…
967   2010.8        2.967    34.799  1027.728027        2900167     7.089     62.598  2001-12-…

2229 rows × 8 columns


In [19]: import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression

# Assuming df1 is your DataFrame
X = df1[['temperature', 'depth']]
Y = df1['salinity']

# Create an instance of LinearRegression
lin_reg5 = LinearRegression()

# Fit the model
lin_reg5.fit(X, Y)

# Make predictions using the fitted model
predictions = lin_reg5.predict(X)

# Plotting the scatter plot
sns.set(font_scale=1)
plt.figure(figsize=(15, 15))

# Scatter plot
plt.scatter(Y, predictions, s=65, label='Actual vs. Predicted Salinity')

# Diagonal line for perfect fit
plt.plot([Y.min(), Y.max()], [Y.min(), Y.max()], '--', color='red', label='

plt.xlabel('Actual Salinity', fontsize=25)
plt.ylabel('Predicted Salinity', fontsize=25)
plt.title('Actual vs. Predicted Salinity in Multi-linear Regression', fonts
plt.legend()
plt.show()
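
In an actual-versus-predicted plot, the vertical distance of a point from the dashed diagonal is its residual. A minimal sketch of the same two-feature regression on synthetic data (the data-generating formulas and the names `model`, `residuals`, and `score` are assumptions for illustration) shows how the residuals and an R² score could be computed alongside the plot:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Synthetic stand-in for the float profiles (assumption, not the real data).
rng = np.random.default_rng(0)
depth = rng.uniform(5, 2000, size=300)
temperature = 3 + 26 * np.exp(-depth / 300) + rng.normal(0, 0.3, size=300)
salinity = 34.8 + 0.05 * temperature + rng.normal(0, 0.05, size=300)

# Two predictors, one target, as in the cell above.
X = np.column_stack([temperature, depth])
model = LinearRegression().fit(X, salinity)
predictions = model.predict(X)

# A residual of zero would place the point exactly on the diagonal.
residuals = salinity - predictions
score = r2_score(salinity, predictions)
```

Since the model is scored on the rows it was fitted on, `score` describes goodness of fit rather than predictive performance on unseen data.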


In [20]: salt = df1.iloc[:, 2:3].values
salt

Out[20]: array([[36.029],
[36.04 ],
[36.081],
...,
[34.796],
[34.802],
[34.799]])

In [21]: temp = df1.iloc[:, 1:2].values
temp

Out[21]: array([[27.52 ],
[27.904],
[28.274],
...,
[ 2.829],
[ 2.903],
[ 2.967]])
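
Both arrays above are two-dimensional columns, which is the shape scikit-learn expects for a feature matrix. A small sketch of why a width-one slice such as `2:3` behaves differently from a plain integer position, using the first three rows shown in Out[16] (the name `toy` is hypothetical):

```python
import pandas as pd

# First three rows of Out[16], columns in the same order as df1.
toy = pd.DataFrame({
    "depth": [5.2, 5.3, 5.4],
    "temperature": [27.520, 27.904, 28.274],
    "salinity": [36.029, 36.040, 36.081],
})

column_2d = toy.iloc[:, 2:3].values  # a slice of width one keeps the column axis
column_1d = toy.iloc[:, 2].values    # an integer position drops it
```

The 2-D form suits `X`; for a single target `y`, the 1-D form is usually preferred.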


In [ ]: ​

In [22]: from sklearn.linear_model import LinearRegression

In [23]: lin_reg=LinearRegression()

In [24]: lin_reg=LinearRegression()
lin_reg.fit(temp,salt)

Out[24]: LinearRegression()

In [25]: sns.set(font_scale=1)
plt.figure(figsize=(15, 15))
plt.scatter(temp,salt,s=65)
plt.plot(temp,lin_reg.predict(temp), color='red', linewidth='2')
plt.xlabel('Temperature',fontsize=25)
plt.ylabel('Salinity',fontsize=25)
plt.title('salinity prediction using temperature',fontsize=25)
plt.show()


In [26]: import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression

# Assuming df1 is your DataFrame
X = df1[['temperature', 'depth']]
Y = df1['salinity']

# Create an instance of LinearRegression
lin_reg5 = LinearRegression()

# Fit the model
lin_reg5.fit(X, Y)

# Make predictions using the fitted model
predictions = lin_reg5.predict(X)

# Plotting the scatter plot
sns.set(font_scale=1)
plt.figure(figsize=(15, 15))

# Scatter plot
plt.scatter(Y, predictions, s=65, label='Actual vs. Predicted Salinity')

# Diagonal line for perfect fit
plt.plot([Y.min(), Y.max()], [Y.min(), Y.max()], '--', color='red', label='

plt.xlabel('Actual Salinity', fontsize=25)
plt.ylabel('Predicted Salinity', fontsize=25)
plt.title('Actual vs. Predicted Salinity in Multi-linear Regression', fonts
plt.legend()
plt.show()


In [27]: import operator

In [28]: plt.scatter(temp,salt, s=65)
sort_axis = operator.itemgetter(0)
sorted_zip = sorted(zip(temp,salt), key=sort_axis)
X_test, y_pred = zip(*sorted_zip)
# Plot the temperature-sorted pairs so the line runs left to right
plt.plot(X_test, y_pred, color='g')
plt.show()


In [29]: from sklearn.preprocessing import PolynomialFeatures

In [30]: from sklearn.preprocessing import PolynomialFeatures

In [31]: pol = PolynomialFeatures(degree = 3)
Slt_pol = pol.fit_transform(salt)
pol.fit(Slt_pol, temp)
lin_reg2 = LinearRegression()
lin_reg2.fit(Slt_pol, temp)

Out[31]: LinearRegression()

In [32]: Predict_Tmp_pol = lin_reg2.predict(pol.fit_transform([[33]]))


Predict_Tmp_pol

Out[32]: array([[65.63395657]])

In [33]: pol = PolynomialFeatures(degree = 3)
Slt_pol = pol.fit_transform(salt)
pol.fit(Slt_pol, temp)

lin_reg2 = LinearRegression()
lin_reg2.fit(Slt_pol, temp)

Out[33]: LinearRegression()

In [34]: Predict_Tmp_pol = lin_reg2.predict(pol.fit_transform([[33]]))


Predict_Tmp_pol

Out[34]: array([[65.63395657]])
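
The query value 33 lies below every salinity shown in the outputs above (roughly 34.8 to 36.4), so the predicted 65.6 is an extrapolation of the cubic rather than an estimate supported by nearby data. A hedged sketch of one way to keep the feature expansion and the regression together and to flag out-of-range queries, using synthetic data and scikit-learn's `make_pipeline` (the pipeline is a swapped-in convenience, not what the notebook does):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic column vectors (assumption): salinity in the observed range, noisy temperature.
rng = np.random.default_rng(0)
salt = rng.uniform(34.8, 36.4, size=(200, 1))
temp = 3 + 16 * (salt - 34.8) + rng.normal(0, 1.0, size=(200, 1))

# The pipeline fits the polynomial expansion once and reuses it at predict time.
poly_model = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
poly_model.fit(salt, temp)

inside = poly_model.predict([[35.5]])  # query within the training range

# Flag queries outside the range the model was fitted on.
query = 33.0
is_extrapolation = not (salt.min() <= query <= salt.max())
```

With a pipeline there is no need to call `fit_transform` again before each prediction, which also avoids refitting the transformer on query data.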

In [35]: from sklearn.metrics import r2_score

# Assuming you have Polynomial Regression results stored in Tmp_head_pol
Tmp_head_pol = lin_reg2.predict(Slt_pol)

# Initialize degerlendirme as an empty dictionary
degerlendirme = {}

# Calculate R-squared score for Polynomial Regression
polynomial_r2_score = r2_score(temp, Tmp_head_pol)

# Update degerlendirme with the new R-squared score
degerlendirme["Polynomial Regression R_Square Score"] = polynomial_r2_score

# Print or use degerlendirme
print("Polynomial Regression R_Square Score:", degerlendirme["Polynomial Re

Polynomial Regression R_Square Score: 0.7337768474290758
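
For reference, `r2_score` reports one minus the ratio of the residual sum of squares to the total sum of squares around the mean. A short sketch that recomputes it by hand on a few made-up numbers (the arrays are illustrative, not model output):

```python
import numpy as np
from sklearn.metrics import r2_score

# Made-up observations and predictions (assumption).
y_true = np.array([28.69, 28.70, 4.10, 3.62, 2.88])
y_pred = np.array([27.90, 28.10, 5.00, 4.20, 3.10])

ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # variation around the mean
manual_r2 = 1 - ss_res / ss_tot

library_r2 = r2_score(y_true, y_pred)
```

Read this way, a score of about 0.73 means the cubic fit leaves roughly 27% of the temperature variance unexplained on the data it was fitted to.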


In [36]: import numpy as np

In [37]: sns.set(font_scale=2.0)
plt.figure(figsize=(13, 9))
x_grid = np.arange(min(salt), max(salt), 0.1)
x_grid = x_grid.reshape(-1,1)
plt.scatter(salt,temp,s=65)
plt.plot(x_grid,lin_reg2.predict(pol.fit_transform(x_grid)) , color='red',
plt.xlabel('Slt',fontsize=25)
plt.ylabel('Temp',fontsize=25)
plt.title('Temperature predicted from salinity values',fontsize=25)
plt.show()

In [38]: x=df.drop(['salinity'],axis=1)
y=df[['salinity']]

In [ ]: ​

In [39]: x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2)

In [40]: from sklearn.tree import DecisionTreeRegressor

dt_reg = DecisionTreeRegressor() # create DecisionTreeReg with sk
dt_reg.fit(x_train,y_train)

Out[40]: DecisionTreeRegressor()


In [41]: dt_predict = dt_reg.predict(x_train)

In [ ]: ​

In [42]: # Create Decision Tree Regressor and fit the model
tree_reg = DecisionTreeRegressor()
tree_reg.fit(temp, salt)

# Set seaborn font scale
sns.set(font_scale=2.0)

# Create a new figure
plt.figure(figsize=(13, 9))

# Create a grid for smoother plot
x_grid = np.arange(min(temp), max(temp), 0.1).reshape(-1, 1)

# Scatter plot
plt.scatter(temp, salt, s=65)

# Plot Decision Tree Regression line
plt.plot(x_grid, tree_reg.predict(x_grid), color='red', linewidth=5)

# Set labels and title
plt.xlabel('Temperature', fontsize=25)
plt.ylabel('Salinity', fontsize=25)
plt.title('Salinity Prediction based on Temperature (Decision Tree Regressi

# Show the plot
plt.show()


In [43]: rmse = np.sqrt(mean_squared_error(y_train,dt_predict))
r2 = r2_score(y_train,dt_predict)
# dt_predict comes from x_train, so these are training-set scores despite the printed label
print("RMSE Score for Test set: " +"{:.2}".format(rmse))
print("R2 Score for Test set: " +"{:.2}".format(r2))

RMSE Score for Test set: 0.042
R2 Score for Test set: 0.99
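
The scores above are computed from predictions on `x_train`, so they describe how well the tree reproduces its own training rows. A minimal sketch of scoring on the held-out split instead, with synthetic data and fixed `random_state` values added for repeatability (both are assumptions, as are the names `tree`, `train_rmse`, and `test_rmse`):

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in (assumption): salinity as a noisy linear function of temperature.
rng = np.random.default_rng(0)
temperature = rng.uniform(2, 29, size=(500, 1))
salinity = 34.8 + 0.05 * temperature[:, 0] + rng.normal(0, 0.1, size=500)

x_train, x_test, y_train, y_test = train_test_split(
    temperature, salinity, test_size=0.2, random_state=0
)
tree = DecisionTreeRegressor(random_state=0).fit(x_train, y_train)

# Training error: an unpruned tree can fit its own training rows almost exactly.
train_rmse = np.sqrt(mean_squared_error(y_train, tree.predict(x_train)))

# Held-out error: computed on rows the tree never saw during fitting.
test_rmse = np.sqrt(mean_squared_error(y_test, tree.predict(x_test)))
test_r2 = r2_score(y_test, tree.predict(x_test))
```

A large gap between `train_rmse` and `test_rmse` is the usual sign of overfitting, which unpruned decision trees are prone to.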

In [44]: from sklearn.ensemble import RandomForestRegressor

rf_reg = RandomForestRegressor(n_estimators=5, random_state=0)
rf_reg.fit(x_train,y_train)
rf_predict = rf_reg.predict(x_train)
#rf_predict.mean()

C:\Users\shiny\AppData\Local\Temp\ipykernel_22996\[Link]: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example using ravel().
  rf_reg.fit(x_train,y_train)
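
The warning is raised because the target is a single-column 2-D object rather than a 1-D array. A small sketch of the fix the message itself suggests, flattening the target with `ravel()`, on synthetic data (the arrays and the name `forest` are assumptions); the warning filter turns any `DataConversionWarning` into an error so the block would fail if the warning still fired:

```python
import warnings

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.exceptions import DataConversionWarning

# Synthetic data (assumption): the target is deliberately a column vector of shape (100, 1).
rng = np.random.default_rng(0)
X = rng.uniform(2, 29, size=(100, 1))
y_column = 34.8 + 0.05 * X + rng.normal(0, 0.1, size=(100, 1))

with warnings.catch_warnings():
    # Escalate the warning to an exception inside this block.
    warnings.simplefilter("error", DataConversionWarning)
    forest = RandomForestRegressor(n_estimators=5, random_state=0)
    forest.fit(X, y_column.ravel())  # ravel() flattens (100, 1) to (100,)
```

When the target is a one-column DataFrame such as `y_train`, `y_train.values.ravel()` gives the same 1-D shape.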


In [45]: # Create Random Forest Regressor and fit the model
forest_reg = RandomForestRegressor(n_estimators=100, random_state=42)
forest_reg.fit(temp, salt)

# Set seaborn font scale
sns.set(font_scale=2.0)

# Create a new figure
plt.figure(figsize=(13, 9))

# Create a grid for smoother plot
x_grid = np.arange(min(temp), max(temp), 0.1).reshape(-1, 1)

# Scatter plot
plt.scatter(temp, salt, s=65)

# Plot Random Forest Regression line
plt.plot(x_grid, forest_reg.predict(x_grid), color='red', linewidth=5)

# Set labels and title
plt.xlabel('Temperature', fontsize=25)
plt.ylabel('Salinity', fontsize=25)
plt.title('Salinity Prediction based on Temperature (Random Forest Regressi

# Show the plot
plt.show()

C:\Users\shiny\AppData\Local\Temp\ipykernel_22996\[Link]: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example using ravel().
  forest_reg.fit(temp, salt)


In [46]: rmse = np.sqrt(mean_squared_error(y_train,rf_predict))
r2 = r2_score(y_train,rf_predict)
# rf_predict comes from x_train, so these are training-set scores despite the printed label
print("RMSE Score for Test set: " +"{:.2}".format(rmse))
print("R2 Score for Test set: " +"{:.2}".format(r2))

RMSE Score for Test set: 0.12
R2 Score for Test set: 0.92

In [ ]: ​

In [ ]: ​

In [ ]: ​
