shishir-data-analytic-assignment
November 14, 2023
Assignment-2 for Batch A ____________________________________________
Submitted By- SHISHIR RANJAN _____________________________________________
Roll no. - 2312res600 _____________________________________________________
Email - [email protected] / [email protected] _________________________________________________________________
Colab link - https://colab.research.google.com/drive/1KsKS7z0bBBVhd6S9LUJKftj7n3MWEBS4?usp=sharing
[18]: from google.colab import drive
drive.mount('/content/drive')
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
[19]: # Importing the numpy library
import numpy as np
# 1st--> Create an array of five elements and print its value.
arr = np.array([1, 2, 3, 4, 5])
print("Array is ", arr)
print("____________________________________")
#submitted by shishir ranjan
Array is [1 2 3 4 5]
____________________________________
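As an aside, the same five-element array can also be generated with `np.arange`, which produces evenly spaced integers. This is a minimal alternative sketch, not the method the assignment asks for; the name `arr_alt` is illustrative only.

```python
import numpy as np

# np.arange(start, stop) yields integers from start up to, but not including, stop
arr_alt = np.arange(1, 6)
print("Array is ", arr_alt)
print("Number of elements:", arr_alt.size)
```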
QUESTION (1) COMPLETED ________________________________________________
QUESTION (2).
[20]: # 2nd--> Use the above array elements to print the 1st and 2nd elements.
print("1st Element:", arr[0])
print("2nd Element:", arr[1])
print("____________________________________")
#submitted by shishir ranjan
1st Element: 1
2nd Element: 2
____________________________________
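NumPy arrays are zero-indexed, which is why the 1st element is `arr[0]` and the 2nd is `arr[1]`. A short sketch of the same lookups, plus a negative index (counting from the end) and a slice; the variable names are illustrative only.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

first, second = arr[0], arr[1]   # positions 0 and 1 hold the 1st and 2nd elements
last = arr[-1]                   # -1 counts back from the end of the array
first_two = arr[:2]              # a slice returns an array, not a single value

print(first, second, last, first_two)
```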
QUESTION (2) COMPLETED ________________________________________________
QUESTION (3).
[21]: # 3rd--> Get the third and fourth elements from the array and add them.
third_element = arr[2]
fourth_element = arr[3]
sum_third_fourth = third_element + fourth_element
print("Sum of 3rd and 4th elements:", sum_third_fourth)
print("____________________________________")
#submitted by shishir ranjan
Sum of 3rd and 4th elements: 7
____________________________________
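Indices 2 and 3 hold the 3rd and 4th elements, so the sum is 3 + 4 = 7. The same result can be obtained by summing a slice, which extends to any contiguous range; a minimal sketch:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# arr[2:4] selects indices 2 and 3 (the stop index 4 is excluded)
sum_slice = arr[2:4].sum()
print("Sum of 3rd and 4th elements:", sum_slice)
```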
QUESTION (3) COMPLETED ________________________________________________
QUESTION (4).
[22]: # 4th--> Print the elements from 1 to 5 from the above array.
print("Elements from 1 to 5:", arr[0:5])
print("____________________________________")
#submitted by shishir ranjan
Elements from 1 to 5: [1 2 3 4 5]
____________________________________
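Because the stop index of a slice is exclusive, `arr[0:5]` covers indices 0 through 4, i.e. the whole five-element array. A short sketch of equivalent and partial slices:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

full = arr[0:5]      # indices 0..4
same = arr[:5]       # an omitted start defaults to 0
middle = arr[1:4]    # indices 1..3, i.e. the 2nd to 4th elements

print(full, same, middle)
print(np.array_equal(full, arr))
```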
QUESTION (4) COMPLETED ________________________________________________
QUESTION (5).
[23]: # 5th--> Get the data type of the array.
data_type = arr.dtype
print("Data Type of the array:", data_type)
print("____________________________________")
"""--------------------- First question finish ------------------"""
#submitted by shishir ranjan
Data Type of the array: int64
____________________________________
[23]: '--------------------- First question finish ------------------'
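The `int64` shown above is NumPy's default integer type on 64-bit Linux and macOS, which is what Colab runs; on Windows with NumPy releases before 2.0 the default is `int32`. Passing `dtype` explicitly makes the element type independent of the platform. A minimal sketch (names are illustrative):

```python
import numpy as np

# Request the element type explicitly instead of relying on the platform default
arr64 = np.array([1, 2, 3, 4, 5], dtype=np.int64)
arr_f = arr64.astype(np.float64)   # astype returns a converted copy

print(arr64.dtype, arr_f.dtype)
```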
DATA ANALYTIC ASSIGNMENT 1ST QUESTION SUBMITTED BY SHISHIR RANJAN.
_____________________________________________________
1.
[24]: # Importing the pandas library
import pandas as pd
# Loading the dataset
df1 = pd.read_csv("/content/drive/MyDrive/AirQualityUCI.csv")
# 1--> Drop the null values from the dataset. Use the dropna() method.
df1 = df1.dropna()
print("(1) Dataset after dropping null values:\n", df1)
print("____________________________________")
#submitted by shishir ranjan
(1) Dataset after dropping null values:
Date;Time;CO(GT);PT08.S1(CO);NMHC(GT);C6H6(GT);PT08.S2(NMHC);NOx(GT);PT08.S3(NOx);NO2(GT);PT08.S4(NO2);PT08.S5(O3);T;RH;AH;;
10/03/2004;18.00.00;2 6;1360;150;11 9;1046;166;1056;113;1692;1268;13 6;48 9;0 7578;;
10/03/2004;20.00.00;2 2;1402;88;9 0;939;131;1140;114;1555;1074;11 9;54 0;0 7502;;
10/03/2004;21.00.00;2 2;1376;80;9 2;948;172;1092;122;1584;1203;11 0;60 0;0 7867;;
10/03/2004;22.00.00;1 6;1272;51;6 5;836;131;1205;116;1490;1110;11 2;59 6;0 7888;;
10/03/2004;23.00.00;1 2;1197;38;4 7;750;89;1337;96;1393;949;11 2;59 2;0 7848;;
… …
04/04/2005;10.00.00;3 1;1314;-200;13 5;1101;472;539;190;1374;1729;21 9;29 3;0 7568;;
04/04/2005;11.00.00;2 4;1163;-200;11 4;1027;353;604;179;1264;1269;24 3;23 7;0 7119;;
04/04/2005;12.00.00;2 4;1142;-200;12 4;1063;293;603;175;1241;1092;26 9;18 3;0 6406;;
04/04/2005;13.00.00;2 1;1003;-200;9 5;961;235;702;156;1041;770;28 3;13 5;0 5139;;
04/04/2005;14.00.00;2 2;1071;-200;11 9;1047;265;654;168;1129;816;28 5;13 1;0 5028;;
[6915 rows x 1 columns]
____________________________________
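The output above has only one column. `AirQualityUCI.csv` separates fields with `;` and uses `,` as the decimal mark, so the default comma separator splits each record at the decimal commas instead (the spaces in values such as `2 6` or `11 9` are where those commas were), and most of each record ends up in the index. In the UCI release of this dataset, missing measurements are also recorded as `-200` rather than left empty, so `dropna()` alone does not remove them. The trailing `;;` on every line adds two empty columns, which would make a plain `dropna()` discard every row once the file is parsed correctly. A minimal sketch, assuming an in-memory string with two rows copied from the output above stands in for the Drive path:

```python
import io

import pandas as pd

# Header and two rows from the output above, in their original ';'-separated,
# ','-decimal form (an in-memory stand-in for the Drive CSV file).
raw = (
    "Date;Time;CO(GT);PT08.S1(CO);NMHC(GT);C6H6(GT);PT08.S2(NMHC);NOx(GT);"
    "PT08.S3(NOx);NO2(GT);PT08.S4(NO2);PT08.S5(O3);T;RH;AH;;\n"
    "10/03/2004;18.00.00;2,6;1360;150;11,9;1046;166;1056;113;1692;1268;13,6;48,9;0,7578;;\n"
    "04/04/2005;10.00.00;3,1;1314;-200;13,5;1101;472;539;190;1374;1729;21,9;29,3;0,7568;;\n"
)

# sep/decimal match the file format; -200 is treated as a missing value
df = pd.read_csv(io.StringIO(raw), sep=";", decimal=",", na_values=[-200])
df = df.dropna(axis=1, how="all")   # drop the empty columns created by the trailing ';;'
df_clean = df.dropna()              # now removes only rows with a real missing value

print(df.shape, df_clean.shape)
```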
2.
[25]: # 2--> Replace NULL values with the number 130.
df1 = df1.fillna(130)
print("\n(2) Dataset after replacing NULL values with 130:\n", df1)
print("____________________________________")
#submitted by shishir ranjan
(2) Dataset after replacing NULL values with 130:
Date;Time;CO(GT);PT08.S1(CO);NMHC(GT);C6H6(GT);PT08.S2(NMHC);NOx(GT);PT08.S3(NOx);NO2(GT);PT08.S4(NO2);PT08.S5(O3);T;RH;AH;;
10/03/2004;18.00.00;2 6;1360;150;11 9;1046;166;1056;113;1692;1268;13 6;48 9;0 7578;;
10/03/2004;20.00.00;2 2;1402;88;9 0;939;131;1140;114;1555;1074;11 9;54 0;0 7502;;
10/03/2004;21.00.00;2 2;1376;80;9 2;948;172;1092;122;1584;1203;11 0;60 0;0 7867;;
10/03/2004;22.00.00;1 6;1272;51;6 5;836;131;1205;116;1490;1110;11 2;59 6;0 7888;;
10/03/2004;23.00.00;1 2;1197;38;4 7;750;89;1337;96;1393;949;11 2;59 2;0 7848;;
… …
04/04/2005;10.00.00;3 1;1314;-200;13 5;1101;472;539;190;1374;1729;21 9;29 3;0 7568;;
04/04/2005;11.00.00;2 4;1163;-200;11 4;1027;353;604;179;1264;1269;24 3;23 7;0 7119;;
04/04/2005;12.00.00;2 4;1142;-200;12 4;1063;293;603;175;1241;1092;26 9;18 3;0 6406;;
04/04/2005;13.00.00;2 1;1003;-200;9 5;961;235;702;156;1041;770;28 3;13 5;0 5139;;
04/04/2005;14.00.00;2 2;1071;-200;11 9;1047;265;654;168;1129;816;28 5;13 1;0 5028;;
[6915 rows x 1 columns]
____________________________________
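Because `dropna()` already ran on `df1`, `fillna(130)` has nothing left to replace, which is why this output matches the previous one row for row. The replacement only has an effect on data that still contains missing values. A minimal sketch on a small hypothetical frame (the values are illustrative, not read from the file):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with two missing entries
df = pd.DataFrame({"CO(GT)": [2.6, np.nan, 2.2], "NOx(GT)": [166.0, 131.0, np.nan]})

missing_before = int(df.isna().sum().sum())
filled = df.fillna(130)                         # every NaN becomes 130
missing_after = int(filled.isna().sum().sum())

print(missing_before, missing_after)
print(filled)
```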
3.
[26]: # 3--> Filter the value of SO2 > 500. Use data frame (df.loc) methods.
# filtered_df1 = df1.loc[df1['SO2'] > 500]
print("\n(3) Filtered values where SO2 > 500:\n", "SO2 column does not exist in the given dataset")
print("____________________________________")
#submitted by shishir ranjan
(3) Filtered values where SO2 > 500:
SO2 column does not exist in the given dataset
____________________________________
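Since the dataset has no `SO2` column, the `df.loc` filter from the task can be guarded by a column check so the cell still runs. A minimal sketch, assuming a hypothetical helper `filter_above` and a small hypothetical frame, with `NOx(GT)` used only as an example of a column that does exist:

```python
import pandas as pd

def filter_above(df: pd.DataFrame, column: str, threshold: float) -> pd.DataFrame:
    """Hypothetical helper: rows where column > threshold, or an empty frame if the column is absent."""
    if column not in df.columns:
        print(f"{column} column does not exist in the given dataset")
        return df.iloc[0:0]
    return df.loc[df[column] > threshold]   # boolean mask inside .loc

# Hypothetical sample values
df = pd.DataFrame({"NOx(GT)": [166, 472, 353, 89]})

so2_rows = filter_above(df, "SO2", 500)        # column missing -> empty result
nox_rows = filter_above(df, "NOx(GT)", 300)

print(nox_rows)
```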
4.
[27]: # 4--> Use drop_duplicates() method to drop duplicate values from the dataset.
df1 = df1.drop_duplicates()
print("\n(4) Dataset after dropping duplicate values:\n", df1)
print("____________________________________")
#submitted by shishir ranjan
(4) Dataset after dropping duplicate values:
Date;Time;CO(GT);PT08.S1(CO);NMHC(GT);C6H6(GT);PT08.S2(NMHC);NOx(GT);PT08.S3(NOx);NO2(GT);PT08.S4(NO2);PT08.S5(O3);T;RH;AH;;
10/03/2004;18.00.00;2 6;1360;150;11 9;1046;166;1056;113;1692;1268;13 6;48 9;0 7578;;
10/03/2004;20.00.00;2 2;1402;88;9 0;939;131;1140;114;1555;1074;11 9;54 0;0 7502;;
10/03/2004;21.00.00;2 2;1376;80;9 2;948;172;1092;122;1584;1203;11 0;60 0;0 7867;;
10/03/2004;22.00.00;1 6;1272;51;6 5;836;131;1205;116;1490;1110;11 2;59 6;0 7888;;
10/03/2004;23.00.00;1 2;1197;38;4 7;750;89;1337;96;1393;949;11 2;59 2;0 7848;;
… …
04/04/2005;02.00.00;0 5;912;-200;1 5;544;69;959;55;1002;573;12 1;56 3;0 7927;;
04/04/2005;05.00.00;0 5;888;-200;1 3;528;77;1077;53;987;578;10 4;59 9;0 7550;;
04/04/2005;06.00.00;1 1;1031;-200;4 4;730;182;760;93;1129;905;9 5;63 1;0 7531;;
04/04/2005;11.00.00;2 4;1163;-200;11 4;1027;353;604;179;1264;1269;24 3;23 7;0 7119;;
04/04/2005;14.00.00;2 2;1071;-200;11 9;1047;265;654;168;1129;816;28 5;13 1;0 5028;;
[4941 rows x 1 columns]
____________________________________
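`drop_duplicates()` compares column values only and ignores the index. In the misparsed frame the single column holds just the text after the last decimal comma (such as `7578;;`) while the rest of each record sits in the index, so a row was dropped whenever that one value had already appeared, which is why the count falls from 6915 to 4941 rows. A small sketch of that behaviour, plus the `subset` and `keep` parameters, on hypothetical frames:

```python
import pandas as pd

# Same value in column "v", but different index labels
df = pd.DataFrame({"v": ["7578;;", "7578;;", "7502;;"]}, index=["a", "b", "c"])
deduped = df.drop_duplicates()   # index is ignored, so row "b" is dropped
print(deduped)

# Hypothetical readings: subset= limits the comparison, keep="last" keeps the final copy
df2 = pd.DataFrame({
    "Date": ["10/03/2004", "10/03/2004"],
    "Time": ["18.00.00", "18.00.00"],
    "CO(GT)": [2.6, 2.7],
})
last_per_timestamp = df2.drop_duplicates(subset=["Date", "Time"], keep="last")
print(last_per_timestamp)
```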
5.
[28]: # 5--> Use the correlation method to show the relationship between columns (df.corr).
correlation_matrix = df1.corr()
print("\n(5) Correlation matrix:\n", correlation_matrix)
print("____________________________________")
"""--------------------- Second question finish ------------------"""
#submitted by shishir ranjan
(5) Correlation matrix:
Empty DataFrame
Columns: []
Index: []
____________________________________
<ipython-input-28-4b6fa2a277c1>:2: FutureWarning: The default value of
numeric_only in DataFrame.corr is deprecated. In a future version, it will
default to False. Select only valid columns or specify the value of numeric_only
to silence this warning.
correlation_matrix = df1.corr()
[28]: '--------------------- Second question finish ------------------'
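The correlation matrix is empty because the only column is text, and `corr()` works on numeric columns. The `FutureWarning` refers to the `numeric_only` default, which became `False` in pandas 2.0, where the same call on a text column raises an error instead. Once the file is parsed with `sep=';'` and `decimal=','`, the sensor columns are numeric and can be correlated. A minimal sketch, assuming a small frame built from the first five rows printed above:

```python
import pandas as pd

# CO(GT), PT08.S1(CO) and T values from the first five rows printed above
df = pd.DataFrame({
    "CO(GT)": [2.6, 2.2, 2.2, 1.6, 1.2],
    "PT08.S1(CO)": [1360, 1402, 1376, 1272, 1197],
    "T": [13.6, 11.9, 11.0, 11.2, 11.2],
    "Date": ["10/03/2004"] * 5,   # text column, excluded by numeric_only=True
})

corr = df.corr(numeric_only=True)   # Pearson correlation by default
print(corr.round(2))
```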
DATA ANALYTIC ASSIGNMENT 2ND QUESTION COMPLETED.
__________________________________________________________________
DATA ANALYTIC ASSIGNMENT SUBMITTED BY SHISHIR RANJAN.
__________________________________________________________________