
Data Analytic Assignment

Uploaded by

Ayush Shishirrr

shishir-data-analytic-assignment

November 14, 2023

Assignment-2 for Batch A

Submitted by: SHISHIR RANJAN
Roll no.: 2312res600
Email: [email protected] / [email protected]
Colab link: https://colab.research.google.com/drive/1KsKS7z0bBBVhd6S9LUJKftj7n3MWEBS4?usp=sharing

[18]: from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call
drive.mount("/content/drive", force_remount=True).

[19]: # Importing the numpy library
import numpy as np

# 1st--> Create an array of five elements and print its value.
arr = np.array([1, 2, 3, 4, 5])
print("Array is ", arr)
print("____________________________________")
#submitted by shishir ranjan

Array is [1 2 3 4 5]
____________________________________
QUESTION (1) COMPLETED ________________________________________________
QUESTION (2) .

[20]: # 2nd--> Use the above array elements to print the 1st and 2nd elements.
print("1st Element:", arr[0])
print("2nd Element:", arr[1])
print("____________________________________")
#submitted by shishir ranjan

1st Element: 1
2nd Element: 2
____________________________________

QUESTION (2) COMPLETED ________________________________________________
QUESTION (3).

[21]: # 3rd--> Get the third and fourth elements from the array and add them.
third_element = arr[2]
fourth_element = arr[3]
sum_third_fourth = third_element + fourth_element
print("Sum of 3rd and 4th elements:", sum_third_fourth)
print("____________________________________")
#submitted by shishir ranjan

Sum of 3rd and 4th elements: 7
____________________________________
QUESTION (3) COMPLETED ________________________________________________
QUESTION (4).

[22]: # 4th--> Print the elements from 1 to 5 from the above array.
print("Elements from 1 to 5:", arr[0:5])
print("____________________________________")
#submitted by shishir ranjan

Elements from 1 to 5: [1 2 3 4 5]
____________________________________
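
Note on slicing: NumPy slices are half-open, so arr[0:5] covers indices 0 through 4, which is all five elements. The lookups from Questions 2 to 4 can also be written as slices. This is only a minimal sketch; the names first_two, middle_sum and everything are illustrative and not part of the assignment.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# Indices 0 and 1: the 1st and 2nd elements (Question 2).
first_two = arr[:2]

# Indices 2 and 3: the 3rd and 4th elements, summed (Question 3).
middle_sum = arr[2:4].sum()

# An open-ended slice selects every element (Question 4).
everything = arr[:]

print(first_two, middle_sum, everything)
```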
QUESTION (4) COMPLETED ________________________________________________
QUESTION (5).

[23]: # 5th--> Get the data type of the array.


data_type = arr.dtype
print("Data Type of the array:", data_type)
print("____________________________________")

"""--------------------- First question finish ------------------"""


#submitted by shishir ranjan

Data Type of the array: int64
____________________________________

[23]: '--------------------- First question finish ------------------'
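
Note on the dtype: int64 is the default integer type NumPy picked on this runtime. On some platforms and older NumPy versions the default can be int32, so the printed result may differ elsewhere. A small sketch, assuming a fixed dtype is wanted regardless of platform (pinned and as_float are illustrative names):

```python
import numpy as np

# Request the dtype explicitly instead of relying on the platform default.
pinned = np.array([1, 2, 3, 4, 5], dtype=np.int64)
print(pinned.dtype)

# astype returns a converted copy; the original array is left unchanged.
as_float = pinned.astype(np.float64)
print(as_float.dtype)
```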

DATA ANALYTIC ASSIGNMENT 1ST QUESTION SUBMITTED BY SHISHIR RANJAN .


_____________________________________________________

1.
[24]: # Importing the pandas library
import pandas as pd

# Loading the dataset
df1 = pd.read_csv("/content/drive/MyDrive/AirQualityUCI.csv")

# 1--> Drop the null values from the dataset. Use (dropna()) method.
df1 = df1.dropna()
print("(1) Dataset after dropping null values:\n", df1)
print("____________________________________")
#submitted by shishir ranjan

(1) Dataset after dropping null values:

Date;Time;CO(GT);PT08.S1(CO);NMHC(GT);C6H6(GT);PT08.S2(NMHC);NOx(GT);PT08.S3(NOx);NO2(GT);PT08.S4(NO2);PT08.S5(O3);T;RH;AH;;
10/03/2004;18.00.00;2 6;1360;150;11 9;1046;166;1056;113;1692;1268;13 6;48 9;0 7578;;
10/03/2004;20.00.00;2 2;1402;88;9 0;939;131;1140;114;1555;1074;11 9;54 0;0 7502;;
10/03/2004;21.00.00;2 2;1376;80;9 2;948;172;1092;122;1584;1203;11 0;60 0;0 7867;;
10/03/2004;22.00.00;1 6;1272;51;6 5;836;131;1205;116;1490;1110;11 2;59 6;0 7888;;
10/03/2004;23.00.00;1 2;1197;38;4 7;750;89;1337;96;1393;949;11 2;59 2;0 7848;;
...
04/04/2005;10.00.00;3 1;1314;-200;13 5;1101;472;539;190;1374;1729;21 9;29 3;0 7568;;
04/04/2005;11.00.00;2 4;1163;-200;11 4;1027;353;604;179;1264;1269;24 3;23 7;0 7119;;
04/04/2005;12.00.00;2 4;1142;-200;12 4;1063;293;603;175;1241;1092;26 9;18 3;0 6406;;
04/04/2005;13.00.00;2 1;1003;-200;9 5;961;235;702;156;1041;770;28 3;13 5;0 5139;;
04/04/2005;14.00.00;2 2;1071;-200;11 9;1047;265;654;168;1129;816;28 5;13 1;0 5028;;

[6915 rows x 1 columns]


____________________________________
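
Note on the output above: "[6915 rows x 1 columns]" suggests the file was read with the default comma separator, while its fields are actually separated by semicolons and its decimals are written with commas. A minimal sketch of a read that splits the columns, assuming that layout; the in-memory sample holds three rows reconstructed from the printed output (a subset of columns, for brevity), and df is an illustrative name rather than the notebook's df1.

```python
import io
import pandas as pd

# Three rows reconstructed from the printed output (subset of columns).
# The trailing ";;" mimics the two empty columns at the end of each line.
sample = io.StringIO(
    "Date;Time;CO(GT);NOx(GT);T;;\n"
    "10/03/2004;18.00.00;2,6;166;13,6;;\n"
    "10/03/2004;20.00.00;2,2;131;11,9;;\n"
    "04/04/2005;10.00.00;3,1;472;21,9;;\n"
)

# sep=";" splits the fields; decimal="," reads "2,6" as the number 2.6.
df = pd.read_csv(sample, sep=";", decimal=",")

# First drop the all-empty columns created by the trailing ";;",
# then drop any rows that still contain nulls.
df = df.dropna(axis=1, how="all").dropna()

print(df.shape)
print(df.dtypes)
```

Dropping the empty columns first matters: with them in place, a plain dropna() would remove every row.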
2.
[25]: # 2--> Replace NULL values with the number 130.
df1 = df1.fillna(130)
print("\n(2) Dataset after replacing NULL values with 130:\n", df1)
print("____________________________________")
#submitted by shishir ranjan

(2) Dataset after replacing NULL values with 130:

Date;Time;CO(GT);PT08.S1(CO);NMHC(GT);C6H6(GT);PT08.S2(NMHC);NOx(GT);PT08.S3(NOx);NO2(GT);PT08.S4(NO2);PT08.S5(O3);T;RH;AH;;
10/03/2004;18.00.00;2 6;1360;150;11 9;1046;166;1056;113;1692;1268;13 6;48 9;0 7578;;
10/03/2004;20.00.00;2 2;1402;88;9 0;939;131;1140;114;1555;1074;11 9;54 0;0 7502;;
10/03/2004;21.00.00;2 2;1376;80;9 2;948;172;1092;122;1584;1203;11 0;60 0;0 7867;;
10/03/2004;22.00.00;1 6;1272;51;6 5;836;131;1205;116;1490;1110;11 2;59 6;0 7888;;
10/03/2004;23.00.00;1 2;1197;38;4 7;750;89;1337;96;1393;949;11 2;59 2;0 7848;;
...
04/04/2005;10.00.00;3 1;1314;-200;13 5;1101;472;539;190;1374;1729;21 9;29 3;0 7568;;
04/04/2005;11.00.00;2 4;1163;-200;11 4;1027;353;604;179;1264;1269;24 3;23 7;0 7119;;
04/04/2005;12.00.00;2 4;1142;-200;12 4;1063;293;603;175;1241;1092;26 9;18 3;0 6406;;
04/04/2005;13.00.00;2 1;1003;-200;9 5;961;235;702;156;1041;770;28 3;13 5;0 5139;;
04/04/2005;14.00.00;2 2;1071;-200;11 9;1047;265;654;168;1129;816;28 5;13 1;0 5028;;

[6915 rows x 1 columns]


____________________________________
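
Note on the output above: the row count is unchanged because dropna() in step 1 had already removed every null, so fillna(130) had nothing left to fill. The printed rows also show -200 in the NMHC(GT) field. A minimal sketch, assuming -200 is the dataset's missing-value marker and that the fill is applied to a freshly loaded frame; the sample rows are reconstructed from the printed output, and df and filled are illustrative names.

```python
import io
import numpy as np
import pandas as pd

# Three rows reconstructed from the printed output (subset of columns).
sample = io.StringIO(
    "Date;Time;CO(GT);NMHC(GT);NOx(GT)\n"
    "10/03/2004;18.00.00;2,6;150;166\n"
    "04/04/2005;10.00.00;3,1;-200;472\n"
    "04/04/2005;11.00.00;2,4;-200;353\n"
)
df = pd.read_csv(sample, sep=";", decimal=",")

# Assumption: -200 marks a missing reading, so convert it to NaN first.
df = df.replace(-200, np.nan)
print("Nulls before filling:", int(df["NMHC(GT)"].isna().sum()))

# Now fillna has something to act on.
filled = df.fillna(130)
print(filled["NMHC(GT)"].tolist())
```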
3.
[26]: # 3--> Filter the value of SO2 > 500. Use data frame (df.loc) methods.
# filtered_df1 = df1.loc[df1['SO2'] > 500]
print("\n(3) Filtered values where SO2 > 500:\n", "So2 column does not exist in the given dataset")

print("____________________________________")
#submitted by shishir ranjan

(3) Filtered values where SO2 > 500:


So2 column does not exist in the given dataset
____________________________________
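
Since the dataset has no SO2 column, the df.loc filter itself can only be illustrated on a column that does exist. A minimal sketch, using NOx(GT) purely as a stand-in and a threshold of 300 chosen so the small sample returns rows (both are assumptions, not part of the assignment); the sample values are taken from the printed output.

```python
import io
import pandas as pd

# NOx(GT) values taken from rows shown in the printed output.
sample = io.StringIO(
    "Date;Time;NOx(GT)\n"
    "10/03/2004;18.00.00;166\n"
    "10/03/2004;20.00.00;131\n"
    "04/04/2005;10.00.00;472\n"
    "04/04/2005;11.00.00;353\n"
)
df = pd.read_csv(sample, sep=";")

column = "NOx(GT)"  # stand-in column; the dataset has no SO2 column
threshold = 300     # illustrative value, not the 500 from the question

# Check that the column exists before filtering with a boolean mask.
if column in df.columns:
    filtered = df.loc[df[column] > threshold]
else:
    filtered = df.iloc[0:0]  # empty frame with the same columns

print(filtered)
```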
4.
[27]: # 4--> Use drop_duplicates() method to drop duplicate values from the dataset.
df1 = df1.drop_duplicates()

print("\n(4) Dataset after dropping duplicate values:\n", df1)
print("____________________________________")
#submitted by shishir ranjan

(4) Dataset after dropping duplicate values:

Date;Time;CO(GT);PT08.S1(CO);NMHC(GT);C6H6(GT);PT08.S2(NMHC);NOx(GT);PT08.S3(NOx);NO2(GT);PT08.S4(NO2);PT08.S5(O3);T;RH;AH;;
10/03/2004;18.00.00;2 6;1360;150;11 9;1046;166;1056;113;1692;1268;13 6;48 9;0 7578;;
10/03/2004;20.00.00;2 2;1402;88;9 0;939;131;1140;114;1555;1074;11 9;54 0;0 7502;;
10/03/2004;21.00.00;2 2;1376;80;9 2;948;172;1092;122;1584;1203;11 0;60 0;0 7867;;
10/03/2004;22.00.00;1 6;1272;51;6 5;836;131;1205;116;1490;1110;11 2;59 6;0 7888;;
10/03/2004;23.00.00;1 2;1197;38;4 7;750;89;1337;96;1393;949;11 2;59 2;0 7848;;
...
04/04/2005;02.00.00;0 5;912;-200;1 5;544;69;959;55;1002;573;12 1;56 3;0 7927;;
04/04/2005;05.00.00;0 5;888;-200;1 3;528;77;1077;53;987;578;10 4;59 9;0 7550;;
04/04/2005;06.00.00;1 1;1031;-200;4 4;730;182;760;93;1129;905;9 5;63 1;0 7531;;
04/04/2005;11.00.00;2 4;1163;-200;11 4;1027;353;604;179;1264;1269;24 3;23 7;0 7119;;
04/04/2005;14.00.00;2 2;1071;-200;11 9;1047;265;654;168;1129;816;28 5;13 1;0 5028;;

[4941 rows x 1 columns]


____________________________________
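
Note on the output above: drop_duplicates() compares column values only and ignores the index. One plausible reading of the drop from 6915 to 4941 rows is that, with the file read as a single column, rows were compared on that one column alone, so distinct measurements could have been removed. A minimal sketch contrasting the two reads on a hypothetical three-row sample (the second row is invented so that two different measurements share the same AH decimals; raw and parsed are illustrative names):

```python
import io
import pandas as pd

# Hypothetical sample in the file's layout. Rows 1 and 2 are different
# measurements that happen to share the same AH decimals.
text = (
    "Date;Time;CO(GT);AH;;\n"
    "10/03/2004;18.00.00;2,6;0,7578;;\n"
    "10/03/2004;20.00.00;2,2;0,7578;;\n"
    "10/03/2004;21.00.00;2,2;0,7867;;\n"
)

# Default comma separator: the commas inside the decimals split each row,
# and only the last fragment ends up in the single data column.
raw = pd.read_csv(io.StringIO(text))
print(len(raw), "->", len(raw.drop_duplicates()))

# Semicolon separator with comma decimals: rows are compared field by field.
parsed = pd.read_csv(io.StringIO(text), sep=";", decimal=",")
parsed = parsed.dropna(axis=1, how="all")
print(len(parsed), "->", len(parsed.drop_duplicates()))
```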
5.
[28]: # 5--> Use the correlation method to show the relationship between columns (df.corr).
correlation_matrix = df1.corr()
print("\n(5) Correlation matrix:\n", correlation_matrix)
print("____________________________________")
"""--------------------- Second question finish ------------------"""
#submitted by shishir ranjan

(5) Correlation matrix:


Empty DataFrame

Columns: []
Index: []
____________________________________
<ipython-input-28-4b6fa2a277c1>:2: FutureWarning: The default value of
numeric_only in DataFrame.corr is deprecated. In a future version, it will
default to False. Select only valid columns or specify the value of numeric_only
to silence this warning.
correlation_matrix = df1.corr()

[28]: '--------------------- Second question finish ------------------'
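
Note on the output above: the correlation matrix is empty because the frame holds a single text column, so corr() finds no numeric columns to compare. A minimal sketch, assuming the file is read with a semicolon separator and comma decimals, and passing numeric_only=True as the warning suggests; the sample rows are reconstructed from the printed output (subset of columns), and df and corr are illustrative names.

```python
import io
import pandas as pd

# Four rows reconstructed from the printed output (subset of columns).
sample = io.StringIO(
    "Date;Time;CO(GT);NOx(GT);T\n"
    "10/03/2004;18.00.00;2,6;166;13,6\n"
    "10/03/2004;20.00.00;2,2;131;11,9\n"
    "04/04/2005;10.00.00;3,1;472;21,9\n"
    "04/04/2005;11.00.00;2,4;353;24,3\n"
)
df = pd.read_csv(sample, sep=";", decimal=",")

# numeric_only=True skips the text columns (Date, Time) explicitly,
# which also silences the FutureWarning shown above.
corr = df.corr(numeric_only=True)
print(corr.round(2))
```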

DATA ANALYTIC ASSIGNMENT 2ND QUESTION COMPLETED .


__________________________________________________________________
DATA ANALYTIC ASSIGNMENT SUBMITTED BY SHISHIR RANJAN .
__________________________________________________________________
