EXP.NO.: 1
DATE: Installing the Data Analysis and Visualization Tools
AIM:
To install the data analysis and visualization tools: R / Python / Tableau Public /
Power BI.
PROGRAM 1:
# importing the pandas package
import pandas as pd
# creating rows
hafeez = ['Hafeez', 19]
aslan = ['Aslan', 21]
kareem = ['Kareem', 18]
# pass those lists to the DataFrame,
# passing columns as well
data_frame = pd.DataFrame([hafeez, aslan, kareem], columns=['Name', 'Age'])
# displaying the DataFrame
print(data_frame)
OUTPUT
If you run the above program, you will get the following results.
     Name  Age
0  Hafeez   19
1   Aslan   21
2  Kareem   18
PROGRAM 2:
# importing the pyplot module to create graphs
import matplotlib.pyplot as plot
# importing pandas to read the CSV file
import pandas as pd
# importing the data using pd.read_csv() method
data = pd.read_csv('CountryData.IND.csv')
# creating a histogram of Time period
data['Time period'].hist(bins=10)
OUTPUT
If you run the above program, you will get the following results.
<matplotlib.axes._subplots.AxesSubplot at 0x25e363ea8d0>
RESULT:
The installation of the data analysis and visualization tools (R / Python / Tableau
Public / Power BI) was successfully completed.
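One way to confirm the Python side of the installation is to check that the packages used by the two programs can be located. The sketch below is only an illustration: the helper name missing_packages and the REQUIRED list are assumptions, and it does not check R, Tableau Public or Power BI.

```python
import importlib.util

# Import names of the packages used by PROGRAM 1 and PROGRAM 2.
REQUIRED = ["pandas", "matplotlib"]

def missing_packages(names):
    """Return the names that the import system cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

Any package reported as not installed can then be added with the package manager in use (for example pip or conda).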
EXP.NO.: 2
DATE: Exploratory Data Analysis (EDA) On Datasets Like Email Data Set
AIM:
To perform Exploratory Data Analysis (EDA) on datasets like email data set. Export all
your emails as a dataset, import them inside a pandas data frame, visualize them and get
different insights from the data.
PROGRAM:
Create a CSV file with only the required attributes:
import csv
# mbox is the exported mailbox, loaded beforehand (for example with the standard mailbox module)
with open('mailbox.csv', 'w', newline='') as outputfile:
    writer = csv.writer(outputfile)
    writer.writerow(['subject', 'from', 'date', 'to', 'label', 'thread'])
    for message in mbox:
        writer.writerow([message['subject'],
                         message['from'],
                         message['date'],
                         message['to'],
                         message['X-Gmail-Labels'],
                         message['X-GM-THRID']])
Loading mailbox.csv into a pandas data frame and checking the column types gives the
following output:
subject     object
from        object
date        object
to          object
label       object
thread     float64
dtype: object
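The export code assumes that an mbox object already exists. A minimal sketch of the whole step follows, assuming the export is an mbox file read with Python's standard mailbox module. The file name sample.mbox, the helper export_mbox_to_csv and the sample message are hypothetical and exist only so that the sketch runs on its own.

```python
import csv
import mailbox
import os
import tempfile
from email.message import Message

def export_mbox_to_csv(mbox_path, csv_path):
    """Write one CSV row per message, keeping only the selected headers."""
    mbox = mailbox.mbox(mbox_path)
    with open(csv_path, "w", newline="", encoding="utf-8") as outputfile:
        writer = csv.writer(outputfile)
        writer.writerow(["subject", "from", "date", "to", "label", "thread"])
        for message in mbox:
            writer.writerow([
                message["subject"], message["from"], message["date"],
                message["to"], message["X-Gmail-Labels"], message["X-GM-THRID"],
            ])
    mbox.close()

# Build a tiny throwaway mailbox so the sketch runs without a real export.
workdir = tempfile.mkdtemp()
mbox_path = os.path.join(workdir, "sample.mbox")  # hypothetical file name
csv_path = os.path.join(workdir, "mailbox.csv")

sample = mailbox.mbox(mbox_path)
msg = Message()
msg["Subject"] = "Hello"
msg["From"] = "alice@example.com"
msg["To"] = "bob@example.com"
msg["Date"] = "Mon, 05 Aug 2019 10:15:00 +0000"
msg["X-Gmail-Labels"] = "Inbox"
msg["X-GM-THRID"] = "1234567890"
msg.set_payload("Sample body")
sample.add(msg)
sample.flush()
sample.close()

export_mbox_to_csv(mbox_path, csv_path)

# Read the file back to see the header row and the exported message.
with open(csv_path, newline="", encoding="utf-8") as f:
    rows = list(csv.reader(f))
print(rows)
```

With a real export, only the call to export_mbox_to_csv is needed, pointed at the downloaded mbox file.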
import datetime
import numpy as np
from matplotlib.ticker import MaxNLocator
from scipy import ndimage
from scipy.interpolate import interp1d

def plot_number_perdhour_per_year(df, ax, label=None, dt=1,
                                  smooth=False,
                                  weight_fun=None, **plot_kwargs):
    # time of day (in hours) and year of every message that has a value
    tod = df[df['timeofday'].notna()]['timeofday'].values
    year = df[df['year'].notna()]['year'].values
    Ty = year.max() - year.min()
    T = tod.max() - tod.min()
    bins = int(T / dt)
    # default weights turn the counts into an average per bin per day
    if weight_fun is None:
        weights = 1 / (np.ones_like(tod) * Ty * 365.25 / dt)
    else:
        weights = weight_fun(df)
    if smooth:
        # smooth the histogram and draw it as a curve
        hst, xedges = np.histogram(tod, bins=bins, weights=weights)
        x = np.delete(xedges, -1) + 0.5 * (xedges[1] - xedges[0])
        hst = ndimage.gaussian_filter(hst, sigma=0.75)
        f = interp1d(x, hst, kind='cubic')
        x = np.linspace(x.min(), x.max(), 10000)
        hst = f(x)
        ax.plot(x, hst, label=label, **plot_kwargs)
    else:
        ax.hist(tod, bins=bins, weights=weights, label=label,
                **plot_kwargs)
    ax.grid(ls=':', color='k')
    # label the time axis with clock times (for example 06 AM)
    orientation = plot_kwargs.get('orientation')
    if orientation is None or orientation == 'vertical':
        ax.set_xlim(0, 24)
        ax.xaxis.set_major_locator(MaxNLocator(8))
        ax.set_xticklabels([datetime.datetime.strptime(str(int(np.mod(ts, 24))),
                                                       "%H").strftime("%I %p")
                            for ts in ax.get_xticks()])
    elif orientation == 'horizontal':
        ax.set_ylim(0, 24)
        ax.yaxis.set_major_locator(MaxNLocator(8))
        ax.set_yticklabels([datetime.datetime.strptime(str(int(np.mod(ts, 24))),
                                                       "%H").strftime("%I %p")
                            for ts in ax.get_yticks()])
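The function plot_number_perdhour_per_year reads two columns, timeofday and year, whose construction is not shown in this record. One plausible way to derive them from the date column is sketched below, assuming timeofday is the hour of the day as a fraction. The sample dates are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the 'date' column of the exported mailbox data.
df = pd.DataFrame({"date": ["2019-08-05 09:30:00", "2020-01-12 18:45:00", None]})

# Parse the strings; values that cannot be parsed become NaT instead of raising.
df["date"] = pd.to_datetime(df["date"], errors="coerce")

# Hour of day as a fraction (09:30 becomes 9.5) and the calendar year.
df["timeofday"] = df["date"].dt.hour + df["date"].dt.minute / 60
df["year"] = df["date"].dt.year

print(df[["timeofday", "year"]])
```

Rows without a usable date end up with missing values in both columns, which is why the plotting function filters with notna() before building the histogram.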
OUTPUT
RESULT:
Thus the above program was executed successfully.
EXP.NO.: 3
DATE: Working with Numpy arrays, Pandas data frames, Basic plots using Matplotlib.
AIM:
To work with NumPy arrays, Pandas data frames, and basic plots using Matplotlib.
PROGRAM 1:
import numpy as np
from matplotlib import pyplot as plt
x = np.arange(1, 11)
y = 2 * x + 5
plt.title("Matplotlib demo")
plt.xlabel("x axis caption")
plt.ylabel("y axis caption")
plt.plot(x,y)
plt.show()
OUTPUT
The above code should produce the following output:
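As a numerical check of what the line plot draws, the two arrays can be printed directly. This short sketch reuses the same expressions as PROGRAM 1 and assumes nothing beyond NumPy being installed.

```python
import numpy as np

x = np.arange(1, 11)  # integers 1 through 10 (the stop value 11 is excluded)
y = 2 * x + 5         # applied element by element, with no explicit loop

print(x)
print(y)
```

The y values run from 7 to 25 in steps of 2, so the plot is a straight line with slope 2.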
PROGRAM 2:
import pandas as pd
import matplotlib.pyplot as plt
# creating a DataFrame with 2 columns
dataFrame = pd.DataFrame(
{
"Car": ['BMW', 'Lexus', 'Audi', 'Mustang', 'Bentley', 'Jaguar'],
"Reg_Price": [2000, 2500, 2800, 3000, 3200, 3500],
"Units": [100, 120, 150, 170, 180, 200]
}
)
# plot a line graph
plt.plot(dataFrame["Reg_Price"], dataFrame["Units"])
plt.show()
OUTPUT
This will produce the following output:
RESULT:
Thus the above program was executed successfully.
EXP.NO.: 4
DATE: Explore Various Variable And Row Filters In R For Cleaning Data.
Apply Various Plot Features In R On Sample Data Sets And Visualize.
AIM:
To explore various variable and row filters in R for cleaning data. Apply various
plot features in R on sample data sets and visualize.
PROCEDURE:
install.packages("data.table") # Install data.table package
library("data.table") # Load data.table
We also create some example data.
dt_all <- data.table(x = rep(month.name[1:3], each = 3),
                     y = rep(c(1, 2, 3), times = 3),
                     z = rep(c(TRUE, FALSE, TRUE), each = 3)) # Create data.table
head(dt_all)
Table 1
x y z
1 January 1 TRUE
2 January 2 TRUE
3 January 3 TRUE
4 February 1 FALSE
5 February 2 FALSE
6 February 3 FALSE
Filter Rows by Column Values
In this example, I’ll demonstrate how to select all those rows of the example data for
which column x is equal to February. With the use of %in%, we can choose a set of values
of x. In this example, the set only contains one value.
dt_all[x %in% month.name[c(2)], ] # Rows where x is February
Table 2
x y z
1 February 1 FALSE
2 February 2 FALSE
3 February 3 FALSE
Filter Rows by Multiple Column Values
In the previous example, we addressed those rows of the example data for which one
column was equal to some value. In this example, we condition on the values of multiple
columns.
dt_all[x %in% month.name[c(2)] & y == 1, ] # Rows, where x is February and y is 1
Table 3
x y z
1 February 1 FALSE
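For comparison with the Python experiments in this record, the same two row filters can be sketched in pandas. This is an analogue offered as an assumption about how one might translate the R code, not part of the R procedure: isin plays the role of %in%, and the data frame is rebuilt by hand to match dt_all.

```python
import pandas as pd

months = ["January", "February", "March"]
dt_all = pd.DataFrame({
    "x": [m for m in months for _ in range(3)],               # like rep(..., each = 3)
    "y": [1, 2, 3] * 3,                                       # like rep(..., times = 3)
    "z": [b for b in (True, False, True) for _ in range(3)],  # like rep(..., each = 3)
})

# Rows where x is in a set of values (here the set holds only February).
february = dt_all[dt_all["x"].isin(["February"])]

# Rows where x is February and y is 1; each condition needs its own parentheses.
february_first = dt_all[dt_all["x"].isin(["February"]) & (dt_all["y"] == 1)]

print(february)
print(february_first)
```

Unlike data.table, pandas keeps the original row labels after filtering, so the February rows are labelled 3 to 5 rather than 1 to 3.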
RESULT:
Thus the above program was executed successfully.
EXP.NO.: 5
DATE: Performing Time Series Analysis And Apply The Various Visualization
Techniques.
AIM:
To perform time series analysis and apply various visualization techniques.
PROGRAM:
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
plt.rcParams.update({'figure.figsize': (10, 7), 'figure.dpi': 120})
# Import as Dataframe
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv',
                 parse_dates=['date'])
df.head()
        date     value
0 1991-07-01  3.526591
1 1991-08-01  3.180891
2 1991-09-01  3.252221
3 1991-10-01  3.611003
4 1991-11-01  3.565869
# Time series data source: fpp package in R.
import matplotlib.pyplot as plt
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv',
                 parse_dates=['date'], index_col='date')
# Draw Plot
def plot_df(df, x, y, title="", xlabel='Date', ylabel='Value', dpi=100):
    plt.figure(figsize=(16, 5), dpi=dpi)
    plt.plot(x, y, color='tab:red')
    plt.gca().set(title=title, xlabel=xlabel, ylabel=ylabel)
    plt.show()
plot_df(df, x=df.index, y=df.value,
        title='Monthly anti-diabetic drug sales in Australia from 1992 to 2008.')
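A common next step after plotting the raw series is to smooth it so that the trend stands out from the seasonal pattern. The sketch below shows a 12-month rolling mean. It uses a synthetic monthly series rather than the a10 data so that it runs without a network connection; the trend and cycle values are assumptions chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series (NOT the a10 data): a linear trend plus a yearly cycle.
index = pd.date_range("2000-01-01", periods=36, freq="MS")
months = np.arange(36)
values = 10 + 0.5 * months + 2 * np.sin(2 * np.pi * months / 12)
series = pd.Series(values, index=index, name="value")

# A 12-month rolling mean averages out the yearly cycle and leaves the trend.
trend = series.rolling(window=12).mean()

print(trend.dropna().head())
```

The first eleven entries of trend are missing because a full 12-month window is not yet available. The smoothed series can be drawn with the same plot_df function by passing trend.index and trend as x and y.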
OUTPUT
RESULT:
Thus the above program was executed successfully.
EXP.NO.: 6
DATE: Performing Data Analysis and representation on a Map using
various Map data sets with Mouse Rollover effect, user interaction.
AIM:
To perform Data Analysis and representation on a Map using various Map data sets with
Mouse Rollover effect, user interaction.
PROGRAM:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
# lon, lat, population and area are assumed to be arrays loaded beforehand from the city data set
# 1. Draw the map background
fig = plt.figure(figsize=(8, 8))
m = Basemap(projection='lcc', resolution='h',
            lat_0=37.5, lon_0=-119,
            width=1E6, height=1.2E6)
m.shadedrelief()
m.drawcoastlines(color='gray')
m.drawcountries(color='gray')
m.drawstates(color='gray')
# 2. scatter city data, with color reflecting population
# and size reflecting area
m.scatter(lon, lat, latlon=True,
          c=np.log10(population), s=area,
          cmap='Reds', alpha=0.5)
# 3. create colorbar and legend
plt.colorbar(label=r'$\log_{10}({\rm population})$')
plt.clim(3, 7)
# make legend with dummy points
for a in [100, 300, 500]:
    plt.scatter([], [], c='k', alpha=0.5, s=a,
                label=str(a) + ' km$^2$')
plt.legend(scatterpoints=1, frameon=False,
           labelspacing=1, loc='lower left')
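The colour of each point is the base-10 logarithm of the city population, and plt.clim(3, 7) fixes the ends of the colour scale. The short sketch below shows what those limits mean in terms of people. The population values are hypothetical and chosen only to illustrate the scale.

```python
import numpy as np

# Hypothetical populations, chosen only to illustrate the colour scale.
population = np.array([1_000, 50_000, 1_000_000, 10_000_000])

# Same transformation as the c= argument of m.scatter.
color_values = np.log10(population)
print(color_values)
```

A value of 3 corresponds to 1,000 people and a value of 7 to 10,000,000, so the colour limits span cities from one thousand to ten million inhabitants.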
OUTPUT
RESULT:
Thus the above program was executed successfully.
EXP.NO.: 7
DATE: Building Cartographic Visualization For Multiple Datasets Involving
Various Countries Of The World
AIM:
To build cartographic visualization for multiple datasets involving various countries of
the world.
PROGRAM:
import altair as alt
from vega_datasets import data
# assumed setup: the ZIP code data set from the vega_datasets package
zipcodes = data.zipcodes.url
alt.Chart(zipcodes).transform_filter(
    '-150 < datum.longitude && 22 < datum.latitude && datum.latitude < 55'
).transform_calculate(
    digit='datum.zip_code[0]'
).mark_line(
    strokeWidth=0.5
).encode(
    longitude='longitude:Q',
    latitude='latitude:Q',
    color='digit:N',
    order='zip_code:O'
).project(
    type='albersUsa'
).properties(
    width=900,
    height=500
).configure_view(
    stroke=None
)
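The two transforms in this chart are written as expression strings that are evaluated for every row. What they do can be sketched in plain Python. The three records below are illustrative, with approximate coordinates, and are not taken from the actual data set; the helper name keep is hypothetical.

```python
# Illustrative records with the same fields the chart uses (approximate values).
records = [
    {"zip_code": "02139", "latitude": 42.36, "longitude": -71.10},
    {"zip_code": "99501", "latitude": 61.22, "longitude": -149.86},
    {"zip_code": "96813", "latitude": 21.31, "longitude": -157.86},
]

def keep(row):
    """Same condition as the transform_filter expression."""
    return -150 < row["longitude"] and 22 < row["latitude"] < 55

# Same idea as transform_calculate: add a field holding the first ZIP code character.
selected = [dict(row, digit=row["zip_code"][0]) for row in records if keep(row)]
print(selected)
```

Only the first record passes the filter: the second lies north of latitude 55 and the third lies west of longitude -150 and south of latitude 22. The digit field is what the chart then uses to colour the lines.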
OUTPUT
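from vega_datasets import data
# assumed setup: US state boundaries and airport data from the vega_datasets package
usa = data.us_10m.url
airports = data.airports.url
alt.layer(
    alt.Chart(alt.topo_feature(usa, 'states')).mark_geoshape(
        fill='#ddd', stroke='#fff', strokeWidth=1
    ),
    alt.Chart(airports).mark_circle(size=9).encode(
        latitude='latitude:Q',
        longitude='longitude:Q',
        tooltip='iata:N'
    )
).project(
    type='albersUsa'
).properties(
    width=900,
    height=500
).configure_view(
    stroke=None
)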
OUTPUT
RESULT:
Thus the above program was executed successfully.
EXP.NO.: 8
DATE: Performing EDA on Wine Quality Data Set
AIM:
To perform EDA on Wine Quality Data Set.
PROGRAM:
#importing libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
# df is assumed to hold the wine quality data, loaded into a pandas data frame beforehand
In [4]: # features in data
        df.columns
Out[4]: Index(['fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar',
               'chlorides', 'free sulfur dioxide', 'total sulfur dioxide', 'density',
               'pH', 'sulphates', 'alcohol', 'quality'],
              dtype='object')
In [5]: # few datapoints
        df.head()
In [13]: sns.catplot(x='quality', data=df, kind='count')
Out[13]: <seaborn.axisgrid.FacetGrid at 0x22b7de0dba8>
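The count plot shows how many wines fall into each quality score. The same numbers can be read as a table with value_counts, sketched here on a small hypothetical column, since the loading step for the real data set is not shown in this record.

```python
import pandas as pd

# Hypothetical quality scores standing in for df['quality'].
df = pd.DataFrame({"quality": [5, 6, 5, 7, 6, 5]})

# Number of rows per quality score, ordered by score like the x axis of the plot.
counts = df["quality"].value_counts().sort_index()
print(counts)
```

Applied to the real data frame, the same line gives the exact bar heights, which is useful when two bars look similar in the figure.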
OUTPUT
RESULT:
Thus the above program was executed successfully.
EXP.NO.: 9
DATE: Using A Case Study On A Data Set And Apply The Various EDA And
Visualization Techniques And Present An Analysis Report
AIM:
To use a case study on a data set and apply the various EDA and visualization techniques
and present an analysis report.
PROGRAM:
import datetime
import math
import pandas as pd
import random
import radar
from faker import Faker
fake = Faker()
def generateData(n):
    listdata = []
    start = datetime.datetime(2019, 8, 1)
    end = datetime.datetime(2019, 8, 30)
    delta = end - start
    for _ in range(n):
        # random date in August 2019 and a random price between 900 and 1000
        date = radar.random_datetime(start='2019-08-1', stop='2019-08-30').strftime("%Y-%m-%d")
        price = round(random.uniform(900, 1000), 4)
        listdata.append([date, price])
    df = pd.DataFrame(listdata, columns=['Date', 'Price'])
    df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d')
    # average the prices that fall on the same date
    df = df.groupby(by='Date').mean()
    return df
df = generateData(50)  # n = 50 is an assumed sample size
df.head(10)  # show the first ten rows
Date        Price
2019-08-01  999.598900
2019-08-02  957.870150
2019-08-04  978.674200
2019-08-05  963.380375
2019-08-06  978.092900
2019-08-07  987.847700
2019-08-08  952.669900
2019-08-10  973.929400
2019-08-13  971.485600
2019-08-14  977.036200
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = (14, 10)
plt.plot(df)
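Because the dates are drawn at random, several prices can land on the same day, and the groupby step reduces them to one average per date. A minimal sketch of that step on three hypothetical rows:

```python
import pandas as pd

# Hypothetical rows: two prices fall on the same date.
listdata = [["2019-08-01", 900.0], ["2019-08-01", 950.0], ["2019-08-02", 980.0]]

df = pd.DataFrame(listdata, columns=["Date", "Price"])
df["Date"] = pd.to_datetime(df["Date"], format="%Y-%m-%d")

# One row per date; prices that share a date are averaged.
daily = df.groupby(by="Date").mean()
print(daily)
```

This also explains why the table of generated data skips some days: a date appears only if at least one random draw landed on it.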
OUTPUT
And the plotted graph looks something like this:
RESULT:
Thus the above program was executed successfully.