Fds Lab Manual PDF
Ex.No.1 DOWNLOAD, INSTALL AND EXPLORE THE FEATURES OF NUMPY, SCIPY, JUPYTER, STATSMODELS AND PANDAS PACKAGES
1a. Aim:
To download, install and explore the features of the NumPy package.
Problem Description
Python is an open-source, object-oriented language. One of its strengths is the wide range of
external packages that extend its functionality. A package is a collection of Python modules that
provide ready-made functions; NumPy, for example, makes array computations easy. These packages
are installed with pip, the Python package installer, which is included automatically with Python
3.4 and later. pip is run from the command line to install packages from PyPI (the Python Package
Index).
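Once pip has installed the packages, a quick way to confirm which of them Python can actually import is the standard-library function importlib.util.find_spec. The sketch below is only an illustration: the helper name check_packages is hypothetical, and the list covers the importable packages of this exercise (Jupyter is launched as a command rather than imported, so it is left out).

```python
import importlib.util

# Import names of the packages installed in this exercise
PACKAGES = ["numpy", "scipy", "pandas", "statsmodels"]


def check_packages(names):
    """Map each import name to True if Python can find it, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = check_packages(PACKAGES)
for name, installed in status.items():
    hint = "installed" if installed else f"missing (try: pip install {name})"
    print(f"{name:12s} {hint}")
```

From the command line, all of these (and Jupyter) can be installed in one step with pip install numpy scipy pandas statsmodels jupyter.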
NumPy
NumPy (Numerical Python) is an open-source library for the Python programming
language. It is used for scientific computing and working with arrays.
Apart from its multidimensional array object, it also provides high-level functions for working
with arrays.
Prerequisites
● Sample python program using numpy:
import numpy as np
# Creating array object
arr = np.array([[1, 2, 3],
                [4, 2, 5]])
# Printing type of arr object
print("Array is of type: ", type(arr))
# Printing array dimensions (axes)
print("No. of dimensions: ", arr.ndim)
# Printing shape of array
print("Shape of array: ", arr.shape)
# Printing size (total number of elements) of array
print("Size of array: ", arr.size)
# Printing type of elements in array
print("Array stores elements of type: ", arr.dtype)
OUTPUT
Result:
Thus the features of the NumPy package were downloaded, installed and explored.
1b. Aim:
To download, install and explore the features of Jupyter packages.
Data Science:
Data science combines math and statistics, specialized programming, advanced analytics,
artificial intelligence (AI), and machine learning with specific subject matter expertise to uncover
actionable insights hidden in an organization’s data.
Jupyter:
Jupyter Notebook is an open-source web application that allows you to create and share
documents that contain live code, equations, visualizations, and narrative text. Uses include data
cleaning and transformation, numerical simulation, statistical modeling, data visualization,
machine learning, and much more.
Jupyter has support for over 40 different programming languages and Python is one of them.
Python is a requirement (Python 3.3 or greater, or Python 2.7) for installing the Jupyter
Notebook itself.
Procedure:
Before installing Jupyter with pip, make sure that pip itself is up to date.
Use the following command to update pip:
python -m pip install --upgrade pip
After updating pip, install Jupyter with the following command:
pip install jupyter
∙ Finished Installation:
To launch Jupyter Notebook, run the following command in the command prompt:
jupyter notebook
Click New, select Python 3 (ipykernel) and type the following program. Click Run to execute the
program.
Running the Python program:
Python code:
Program to find the area of a triangle
# Python program to find the area of a triangle
a = 5
b = 6
c = 7
# calculate the semi-perimeter
s = (a + b + c) / 2
# calculate the area (Heron's formula)
area = (s*(s-a)*(s-b)*(s-c)) ** 0.5
print('The area of the triangle is %0.2f' % area)
Output:
The area of the triangle is 14.70
Result:
Thus the features of the Jupyter package were downloaded, installed and explored.
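The triangle program hard-codes the sides a = 5, b = 6, c = 7. A reusable sketch of the same Heron's-formula calculation, which also checks that the three sides can form a triangle, might look like this (the function name triangle_area is hypothetical):

```python
import math


def triangle_area(a, b, c):
    """Area of a triangle from its three side lengths (Heron's formula)."""
    # Each side must be shorter than the sum of the other two (triangle inequality);
    # this also rejects zero or negative lengths.
    if a + b <= c or a + c <= b or b + c <= a:
        raise ValueError("the sides do not form a triangle")
    s = (a + b + c) / 2                       # semi-perimeter
    return math.sqrt(s * (s - a) * (s - b) * (s - c))


print('The area of the triangle is %0.2f' % triangle_area(5, 6, 7))
```

For the sides 5, 6 and 7 this prints "The area of the triangle is 14.70", the same value as the notebook program.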
1c. Aim:
To download, install and explore the features of Scipy package.
Problem Description
SciPy is a Python library for solving many kinds of mathematical problems with numerical
algorithms. It is built on top of the NumPy library and extends it with routines for scientific
computing, such as finding the rank and inverse of a matrix, solving polynomial equations and
computing LU decompositions. Its high-level functions significantly reduce the complexity of the
code and make the data easier to analyze.
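A short sketch of some of the matrix operations mentioned above, using scipy.linalg on a small example matrix chosen only for illustration. Note that the matrix rank is taken from NumPy (np.linalg.matrix_rank), which SciPy builds on.

```python
import numpy as np
from scipy import linalg

# A small square matrix used only for illustration
A = np.array([[2.0, 1.0, 1.0],
              [4.0, -6.0, 0.0],
              [-2.0, 7.0, 2.0]])

rank = np.linalg.matrix_rank(A)   # rank (from NumPy)
det = linalg.det(A)               # determinant
A_inv = linalg.inv(A)             # inverse
P, L, U = linalg.lu(A)            # LU decomposition with pivoting: A = P @ L @ U

# Solve the linear system A x = b
b = np.array([5.0, -2.0, 9.0])
x = linalg.solve(A, b)

print("rank:", rank)
print("determinant:", round(det, 6))
print("solution x:", x)
```

Solving the system with linalg.solve is preferred over multiplying by the inverse, because it is faster and numerically more stable; the inverse is computed here only to show the function.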
output:
output:
Result:
Thus the features of the SciPy package were downloaded, installed and explored.
1d. Aim:
To download, install and explore the features of the Pandas package.
Problem Description
The Pandas library does not come with a standard installation of Python, so it must be installed
separately.
If you have Python 3.4 or later, pip is installed on your computer along with Python by default.
However, if you are using an older version of Python, you will need to install pip on your
computer before installing Pandas.
To install Pandas, run
pip install pandas
on the terminal. This launches the pip installer; the required files are downloaded, and Pandas is
then ready to use on your computer.
Sample program
Output:
Result:
Thus the features of the Pandas package were downloaded, installed and explored.
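A minimal Pandas sketch of the kind of exploration this exercise refers to: building a DataFrame, summarising it and deriving a new column. The student names and marks are hypothetical sample data.

```python
import pandas as pd

# Hypothetical marks of four students
data = {"name": ["Asha", "Bala", "Chitra", "Dev"],
        "maths": [78, 92, 65, 88],
        "science": [84, 75, 90, 70]}
df = pd.DataFrame(data)

print(df.head())                                # first rows of the table
print(df.describe())                            # count, mean, std, min, quartiles, max
df["total"] = df["maths"] + df["science"]       # new column computed from existing ones
topper = df.loc[df["total"].idxmax(), "name"]   # row with the largest total
print("Topper:", topper)
```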
1e. Aim: To download, install and explore the features of the statsmodels package.
Installation of statsmodels
To install statsmodels, open the Command Prompt, type the following command and press Enter.
pip install statsmodels
Output
Here we perform OLS (Ordinary Least Squares) regression. This technique chooses the model
parameters that minimize the sum of the squared differences between the observed values and the
values calculated by the model.
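To make the OLS idea concrete, the sketch below computes a least-squares straight line with NumPy alone, on hypothetical data. It is not the statsmodels program; it only shows the quantity OLS minimises, sum((y - X @ beta) ** 2).

```python
import numpy as np

# Hypothetical observations: y is roughly 2x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Design matrix with a column of ones for the intercept
X = np.column_stack([np.ones_like(x), x])

# OLS: choose beta that minimises the sum of squared residuals
beta, rss, rank, sing_vals = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = beta

residuals = y - X @ beta
print("intercept = %.4f, slope = %.4f" % (intercept, slope))
print("sum of squared residuals = %.4f" % np.sum(residuals ** 2))
```

With statsmodels the same coefficients are obtained from sm.OLS(y, sm.add_constant(x)).fit().params (after import statsmodels.api as sm); statsmodels additionally reports standard errors, t-statistics and R-squared in its summary() table.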
Program
OUTPUT
Result:
Thus the features of the statsmodels package were downloaded, installed and explored.
Ex.No.2 WORKING WITH NUMPY ARRAYS
Aim :
Write a python program to show the working of NumPy arrays in Python.
2a) Use a NumPy array to demonstrate basic array characteristics
b) Create NumPy arrays using a list and a tuple
c) Apply basic operations (+, -, *, /) and find the transpose of a matrix
d) Perform sorting operations with NumPy arrays
Problem Description
Example 1:
Write a python program to demonstrate the basic NumPy array
characteristics
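import numpy as np
arr = np.array([[1, 2, 3], [4, 2, 5]])  # 2x3 array object
for label, value in [("Array is of type: ", type(arr)), ("No. of dimensions: ", arr.ndim),
                     ("Shape of array: ", arr.shape), ("Size of array: ", arr.size),
                     ("Array stores elements of type: ", arr.dtype)]:
    print(label, value)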
Output :
Array is of type: <class 'numpy.ndarray'>
No. of dimensions: 2
Shape of array: (2, 3)
Size of array: 6
Array stores elements of type: int64
2. Array creation: There are various ways to create arrays in NumPy.
● For example, you can create an array from a regular Python list or tuple using the array
function. The type of the resulting array is deduced from the type of the elements in the
sequences.
● Often, the elements of an array are originally unknown, but its size is known. Hence, NumPy
offers several functions to create arrays with initial placeholder content. These minimize
the necessity of growing arrays, an expensive operation. For example: np.zeros, np.ones,
np.full, np.empty, etc.
● To create sequences of numbers, NumPy provides a function analogous to range that returns
arrays instead of lists.
● arange: returns evenly spaced values within a given interval, where the step size is specified.
● linspace: returns a specified number (num) of evenly spaced values within a given interval.
● Reshaping array: We can use the reshape method to reshape an array. Consider an array with
shape (a1, a2, a3, …, aN). We can reshape it into another array with shape (b1, b2, b3, …, bM)
only if a1 x a2 x a3 … x aN = b1 x b2 x b3 … x bM, i.e. the total number of elements stays the
same.
● Flatten array: We can use the flatten method to get a copy of the array collapsed into one
dimension. It accepts an order argument: the default value 'C' gives row-major order, and 'F'
gives column-major order.
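The creation, reshaping and flattening functions listed above can be tried together in one short sketch; the array sizes and fill values are arbitrary choices for illustration.

```python
import numpy as np

a = np.array([1, 2, 3])              # from a list: integer dtype inferred
b = np.array((1.5, 2.5))             # from a tuple: float64 dtype inferred
z = np.zeros((2, 3))                 # 2x3 placeholder array of 0.0
o = np.ones(4, dtype=int)            # four integer ones
f = np.full((2, 2), 7)               # 2x2 array filled with 7
r = np.arange(0, 10, 3)              # step size given: values 0, 3, 6, 9
lin = np.linspace(0, 1, 5)           # number of values given: 0, 0.25, 0.5, 0.75, 1

m = np.arange(12).reshape(3, 4)      # 12 elements arranged as 3x4
cube = m.reshape(2, 2, 3)            # allowed because 2*2*3 = 12
flat_c = m.flatten()                 # row-major order (order='C', the default)
flat_f = m.flatten(order="F")        # column-major order

try:
    m.reshape(5, 3)                  # 5*3 = 15 != 12, so NumPy raises ValueError
except ValueError as err:
    print("reshape failed:", err)

print(r, lin)
print(flat_c[:5], flat_f[:5])
```

The failed reshape shows the rule stated above: the product of the new dimensions must equal the number of elements.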
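Example 2:
import numpy as np
# Reshape a 3x4 array into a 2x2x3 array (3*4 = 2*2*3 = 12 elements)
arr = np.array([[1, 2, 3, 4], [5, 2, 4, 2], [1, 2, 0, 1]])
newarr = arr.reshape(2, 2, 3)
print("Original array:\n", arr, "\nReshaped array:\n", newarr)
# Flatten a 2x3 array into one dimension (row-major order by default)
arr = np.array([[1, 2, 3], [4, 5, 6]])
flarr = arr.flatten()
print("Original array:\n", arr, "\nFlattened array:\n", flarr)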
OUTPUT
Array created using passed list:
[[ 1. 2. 4.]
[ 5. 8. 7.]]
[ 0. 0. 0. 0.]]
An array initialized with all 6s. Array type is complex:
[[ 6.+0.j 6.+0.j 6.+0.j]
[ 6.+0.j 6.+0.j 6.+0.j]
[ 6.+0.j 6.+0.j 6.+0.j]]
A random array:
[[ 0.46829566 0.67079389]
[ 0.09079849 0.95410464]]
A sequential array with 10 values between 0 and 5:
[ 0. 0.55555556 1.11111111 1.66666667 2.22222222 2.77777778
3.33333333 3.88888889 4.44444444 5. ]
Original array:
[[1 2 3 4]
[5 2 4 2]
[1 2 0 1]]
Reshaped array:
[[[1 2 3]
[4 5 2]]
[[4 2 1]
[2 0 1]]]
Original array:
[[1 2 3]
[4 5 6]]
Flattened array:
[1 2 3 4 5 6]
3. Basic operations:
Program 3:
import numpy as np
a = np.array([1, 2, 5, 3])
# transpose of array
a = np.array([[1, 2, 3], [3, 4, 5], [9, 6, 0]])
Output
Adding 1 to every element: [2 3 6 4]
Subtracting 3 from each element: [-2 -1 2 0]
Multiplying each element by 10: [10 20 50 30]
Squaring each element: [ 1 4 25 9]
Doubled each element of original array: [ 2 4 10 6]
Original array:
[[1 2 3]
[3 4 5]
[9 6 0]]
Transpose of array:
[[1 3 9]
[2 4 6]
[3 5 0]]
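The printed results above can be reproduced with a short sketch of element-wise arithmetic and transposition. Assertions on the same values are a convenient way to check them; the array-by-array division line is an extra illustration of the "/" operation named in the aim.

```python
import numpy as np

a = np.array([1, 2, 5, 3])
added = a + 1          # the scalar is applied to every element (broadcasting)
subtracted = a - 3
multiplied = a * 10
squared = a ** 2
a *= 2                 # in place: changes the original array

# Element-wise operation between two arrays of the same shape
ratio = np.array([10, 20, 30]) / np.array([2, 4, 5])   # true division gives floats

m = np.array([[1, 2, 3], [3, 4, 5], [9, 6, 0]])
mt = m.T               # transpose: row i becomes column i

print("Adding 1 to every element:", added)
print("Doubled each element of original array:", a)
print("Element-wise division:", ratio)
print("Transpose of array:\n", mt)
```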
4. Sorting array: NumPy provides the np.sort function for sorting arrays. Its axis argument selects
the direction of sorting, and its kind argument selects the sorting algorithm.
Program 4:
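import numpy as np
a = np.array([[1, 4, 2],
              [3, 4, 6],
              [0, -1, 5]])
# axis=None flattens before sorting; axis=1 sorts each row; axis=0 sorts each column
print("Array elements in sorted order:\n", np.sort(a, axis=None))
print("Row-wise sorted array:\n", np.sort(a, axis=1))
print("Column wise sort by applying merge-sort:\n", np.sort(a, axis=0, kind='mergesort'))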
OUTPUT
Array elements in sorted order:
[-1 0 1 2 3 4 4 5 6]
Row-wise sorted array:
[[ 1 2 4]
[ 3 4 6]
[-1 0 5]]
Column wise sort by applying merge-sort:
[[ 0 -1 2]
[ 1 4 5]
[ 3 4 6]]
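Besides np.sort, two related tools are often needed: np.argsort, which returns the indices that would sort an array (useful for ordering a second, parallel array), and the ndarray.sort method, which sorts in place. A sketch with hypothetical scores and names:

```python
import numpy as np

scores = np.array([72, 95, 60, 88])
names = np.array(["Asha", "Bala", "Chitra", "Dev"])

sorted_copy = np.sort(scores)          # new sorted array; scores is unchanged here
order = np.argsort(scores)             # indices that would sort scores ascending
ranked_names = names[order][::-1]      # names ordered from highest to lowest score

scores.sort()                          # ndarray.sort() sorts the array in place
print(sorted_copy, order, ranked_names)
```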
Result:
Thus the Python programs demonstrating the working of NumPy arrays were executed successfully.
Ex.No.3 WORKING WITH PANDAS DATA FRAMES
Aim:
Write a python program to work with Pandas DataFrames.
Pandas
Pandas is an open-source library built on top of the NumPy library. It is a Python package that
offers various data structures and operations for manipulating numerical data and time series. It
is popular mainly because it makes importing and analyzing data much easier. Pandas is fast and
offers high performance and productivity to its users.
Pandas DataFrame
In practice, a Pandas DataFrame is usually created by loading a dataset from existing storage,
such as an SQL database, a CSV file or an Excel file. A DataFrame can also be created from lists,
a dictionary, a list of dictionaries, and so on.
A Pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data
structure with labeled axes (rows and columns); that is, the data is aligned in a tabular fashion
in rows and columns. A DataFrame consists of three principal components: the data, the rows and
the columns.
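The three in-memory ways of building a DataFrame mentioned above (lists, a dictionary and a list of dictionaries) can be compared side by side. The two sample rows reuse names from this exercise's output; the sketch shows that all three constructions give the same table.

```python
import pandas as pd

# From a list of lists, with the column names supplied separately
df_list = pd.DataFrame([["aparna", 90], ["pankaj", 40]], columns=["name", "Score"])

# From a dictionary: the keys become column labels
df_dict = pd.DataFrame({"name": ["aparna", "pankaj"], "Score": [90, 40]})

# From a list of dictionaries: each dictionary becomes one row
df_records = pd.DataFrame([{"name": "aparna", "Score": 90},
                           {"name": "pankaj", "Score": 40}])

print(df_dict)
print("rows, columns:", df_dict.shape)
```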
Creating a dataframe using List:
To iterate over a DataFrame, pandas provides iterrows() and itertuples(), which iterate over the
rows, and items(), which iterates over the columns. (items() was also available under the older
name iteritems(), which was removed in pandas 2.0.)
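A small sketch comparing the three iteration methods on a hypothetical three-row DataFrame. itertuples() is usually faster than iterrows() because it does not build a Series for every row.

```python
import pandas as pd

df = pd.DataFrame({"name": ["aparna", "pankaj", "sudhir"],
                   "Score": [90, 40, 80]})

# iterrows(): yields (index, Series) for each row
for idx, row in df.iterrows():
    print(idx, row["name"], row["Score"])

# itertuples(): yields one namedtuple per row
for row in df.itertuples(index=False):
    print(row.name, row.Score)

# items(): yields (column label, Series) for each column
for label, column in df.items():
    print(label, list(column))

# name=None gives plain tuples instead of namedtuples
rows_as_tuples = list(df.itertuples(index=False, name=None))
```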
Program
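import pandas as pd
# Create an empty dataframe
df = pd.DataFrame()
print("Empty dataframe")
print(df)
# Dictionary of lists
Data = {'name': ['aparna', 'pankaj', 'sudhir', 'Geeku'],
        'Degree': ['MBA', 'BCA', 'M.Tech', 'MBA'],
        'Score': [90, 40, 80, 98]}
# Create dataframe
df = pd.DataFrame(Data)
# iterating over rows using iterrows() function
for i, j in df.iterrows():
    print(i, j)
    print()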
OUTPUT
Empty dataframe
Empty DataFrame
Columns: []
Index: []
0 name aparna
Degree MBA
Score 90
Name: 0, dtype: object
1 name pankaj
Degree BCA
Score 40
Name: 1, dtype: object
2 name sudhir
Degree M.Tech
Score 80
Name: 2, dtype: object
3 name Geeku
Degree MBA
Score 98
Name: 3, dtype: object
Result: Thus the Python program to work with Pandas DataFrames was executed successfully.
Ex.No.4 READING DATA FROM TEXT FILES, EXCEL AND THE WEB AND EXPLORING VARIOUS COMMANDS FOR DOING DESCRIPTIVE ANALYTICS ON THE IRIS DATA SET
1a.Aim:
Reading data from text files and exploring various commands for doing descriptive
analytics on the Iris data set.
Program 1
To read a CSV file
import pandas as pd
OUTPUT
Checking Duplicates
Let's check whether the dataset contains any duplicates. The Pandas drop_duplicates() method
removes duplicate rows from a data frame. With subset="Species", rows are compared only on the
Species column, so the first row of each species is kept.
Example:
data = df.drop_duplicates(subset="Species")
data
Output
Result:
Thus data was read from a text file and various commands for doing descriptive analytics on the
Iris data set were explored.
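The descriptive-analytics commands of this exercise can be tried without the Iris file by reading a few rows from an in-memory string. The column names and the five rows below are assumptions chosen to resemble a typical Iris CSV, with one row deliberately repeated; with the real file, pd.read_csv would be given its path instead (the path is not specified here).

```python
import io

import pandas as pd

# Illustrative sample in an Iris-like layout (the last row repeats the second)
csv_text = """SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
4.9,3.0,1.4,0.2,Iris-setosa
"""
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)                          # (rows, columns)
print(df.describe())                     # summary statistics of the numeric columns
print(df["Species"].value_counts())      # number of rows per species
n_duplicates = df.duplicated().sum()     # rows identical to an earlier row
clean = df.drop_duplicates()             # keep the first occurrence of each row
one_per_species = df.drop_duplicates(subset="Species")
```

df.duplicated().sum() answers "does the dataset contain duplicates?" for whole rows, whereas drop_duplicates(subset="Species") keeps one row per species, as in the example above.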
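Ex.No.5 USE THE DIABETES DATA SET FROM UCI AND PIMA INDIANS DIABETES
5 a) Aim:
To perform univariate analysis (frequency, mean, median, mode, variance, standard deviation,
skewness and kurtosis) on the Pima Indians diabetes dataset.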
Program:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from scipy.stats import skew
from scipy.stats import kurtosis
import statistics
df = pd.read_csv('C:/Users/HP/OneDrive/Desktop/RESEARCH/Data/Pima2.csv')
print('THE SHAPE OF THE DATASET IS \n', df.shape)
print('THE DATA TYPES ARE :\n', df.dtypes)
print('THE DESCRIPTION OF THE DATASET IS:\n', df.describe().T)
# density plot of every numeric column
df.plot(kind='density', subplots=True, layout=(3, 3), sharex=False)
plt.show()
# univariate analysis of the number of pregnancies (npreg)
print('THE FREQUENCY OF PREGNANCIES IS:\n', df['npreg'].value_counts())
preg = np.array(df['npreg'])
print('MEAN :', statistics.mean(preg))
print('MEDIAN :', statistics.median(preg))
print('MODE :', statistics.multimode(preg))
print('VARIANCE :', statistics.variance(preg))
print('STANDARD DEVIATION :', statistics.stdev(preg))
if skew(preg) > 0:
    print("POSITIVE SKEWNESS \n")
elif skew(preg) < 0:
    print("NEGATIVE SKEWNESS \n")
else:
    print("NO SKEWNESS \n")
print('KURTOSIS', kurtosis(preg))
# univariate analysis of plasma glucose (glu)
print('THE FREQUENCY OF GLUCOSE IS:\n', df['glu'].value_counts())
G = np.array(df['glu'])
print('MEAN :', statistics.mean(G))
print('MEDIAN :', statistics.median(G))
print('MODE :', statistics.multimode(G))
print('VARIANCE :', statistics.variance(G))
print('STANDARD DEVIATION :', statistics.stdev(G))
if skew(G) > 0:
    print("POSITIVE SKEWNESS \n")
elif skew(G) < 0:
    print("NEGATIVE SKEWNESS \n")
else:
    print("NO SKEWNESS \n")
print('KURTOSIS', kurtosis(G))
Unnamed: 0      int64
npreg           int64
glu             int64
bp            float64
skin          float64
bmi           float64
ped           float64
age             int64
type           object
dtype: object
[8 rows x 8 columns]
THE FREQUENCY OF PREGNANCIES IS:
1 51
0 44
2 41
4 35
3 26
5 22
6 19
7 17
8 15
9 9
10 8
12 6
13 3
14 2
11 2
Name: npreg, dtype: int64
MEAN :3
MEDIAN : 3.0
MODE : [1]
VARIANCE : 10
STANDARD DEVIATION : 3.1622776601683795
POSITIVE SKEWNESS
KURTOSIS 0.20897562301271444
THE FREQUENCY OF GLUCOSE IS:
100 8
125 8
111 6
95 6
139 6
..
152 1
149 1
135 1
198 1
56 1
Name: glu, Length: 108, dtype: int64
MEAN : 123
MEDIAN : 121.0
MODE : [100, 125]
VARIANCE : 900
STANDARD DEVIATION : 30.0
POSITIVE SKEWNESS
KURTOSIS -0.19604758469298522
Result:
Thus univariate analysis (frequency, mean, median, mode, variance, standard deviation, skewness
and kurtosis) was performed on the Pima Indians dataset.
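Skewness and kurtosis are both built from central moments m_k = mean((x - mean(x)) ** k). With their default arguments, scipy.stats.skew computes m3 / m2 ** 1.5 and scipy.stats.kurtosis computes the excess kurtosis m4 / m2 ** 2 - 3 (zero for a normal distribution). The NumPy sketch below spells this out on a small hypothetical sample; the helper name moment_stats is hypothetical.

```python
import statistics

import numpy as np


def moment_stats(values):
    """Skewness and excess kurtosis from central moments (population form)."""
    x = np.asarray(values, dtype=float)
    d = x - x.mean()
    m2 = np.mean(d ** 2)        # second central moment (divides by n)
    m3 = np.mean(d ** 3)
    m4 = np.mean(d ** 4)
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return skewness, excess_kurtosis


data = [1, 2, 3, 4, 10]         # one large value gives a long right tail
g1, g2 = moment_stats(data)
print("skewness:", round(g1, 4), "kurtosis:", round(g2, 4))
print("sample variance (n - 1):", statistics.variance(data))
print("population variance (n):", statistics.pvariance(data))
```

The positive skewness reflects the single large value, just as POSITIVE SKEWNESS in the output above reflects the long right tail of the pregnancy and glucose counts. Note that statistics.variance divides by n - 1, while the moment m2 divides by n.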
5 b) Aim:
To perform bivariate analysis with linear regression modeling using the Pima Indians
dataset.
Program:
# Linear Regression
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
# load the dataset
diabetes_dataset = pd.read_csv('C:/Users/HP/OneDrive/Desktop/RESEARCH/Data/diabetes.csv')
print('THE SHAPE OF THE DATASET IS \n', diabetes_dataset.shape)
print('THE DATA TYPES ARE :\n', diabetes_dataset.dtypes)
print('THE DESCRIPTION OF THE DATASET IS:\n', diabetes_dataset.describe().T)
# find the pair of feature columns (i, j) with the largest correlation
max_corr = 0
ind = 0
col = 0
c = [0 for x in range(diabetes_dataset.columns.size)]
for i in range(0, 8):
    for j in range(0, 8):
        if i == j:
            continue
        c[i] = diabetes_dataset.iloc[:, i].corr(diabetes_dataset.iloc[:, j])
        if max_corr < c[i]:
            max_corr = c[i]
            ind = i
            col = j
print('maxindex=,col ', ind, col)
return slope*preg+intercept
plt.scatter(X_test, y_test, color="red")
plt.plot(X_train, regressor.predict(X_train), color="green")
plt.title("Preg vs Age (Testing set)")
plt.xlabel("Preg")
plt.ylabel("Age")
plt.show()
diabetes_dataset = pd.read_csv('C:/Users/HP/OneDrive/Desktop/RESEARCH/Data/diabetes.csv')
print('THE SHAPE OF THE DATASET IS \n', diabetes_dataset.shape)
print('THE DATA TYPES ARE :\n', diabetes_dataset.dtypes)
print('THE DESCRIPTION OF THE DATASET IS:\n', diabetes_dataset.describe().T)
# find the feature column most correlated with Outcome (column index 8)
max_corr = 0
ind = 0
c = [0 for x in range(diabetes_dataset.columns.size)]
for i in range(0, 8):
    c[i] = diabetes_dataset.iloc[:, i].corr(diabetes_dataset.iloc[:, 8])
    if max_corr < c[i]:
        max_corr = c[i]
        ind = i
X = diabetes_dataset.iloc[:, ind].values.reshape(-1, 1)  # independent variable array
Y = diabetes_dataset.iloc[:, 8].values  # dependent variable vector
# splitting
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=0)
# fitting the model
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
model.fit(X_train, y_train)
train_acc = model.score(X_train, y_train)
print("The Accuracy for Training Set is {}".format(train_acc*100))
y_pred = model.predict(X_test)
SkinThickness 768.0 20.536458 ... 32.00000 99.00
Insulin 768.0 79.799479 ... 127.25000 846.00
BMI 768.0 31.992578 ... 36.60000 67.10
DiabetesPedigreeFunction 768.0 0.471876 ... 0.62625 2.42
Age 768.0 33.240885 ... 41.00000 81.00
Outcome 768.0 0.348958 ... 1.00000 1.00
[9 rows x 8 columns]
maxindex=,col 0 7
Regression intercept [26.29935069]
Regression coefficient [[1.88280735]]
predicted age plot [[45.12742415]]
predicted age model pred [[45.12742415]]
Actual Predicted
0 22 28.182158
1 23 30.064965
2 25 33.830580
3 51 35.713387
4 31 26.299351
.. ... ...
149 29 30.064965
150 28 33.830580
151 22 33.830580
152 24 31.947773
153 24 28.182158
[154 rows x 2 columns]
Mean absolute error: 6.77
Mean squared error: 77.77
Root mean squared error: 8.82
Insulin 768.0 79.799479 ... 127.25000 846.00
BMI 768.0 31.992578 ... 36.60000 67.10
DiabetesPedigreeFunction 768.0 0.471876 ... 0.62625 2.42
Age 768.0 33.240885 ... 41.00000 81.00
Outcome 768.0 0.348958 ... 1.00000 1.00
[9 rows x 8 columns]
The Accuracy for Training Set is 73.61563517915309
Accuracy of logistic regression classifier on test set: 0.79
[[96 11]
[22 25]]
The outcome for glucose level is : [1]
Result:
Thus bivariate analysis with linear regression modeling was performed using the Pima Indians
dataset.
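The regression program chooses the column pair with the largest Pearson correlation, fits a least-squares line and reports the mean absolute, mean squared and root mean squared errors. The NumPy sketch below computes all of these by hand on five hypothetical (x, y) pairs, so the formulas behind the printed numbers are visible.

```python
import numpy as np

# Hypothetical paired observations (x = predictor, y = response)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

dx, dy = x - x.mean(), y - y.mean()

# Pearson correlation coefficient
r = np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

# Least-squares line: y = intercept + slope * x
slope = np.sum(dx * dy) / np.sum(dx ** 2)
intercept = y.mean() - slope * x.mean()
y_pred = intercept + slope * x

# Error measures
mae = np.mean(np.abs(y - y_pred))     # mean absolute error
mse = np.mean((y - y_pred) ** 2)      # mean squared error
rmse = np.sqrt(mse)                   # root mean squared error
print(f"r={r:.4f} slope={slope:.2f} intercept={intercept:.2f}")
print(f"MAE={mae:.2f} MSE={mse:.2f} RMSE={rmse:.2f}")
```

In scikit-learn, sklearn.metrics.mean_absolute_error(y, y_pred) and mean_squared_error(y, y_pred) return the same MAE and MSE values.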
5 c) Aim:
To perform multiple regression analysis using multivariate logistic regression on the UCI diabetes
dataset.
Program:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
diabetes_dataset = pd.read_csv('C:/Users/HP/OneDrive/Desktop/RESEARCH/Data/diabetes.csv')
print('THE SHAPE OF THE DATASET IS \n', diabetes_dataset.shape)
print('THE DATA TYPES ARE :\n', diabetes_dataset.dtypes)
print('THE DESCRIPTION OF THE DATASET IS:\n', diabetes_dataset.describe().T)
# select the feature columns whose correlation with Outcome (column 8) exceeds 0.25
c1 = []
for i in range(0, 8):
    if diabetes_dataset.iloc[:, i].corr(diabetes_dataset.iloc[:, 8]) > 0.25:
        c1.append(i)
df = diabetes_dataset.iloc[:, c1]
X = df.values  # independent variable array
Y = diabetes_dataset.iloc[:, 8].values  # dependent variable vector
# splitting
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=0)
# fitting the model
model = LogisticRegression()
model.fit(X_train, y_train)
train_acc = model.score(X_train, y_train)
print("The Accuracy for Training Set is {}".format(train_acc*100))
y_pred = model.predict(X_test)
print('Accuracy of logistic regression classifier on test set: {:.2f}'.format(model.score(X_test, y_test)))
cm = confusion_matrix(y_test, y_pred)
print(cm)
# predict the outcome for glucose = 150 and BMI = 34 (assumes these are the selected features, in this order)
out = model.predict([[150, 34]])
print('The outcome of glucose and bmi :', out)
Output of above coding:
THE SHAPE OF THE DATASET IS
(768, 9)
THE DATA TYPES ARE :
Pregnancies int64
Glucose int64
BloodPressure int64
SkinThickness int64
Insulin int64
BMI float64
DiabetesPedigreeFunction float64
Age int64
Outcome int64
dtype: object
THE DESCRIPTION OF THE DATASET IS:
count mean ... 75% max
Pregnancies 768.0 3.845052 ... 6.00000 17.00
Glucose 768.0 120.894531 ... 140.25000 199.00
BloodPressure 768.0 69.105469 ... 80.00000 122.00
SkinThickness 768.0 20.536458 ... 32.00000 99.00
Insulin 768.0 79.799479 ... 127.25000 846.00
BMI 768.0 31.992578 ... 36.60000 67.10
DiabetesPedigreeFunction 768.0 0.471876 ... 0.62625 2.42
Age 768.0 33.240885 ... 41.00000 81.00
Outcome 768.0 0.348958 ... 1.00000 1.00
[9 rows x 8 columns]
The Accuracy for Training Set is 76.2214983713355
Accuracy of logistic regression classifier on test set: 0.79
[[95 12]
[21 26]]
The outcome of glucose and bmi : [1]
Result:
Thus multiple regression analysis using multivariate logistic regression was performed on the UCI
diabetes dataset.
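The test-set accuracy of 0.79 can be read straight off the printed confusion matrix. In scikit-learn's layout the rows are the actual classes and the columns the predicted classes, so the matrix is [[TN, FP], [FN, TP]]. The sketch below copies the printed numbers and also derives precision and recall, which are not printed by the program.

```python
import numpy as np

# Confusion matrix printed above (rows = actual 0/1, columns = predicted 0/1)
cm = np.array([[95, 12],
               [21, 26]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()    # fraction of all test cases classified correctly
precision = tp / (tp + fp)         # of the cases predicted diabetic, how many are
recall = tp / (tp + fn)            # of the actual diabetic cases, how many were found
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Accuracy is (95 + 26) / 154, about 0.79, matching the printed score, while a recall of about 0.55 shows that the model misses many of the diabetic cases.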
Ex.No.6 APPLY AND EXPLORE VARIOUS PLOTTING FUNCTIONS ON UCI DATA SETS
A. NORMAL CURVES
B. DENSITY AND CONTOUR PLOTS
C. CORRELATION AND SCATTER PLOTS
D. HISTOGRAMS
E. THREE DIMENSIONAL PLOTTING
Aim:
To apply and explore various plotting functions on UCI data sets.
A. Normal curves
B. Density and contour plots
C. Correlation and scatter plots
D. Histograms
E. Three-dimensional plotting
A. Normal curves
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
import statistics
Output
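A normal curve is the graph of the density f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi)). The sketch below evaluates it with NumPy over mu plus or minus 4 sigma and cross-checks it against the standard library's statistics.NormalDist. The mean and standard deviation are illustrative values, not taken from a data set, and the helper name normal_pdf is hypothetical.

```python
from statistics import NormalDist

import numpy as np


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal (Gaussian) probability density, evaluated element-wise."""
    x = np.asarray(x, dtype=float)
    return np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))


mu, sigma = 120.0, 30.0                        # illustrative mean and standard deviation
x = np.linspace(mu - 4 * sigma, mu + 4 * sigma, 201)
pdf = normal_pdf(x, mu, sigma)

dx = x[1] - x[0]
print("peak height:", pdf.max())               # reached at x = mu
print("area under curve:", np.sum(pdf) * dx)   # very close to 1
```

Passing x and pdf to plt.plot draws the bell-shaped curve; scipy.stats.norm.pdf(x, mu, sigma), imported in the program above, returns the same values.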
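B) Density and contour plots:
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('seaborn-v0_8-white')  # this style was named 'seaborn-white' before Matplotlib 3.6
import numpy as np
# function of (x, y) to visualize as contours
def f(x, y):
    return np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)
x = np.linspace(0, 5, 50)
y = np.linspace(0, 5, 40)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)
plt.contour(X, Y, Z, colors='black');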
Output
Notice that by default, when a single color is used, negative values are represented by dashed
lines and positive values by solid lines. Alternatively, the lines can be color-coded by specifying
a colormap with the cmap argument. Here we also ask for more lines to be drawn: 20 equally spaced
intervals within the data range:
plt.contour(X, Y, Z, 20, cmap='RdGy');
plt.imshow(Z, extent=[0, 5, 0, 5], origin='lower',
cmap='RdGy')
plt.colorbar()
plt.axis('image');
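The contour and image plots above rely on np.meshgrid, which turns two 1-D coordinate vectors into 2-D grids so that a function of (x, y) can be evaluated at every grid point at once. A sketch of the shapes involved (the surface Z here is a hypothetical example):

```python
import numpy as np

x = np.linspace(0, 5, 50)     # 50 points along x
y = np.linspace(0, 5, 40)     # 40 points along y
X, Y = np.meshgrid(x, y)      # default indexing='xy'

# Each row of X repeats x, and each column of Y repeats y
print(X.shape, Y.shape)       # (40, 50) (40, 50)

# Any function of (x, y) can now be evaluated on the whole grid at once
Z = np.sin(X) + np.cos(Y)     # hypothetical surface, for illustration only
print(Z.shape)
```

Because Z has one row per y value and one column per x value, plt.contour(X, Y, Z) and plt.imshow(Z, origin='lower') line up with the axes as expected.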
C.Correlation and Scatterplots
1. Preliminaries
import pandas as pd
con = pd.read_csv('Data/ConcreteStrength.csv')
con
2. Renaming columns
list(con.columns)
['No',
'Cement',
'Slag',
'Fly ash',
'Water',
'SP',
'Coarse Aggr.',
'Fine Aggr.',
'Air Entrainment',
'Compressive Strength (28-day)(Mpa)']
con.rename(columns={'Fly ash': 'FlyAsh', 'Coarse Aggr.': "CoarseAgg",
'Fine Aggr.': 'FineAgg', 'Air Entrainment': 'AirEntrain',
'Compressive Strength (28-day)(Mpa)': 'Strength'}, inplace=True)
con.head()
con['AirEntrain'] = con['AirEntrain'].astype('category')
con.describe(include='category')
3. Scatterplots
Scatterplots are a fundamental graph type, much less complicated than histograms and boxplots. For
this reason we could use the Matplotlib library instead of the Seaborn library, but Seaborn is
used here. There are many ways to create scatterplots and other basic graphs in Python.
To create a bare-bones scatterplot, we must do four things:
4. Adding labels
import seaborn as sns
ax = sns.scatterplot(x="FlyAsh", y="Strength", data=con)
ax.set_title("Concrete Strength vs. Fly ash")
ax.set_xlabel("Fly ash");
A graphics “party trick” made fashionable by tools like Tableau is to use color, size or some other
visual cue to add a third dimension to a two-dimensional scatterplot. In the case of color (called
“hue” in Seaborn terminology), this third dimension works best as a categorical (non-continuous)
variable such as AirEntrain, because a palette of clearly distinguishable colors has only a
limited number of options.
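The correlation half of this section can be computed directly with DataFrame.corr(), which returns the Pearson correlation matrix of the selected columns. The five mix records below are hypothetical, laid out like the renamed concrete columns; the groupby line gives, as numbers, the per-category comparison that a hue-coloured scatterplot shows visually.

```python
import pandas as pd

# Hypothetical mix records using the renamed column names
con = pd.DataFrame({
    "FlyAsh":     [0.0, 50.0, 100.0, 150.0, 200.0],
    "Water":      [180.0, 175.0, 190.0, 185.0, 200.0],
    "Strength":   [30.0, 34.0, 37.0, 41.0, 44.0],
    "AirEntrain": pd.Categorical(["No", "Yes", "No", "Yes", "No"]),
})

# Pearson correlation matrix of the numeric columns
corr = con[["FlyAsh", "Water", "Strength"]].corr()
print(corr.round(3))

# Mean strength per air-entrainment category
means = con.groupby("AirEntrain", observed=True)["Strength"].mean()
print(means)
```

With the real data, sns.scatterplot(x="FlyAsh", y="Strength", hue="AirEntrain", data=con) colours each point by its category.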
D) Histogram
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import colors
from matplotlib.ticker import PercentFormatter
# Creating dataset
np.random.seed(23685752)
N_points = 10000
n_bins = 20
# Creating distribution
x = np.random.randn(N_points)
y = .8 ** x + np.random.randn(10000) + 25
legend = ['distribution']
# Creating figure and axes
fig, axs = plt.subplots(1, 1, figsize=(10, 7), tight_layout=True)
# Remove axes splines
for s in ['top', 'bottom', 'left', 'right']:
    axs.spines[s].set_visible(False)
# Remove x, y ticks
axs.xaxis.set_ticks_position('none')
axs.yaxis.set_ticks_position('none')
# Add x, y gridlines (the keyword 'b' was renamed 'visible' in Matplotlib 3.5)
axs.grid(visible=True, color='grey', linestyle='-.', linewidth=0.5, alpha=0.6)
# Creating histogram
N, bins, patches = axs.hist(x, bins=n_bins)
plt.show()
Output:
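Before drawing, axs.hist splits the data range into n_bins equal-width bins and counts the values in each; the returned N and bins are these counts and bin edges. np.histogram performs the same counting without plotting, as in this sketch with freshly generated, seeded normal data (the generator and seed are illustrative choices, not the program's own values).

```python
import numpy as np

rng = np.random.default_rng(0)          # seeded generator for repeatable numbers
N_points, n_bins = 10000, 20
x = rng.standard_normal(N_points)

# The same counting that axs.hist(x, bins=n_bins) does before drawing the bars
counts, edges = np.histogram(x, bins=n_bins)
widths = np.diff(edges)

print(len(counts), len(edges))          # 20 bins are bounded by 21 edges
tallest = counts.argmax()
print("tallest bin:", edges[tallest], "to", edges[tallest + 1])

# Fraction of the data in each bin
fractions = counts / N_points
```

matplotlib.ticker.PercentFormatter, imported in the program above, can label such fractions as percentages on the y-axis.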
E)Three dimensional plotting
Three-dimensional plots are enabled by importing the mplot3d toolkit, included with the main
Matplotlib installation:
from mpl_toolkits import mplot3d
Once this submodule is imported, a three-dimensional axes can be created by passing the
keyword projection='3d' to any of the normal axes creation routines:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
fig = plt.figure()
ax = plt.axes(projection='3d')
The most basic three-dimensional plot is a line or a collection of scatter points created from
sets of (x, y, z) triples. In analogy with the more common two-dimensional plots, these can be
created using the ax.plot3D and ax.scatter3D functions. Their call signatures are nearly identical
to those of their two-dimensional counterparts (plt.plot and plt.scatter), so the same options
control the output.
Here we'll plot a trigonometric spiral, along with some points drawn randomly near the line:
ax = plt.axes(projection='3d')
# Data for a three-dimensional line
zline = np.linspace(0, 15, 1000)
xline = np.sin(zline)
yline = np.cos(zline)
ax.plot3D(xline, yline, zline, 'gray')
# function of (x, y) to plot in three dimensions
def f(x, y):
    return np.sin(np.sqrt(x ** 2 + y ** 2))
x = np.linspace(-6, 6, 30)
y = np.linspace(-6, 6, 30)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)
fig = plt.figure()
ax = plt.axes(projection='3d')
ax.contour3D(X, Y, Z, 50, cmap='binary')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z');
Output:
ax.view_init(60, 35)
fig
OUTPUT
fig = plt.figure()
ax = plt.axes(projection='3d')
ax.plot_wireframe(X, Y, Z, color='black')
ax.set_title('wireframe');
OUTPUT
A surface plot is like a wireframe plot, but each face of the wireframe is a filled polygon. Adding
a colormap to the filled polygons can aid perception of the topology of the surface being
visualized:
ax = plt.axes(projection='3d')
ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap='viridis', edgecolor='none')
ax.set_title('surface');
OUTPUT
Surface Triangulations
For some applications, the evenly sampled grids required by the above routines are overly
restrictive and inconvenient. In these situations, triangulation-based plots can be very useful.
What if rather than an even draw from a Cartesian or a polar grid, we instead have a set of
random draws?
theta = 2 * np.pi * np.random.random(1000)
r = 6 * np.random.random(1000)
x = np.ravel(r * np.sin(theta))
y = np.ravel(r * np.cos(theta))
z = f(x, y)
We could create a scatter plot of the points to get an idea of the surface we're sampling from:
ax = plt.axes(projection='3d')
ax.scatter(x, y, z, c=z, cmap='viridis', linewidth=0.5);
OUTPUT
This leaves a lot to be desired. The function that will help us in this case is ax.plot_trisurf,
which creates a surface by first finding a set of triangles formed between adjacent points
(remember that x, y, and z here are one-dimensional arrays):
ax = plt.axes(projection='3d')
ax.plot_trisurf(x, y, z, cmap='viridis', edgecolor='none');
OUTPUT
Result:
Thus the plotting functions, including three-dimensional plotting, were applied and explored on
UCI data sets.
Aim:
To visualize geographic data using Basemap.
o drawmapscale(): Draw a linear scale on the map
o etopo(): Draw an etopo relief image onto the map
Installation
Step 1: Install Basemap from Anaconda. Go to Start and open the Anaconda command prompt.
Step 2: Before installing Basemap, install the pillow package using the command
pip install pillow
Step 3: Next, install Basemap using the following command:
pip install basemap
The Anaconda command prompt will look like this:
Step 4: After the basemap package is installed successfully, open Jupyter Notebook using the
following command:
jupyter notebook
Step 5: The above command opens a new web page with the address http://localhost:8888/tree.
Step 6: Start the program for visualizing geographical data using Basemap. Click Run to run the
program.
Some of these map-specific methods are:
PROGRAM
1. Simple Maps and color it
To start, import Basemap as well as matplotlib and numpy:
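import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
import warnings
warnings.filterwarnings("ignore", category=matplotlib.cbook.mplDeprecation)
Basemap?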
OUTPUT:
1a. Coding for Coloring
fig = plt.figure(num=None, figsize=(12, 8))
m = Basemap(projection='merc', llcrnrlat=-80, urcrnrlat=80,
            llcrnrlon=-180, urcrnrlon=180, resolution='c')
m.drawcoastlines()
m.fillcontinents(color='tan',lake_color='lightblue')
# draw parallels and meridians.
m.drawparallels(np.arange(-90.,91.,30.),labels=[True,True,False,False],dashes=[2,2])
m.drawmeridians(np.arange(-180.,181.,60.),labels=[False,False,False,True],dashes=[2,2])
m.drawmapboundary(fill_color='lightblue')
plt.title("Mercator Projection")
Output
1b. Same sequence of commands but with a different projection:
fig = plt.figure(num=None, figsize=(12, 8))
m = Basemap(projection='moll', lon_0=0, resolution='c')
m.drawcoastlines()
m.fillcontinents(color='purple',lake_color='lightblue')
# draw parallels and meridians.
m.drawparallels(np.arange(-90.,91.,30.),labels=[True,True,False,False],dashes=[2,2])
m.drawmeridians(np.arange(-180.,181.,60.),labels=[False,False,False,False],dashes=[2,2])
m.drawmapboundary(fill_color='lightblue')
plt.title("Mollweide Projection");
Output
2 a. Create a map centered on North America with lines showing the country and state
boundaries as well as rivers:
m = Basemap(width=6000000, height=4500000, resolution='c', projection='aea',
            lat_1=35., lat_2=45, lon_0=-100, lat_0=40)
m.drawcoastlines(linewidth=0.5)
m.fillcontinents(color='tan', lake_color='lightblue')
# country and state boundaries, and rivers
m.drawcountries(linewidth=0.5)
m.drawstates(linewidth=0.5)
m.drawrivers(color='blue')
# draw parallels and meridians.
m.drawparallels(np.arange(-90.,91.,15.),labels=[True,True,False,False],dashes=[2,2])
m.drawmeridians(np.arange(-180.,181.,15.),labels=[False,False,False,True],dashes=[2,2])
m.drawmapboundary(fill_color='lightblue')
2b. Use a different map projection, zoom in to North America and plot the location of Seattle:
fig = plt.figure(figsize=(8, 8))
m = Basemap(projection='lcc', resolution=None,
            width=8E6, height=8E6,
            lat_0=45, lon_0=-100,)
m.etopo(scale=0.5, alpha=0.5)
# Map (long, lat) to (x, y) for plotting
x, y = m(-122.3, 47.6)
plt.plot(x, y, 'ok', markersize=5)
plt.text(x, y, ' Seattle', fontsize=12);
Output
2. Map Projections
The Basemap package implements several dozen such projections, all referenced by a
short format code. Here we'll briefly demonstrate some of the more common ones.
We'll start by defining a convenience routine to draw our world map along with the
longitude and latitude lines:
from itertools import chain

def draw_map(m, scale=0.2):
    # draw a shaded-relief image
    m.shadedrelief(scale=scale)
    # lats and longs are returned as a dictionary
    lats = m.drawparallels(np.linspace(-90, 90, 13))
    lons = m.drawmeridians(np.linspace(-180, 180, 13))
    # each value is a (lines, labels) tuple; tup[1][0] is the list of Line2D instances
    lat_lines = chain(*(tup[1][0] for tup in lats.items()))
    lon_lines = chain(*(tup[1][0] for tup in lons.items()))
    all_lines = chain(lat_lines, lon_lines)
    # cycle through these lines and set the desired style
    for line in all_lines:
        line.set(linestyle='-', alpha=0.3, color='w')
Cylindrical projections
The simplest of map projections are cylindrical projections, in which lines of constant
latitude and longitude are mapped to horizontal and vertical lines, respectively. This type
of mapping represents equatorial regions quite well, but results in extreme distortions
near the poles. The spacing of latitude lines varies between different cylindrical
projections, leading to different conservation properties, and different distortion near the
poles. In the following figure we show an example of the equidistant cylindrical
projection, which chooses a latitude scaling that preserves distances along meridians.
Other cylindrical projections are the Mercator (projection='merc') and the cylindrical
equal area (projection='cea') projections.
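The difference between cylindrical projections lies in how latitude is spaced. The sketch below uses the textbook spherical-Earth formulas for two of the projections named above: equidistant cylindrical, where y grows linearly with latitude, and Mercator, where y = R ln(tan(pi/4 + phi/2)) stretches towards the poles. These formulas are illustrative and are not taken from Basemap's own code; the function names are hypothetical, and R is the mean Earth radius in kilometres.

```python
import numpy as np

R = 6371.0  # mean Earth radius in km (spherical-Earth approximation)


def equidistant_cylindrical(lon_deg, lat_deg):
    """x, y in km: both longitude and latitude scale linearly."""
    lon, lat = np.radians(lon_deg), np.radians(lat_deg)
    return R * lon, R * lat


def mercator(lon_deg, lat_deg):
    """x, y in km: latitude spacing grows towards the poles."""
    lon, lat = np.radians(lon_deg), np.radians(lat_deg)
    return R * lon, R * np.log(np.tan(np.pi / 4 + lat / 2))


lats = np.array([0.0, 30.0, 60.0, 80.0])
_, y_eq = equidistant_cylindrical(0.0, lats)
_, y_merc = mercator(0.0, lats)
for lat, a, b in zip(lats, y_eq, y_merc):
    print(f"lat {lat:4.0f}: equidistant y = {a:8.0f} km, Mercator y = {b:8.0f} km")
```

The gap between successive Mercator values widens rapidly towards 80 degrees, which is the polar distortion described above; the Mercator formula also equals R * arcsinh(tan(phi)).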
fig = plt.figure(figsize=(8, 6), edgecolor='w')
m = Basemap(projection='cyl', resolution=None,
            llcrnrlat=-90, urcrnrlat=90,
            llcrnrlon=-180, urcrnrlon=180, )
draw_map(m)
OUTPUT
Conic projections
A conic projection projects the map onto a single cone, which is then unrolled. This can lead to
very good local properties, but regions far from the focus point of the cone may become very
distorted. One example of this is the Lambert Conformal Conic projection (projection='lcc'),
which we saw earlier in the map of North America. It projects the map onto a cone arranged in
such a way that two standard parallels (specified in Basemap by lat_1 and lat_2) have well
represented distances, with scale decreasing between them and increasing outside of them. Other
useful conic projections are the equidistant conic projection (projection='eqdc') and the Albers
equal-area projection (projection='aea'). Conic projections, like perspective projections, tend to
be good choices for representing small to medium patches of the globe.
fig = plt.figure(figsize=(8, 8))
m = Basemap(projection='lcc', resolution=None,
            lon_0=0, lat_0=50, lat_1=45, lat_2=55,
            width=1.6E7, height=1.2E7)
draw_map(m)
OUTPUT
Result:
Thus geographic data was visualized using Basemap.