Fds Lab Manual PDF

The document provides a detailed guide on downloading, installing, and exploring various Python packages including NumPy, SciPy, Jupyter, Pandas, and Statsmodels. Each section outlines the purpose of the package, installation commands, and sample code demonstrating its features. The document emphasizes the importance of these libraries in scientific computing, data analysis, and machine learning.

Uploaded by

Archana
Ex.No.1 DOWNLOAD, INSTALL AND EXPLORE THE FEATURES OF NUMPY, SCIPY, JUPYTER, STATSMODELS AND PANDAS PACKAGES

1a. Aim:

To download, install and explore the features of NumPy package.

Problem Description
Python is an open-source, object-oriented language. One of its strengths is the wide range of
external packages that can be installed to extend its functionality. Each package is a collection
of modules and functions written for reuse, and NumPy is one such package that simplifies array
computations. Python packages are installed with pip, the package installer for Python, which is
included with recent Python installers. pip is run from the command line and downloads packages
from the Python Package Index (PyPI).
NumPy
NumPy (Numerical Python) is an open-source library for the Python programming
language. It is used for scientific computing and working with arrays.
Apart from its multidimensional array object, it also provides high-level functions for
working with arrays.
Prerequisites

● Access to a terminal window/command line


● A user account with sudo privileges
● Python installed on your system
Downloading and installing Numpy:
NumPy is a general-purpose array-processing package that provides tools for
handling n-dimensional arrays, including comprehensive mathematical functions and
linear algebra routines. Use the command below to install NumPy:

pip install numpy

output:
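One way to confirm that an installation succeeded is to ask Python itself which packages it can import. The following is a minimal sketch using only the standard library; the helper name report_packages and the list of package names are illustrative assumptions, not part of the manual.

```python
from importlib import metadata, util


def report_packages(names):
    """Return {name: version or None} for each package name (hypothetical helper)."""
    versions = {}
    for name in names:
        if util.find_spec(name) is None:
            versions[name] = None  # not importable in this environment
        else:
            try:
                versions[name] = metadata.version(name)
            except metadata.PackageNotFoundError:
                versions[name] = "unknown"  # importable, but no distribution metadata
    return versions


packages = ["numpy", "scipy", "pandas", "statsmodels", "notebook"]
for name, version in report_packages(packages).items():
    print(f"{name:12s} {version or 'NOT INSTALLED'}")
```

Running this after each pip install shows at a glance which of the packages used in this exercise are still missing.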

● Sample python program using numpy:
import numpy as np

# Creating array object
arr = np.array([[1, 2, 3],
                [4, 2, 5]])
# Printing type of arr object
print("Array is of type: ", type(arr))
# Printing array dimensions (axes)
print("No. of dimensions: ", arr.ndim)
# Printing shape of array
print("Shape of array: ", arr.shape)
# Printing size (total number of elements) of array
print("Size of array: ", arr.size)
# Printing type of elements in array
print("Array stores elements of type: ", arr.dtype)
OUTPUT

Result:
Thus the NumPy package was downloaded and installed, and its features were explored.

1 b. Aim :
To download, install and explore the features of Jupyter packages.

Data Science:
Data science combines math and statistics, specialized programming, advanced analytics,
artificial intelligence (AI), and machine learning with specific subject matter expertise to uncover
actionable insights hidden in an organization’s data.

Jupyter:
Jupyter Notebook is an open-source web application that allows you to create and share
documents that contain live code, equations, visualizations, and narrative text. Uses include data
cleaning and transformation, numerical simulation, statistical modeling, data visualization,
machine learning, and much more.
Jupyter supports over 40 programming languages, and Python is one of them.
Python is required to install the Jupyter Notebook itself. Older releases ran on Python 3.3 or
greater, or Python 2.7; recent releases require Python 3.

Procedure:

pip is a package management system used to install and manage software packages and libraries
written in Python. These packages are stored in a large online repository called the
Python Package Index (PyPI). pip uses PyPI as the default source for packages and their
dependencies.

Installing Jupyter Notebook using pip:

To install Jupyter using pip, we need to first check if pip is updated in our system.
Use the following command to update pip:

python -m pip install --upgrade pip

After updating the pip version, follow the instructions provided below to install Jupyter:

Command to install Jupyter:

python -m pip install jupyter

∙ Finished Installation:

Use the following command to launch Jupyter using command-line:

jupyter notebook

Launching Jupyter Notebook

Click New, select Python 3 (ipykernel), and type the following
program. Click Run to execute the program.
Running the Python program:
Python code:
Program to find the area of a triangle
# Python program to find the area of a triangle
a = 5
b = 6
c = 7
# calculate the semi-perimeter
s = (a + b + c) / 2
# calculate the area
area = (s*(s-a)*(s-b)*(s-c)) ** 0.5
print('The area of the triangle is %0.2f' % area)

Output:

Result:
Thus the Jupyter package was downloaded and installed, and its features were explored.

1 c Aim:
To download, install and explore the features of Scipy package.

Problem Description

SciPy is a Python library for solving mathematical, scientific and engineering problems.
It is built on top of the NumPy library and extends it with routines for tasks such as
matrix rank, matrix inverse, polynomial equations and LU decomposition.
Using its high-level functions significantly reduces the complexity of the code and helps in
analyzing data.

Downloading and Installing Scipy:

Use the command below to install the SciPy package on Windows:

pip install scipy

output:

Sample python code using Scipy:
Type the program in Jupyter notebook

from scipy import special

a = special.exp10(3)
print(a)
b = special.exp2(3)
print(b)
c = special.sindg(90)
print(c)
d = special.cosdg(45)
print(d)

output:
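The four calls compute 10 raised to a power, 2 raised to a power, and the sine and cosine of an angle given in degrees. As a cross-check that does not need SciPy, the same quantities can be sketched with the standard math module; this is an illustrative equivalent, not the SciPy implementation.

```python
import math

# Plain-math equivalents of the four scipy.special calls in the sample program.
a = 10.0 ** 3                     # special.exp10(3): 10 raised to the power 3
b = 2.0 ** 3                      # special.exp2(3): 2 raised to the power 3
c = math.sin(math.radians(90))    # special.sindg(90): sine of 90 degrees
d = math.cos(math.radians(45))    # special.cosdg(45): cosine of 45 degrees

print(a, b, c, d)
```

The values are 1000, 8, 1 and roughly 0.7071, which is what the SciPy program is expected to print up to floating-point formatting.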

Result:

Thus the SciPy package was downloaded and installed, and its features were explored.

1 d. Aim:
To download, install and explore the features of the Pandas package.

Problem Description

Pandas is one of the most popular open-source libraries available for Python. It is fast and
easy to use for data analysis and manipulation, and its DataFrame is among the most useful
data structures available in any library. It is used in every data-intensive field, including
but not limited to scientific computing, data science and machine learning.

The library is not included with a regular install of Python. To use it, you must install
Pandas separately.

Installing Pandas on Windows

There are two ways of installing Pandas on Windows.

Method #1: Installing with pip


pip is a package installation manager that makes installing Python libraries and frameworks
straightforward.

As long as you have a recent version of Python installed (Python 3.4 or later), pip will be
installed on your computer along with Python by default.

However, if you’re using an older version of Python, you will need to install pip on your
computer before installing Pandas.

Step #1: Launch Command Prompt


Press the Windows key on your keyboard or click on the Start button to open the start menu.
Type cmd, and the Command Prompt app should appear as a listing in the start menu.

Step #2: Enter the Required Command


After you launch the command prompt, the next step is to type in the command that runs the
pip installation.

Enter the command

pip install pandas

in the terminal. This launches the pip installer. The required files will be downloaded, and
Pandas will be ready to run on your computer.

The Pandas package is now installed.

Sample program

# to be typed in Jupyter notebook

import pandas as pd

data = pd.DataFrame({"x1": ["y", "x", "y", "x", "x", "y"],
                     "x2": range(16, 22),
                     "x3": range(1, 7),
                     "x4": ["a", "b", "c", "d", "e", "f"],
                     "x5": range(30, 24, -1)})
print(data)

s1 = pd.Series([1, 3, 4, 5, 6, 2, 9])
s2 = pd.Series([1.1, 3.5, 4.7, 5.8, 2.9, 9.3])
s3 = pd.Series(['a', 'b', 'c', 'd', 'e'])
Data = {'first': s1, 'second': s2, 'third': s3}
dfseries = pd.DataFrame(Data)
print(dfseries)

Output:

Result:

Thus the Pandas package was downloaded and installed, and its features were explored.

1e. Aim:
To download, install and explore the features of the Statsmodels package.

Problem Description: Statsmodels is a popular Python library for estimating and analyzing
statistical models. It is built on numeric and scientific libraries such as NumPy
and SciPy.

Some of the essential features of this package are-


1. It includes various models of linear regression like ordinary least squares, generalized
least squares, weighted least squares, etc.
2. It provides some efficient functions for time series analysis.
3. It also has some datasets for examples and testing.
4. Models based on survival analysis are also available.
5. It offers a wide range of statistical tests that also scale to large datasets.

Installing Statsmodels

Check the version of python installed in the PC.

Using Command Prompt


Type 'Command Prompt' in the taskbar's search box and click its icon to open the command
prompt. You can also click its icon directly if it is pinned to the taskbar.
1. Wait until the 'Command Prompt' window is visible on your screen.
2. Type python --version and press 'Enter'.
3. The version installed on your system is displayed on the next line.

Installation of statsmodels

Now for installing statsmodels in our system, Open the Command Prompt, type the
following command and click on 'Enter'.
pip install statsmodels
Output

Next, let us look at a program that imports statsmodels.

The program in this exercise is the first step toward an OLS (Ordinary Least Squares)
regression: it loads a dataset and selects the columns of interest. OLS chooses the model
coefficients that minimize the sum of squared differences between the calculated and observed values.
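To make the least-squares idea concrete, here is a minimal sketch that fits a straight line by minimizing the residual sum of squares. It uses NumPy's lstsq rather than statsmodels, and the five data points are invented for illustration.

```python
import numpy as np

# Hypothetical data: y is roughly 2*x + 1 with small deviations.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Design matrix with a column of ones so the model has an intercept.
X = np.column_stack([np.ones_like(x), x])

# lstsq returns the coefficients that minimise ||y - X @ beta||^2.
beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = beta

# Residual sum of squares at the fitted coefficients.
residuals = y - X @ beta
rss = float(residuals @ residuals)
print("intercept:", intercept, "slope:", slope, "RSS:", rss)
```

Any other choice of intercept and slope gives a larger RSS for these points, which is the defining property of the OLS estimate.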

Program

import statsmodels.api as sm
import pandas
from patsy import dmatrices

df = sm.datasets.get_rdataset("Guerry", "HistData").data
vars = ['Department', 'Lottery', 'Literacy', 'Wealth', 'Region']
df = df[vars]
df[-5:]

OUTPUT

Result:

Thus the Statsmodels package was downloaded and installed, and its features were explored.

Ex.No.2 WORKING WITH NUMPY ARRAYS

Aim :
Write a Python program to show the working of NumPy arrays in Python.
2a) Use a NumPy array to demonstrate basic array characteristics
b) Create a NumPy array using a list and a tuple
c) Apply basic operations (+, -, *, /) and find the transpose of the matrix
d) Perform sorting operations on NumPy arrays
Problem Description

Arrays in NumPy: NumPy’s main object is the homogeneous multidimensional array.


● It is a table of elements (usually numbers), all of the same type, indexed by a tuple of
positive integers.
● In NumPy dimensions are called axes. The number of axes is rank.
● NumPy’s array class is called ndarray. It is also known by the alias array.

Example 1:
Write a python program to demonstrate the basic NumPy array
characteristics

import numpy as np

# Creating array object
arr = np.array([[1, 2, 3],
                [4, 2, 5]])

# Printing type of arr object
print("Array is of type: ", type(arr))

# Printing array dimensions (axes)
print("No. of dimensions: ", arr.ndim)

# Printing shape of array
print("Shape of array: ", arr.shape)

# Printing size (total number of elements) of array
print("Size of array: ", arr.size)

# Printing type of elements in array
print("Array stores elements of type: ", arr.dtype)

Output :
Array is of type: <class 'numpy.ndarray'>

No. of dimensions: 2
Shape of array: (2, 3)
Size of array: 6
Array stores elements of type: int64
2. Array creation: There are various ways to create arrays in NumPy.
● For example, you can create an array from a regular Python list or tuple using the array
function. The type of the resulting array is deduced from the type of the elements in the
sequences.
● Often, the elements of an array are originally unknown, but its size is known. Hence, NumPy
offers several functions to create arrays with initial placeholder content. These minimize
the necessity of growing arrays, an expensive operation. For example: np.zeros, np.ones,
np.full, np.empty, etc.
● To create sequences of numbers, NumPy provides a function analogous to range that returns
arrays instead of lists.
● arange: returns evenly spaced values within a given interval. step size is specified.
● linspace: returns evenly spaced values within a given interval. num no. of elements are
returned.
● Reshaping array: We can use reshape method to reshape an array. Consider an array with
shape (a1, a2, a3, …, aN). We can reshape and convert it into another array with shape (b1,
b2, b3, …, bM). The only required condition is: a1 x a2 x a3 … x aN = b1 x b2 x b3 … x bM
. (i.e original size of array remains unchanged.)
● Flatten array: We can use flatten method to get a copy of array collapsed into one
dimension. It accepts order argument. Default value is ‘C’ (for row-major order). Use ‘F’
for column major order.
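The size condition for reshape and the effect of the order argument of flatten can be sketched as follows. The arrays are small illustrative examples, and the use of -1 to let NumPy infer a dimension is an addition not covered in the list above.

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)      # 3 x 4 = 12 elements

# Valid: 2 x 2 x 3 is also 12 elements.
cube = arr.reshape(2, 2, 3)

# Invalid: 5 x 3 = 15 elements, so NumPy raises ValueError.
try:
    arr.reshape(5, 3)
except ValueError as err:
    print("reshape failed:", err)

# -1 asks NumPy to infer one dimension from the total size.
inferred = arr.reshape(-1, 6)          # shape becomes (2, 6)

small = np.array([[1, 2, 3], [4, 5, 6]])
print(small.flatten())                 # row-major ('C'): [1 2 3 4 5 6]
print(small.flatten(order="F"))        # column-major:    [1 4 2 5 3 6]
```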

Example 2:
import numpy as np

# Creating array from list with type float
a = np.array([[1, 2, 4], [5, 8, 7]], dtype = 'float')
print("Array created using passed list:\n", a)

# Creating array from tuple
b = np.array((1, 3, 2))
print("\nArray created using passed tuple:\n", b)

# Creating a 3X4 array with all zeros
c = np.zeros((3, 4))
print("\nAn array initialized with all zeros:\n", c)

# Create a constant value array of complex type
d = np.full((3, 3), 6, dtype = 'complex')
print("\nAn array initialized with all 6s.")
print("Array type is complex:\n", d)

# Create an array with random values
e = np.random.random((2, 2))
print("\nA random array:\n", e)

# Create a sequence of integers
# from 0 to 30 with steps of 5
f = np.arange(0, 30, 5)
print("\nA sequential array with steps of 5:\n", f)

# Create a sequence of 10 values in range 0 to 5
g = np.linspace(0, 5, 10)
print("\nA sequential array with 10 values between 0 and 5:\n", g)

# Reshaping 3X4 array to 2X2X3 array
arr = np.array([[1, 2, 3, 4],
                [5, 2, 4, 2],
                [1, 2, 0, 1]])

newarr = arr.reshape(2, 2, 3)

print("\nOriginal array:\n", arr)
print("Reshaped array:\n", newarr)

# Flatten array
arr = np.array([[1, 2, 3], [4, 5, 6]])
flarr = arr.flatten()

print("\nOriginal array:\n", arr)
print("Flattened array:\n", flarr)

OUTPUT
Array created using passed list:
[[ 1. 2. 4.]
[ 5. 8. 7.]]

Array created using passed tuple:
[1 3 2]

An array initialized with all zeros:
[[ 0. 0. 0. 0.]
[ 0. 0. 0. 0.]
[ 0. 0. 0. 0.]]

An array initialized with all 6s.
Array type is complex:
[[ 6.+0.j 6.+0.j 6.+0.j]
[ 6.+0.j 6.+0.j 6.+0.j]
[ 6.+0.j 6.+0.j 6.+0.j]]

A random array:
[[ 0.46829566 0.67079389]
[ 0.09079849 0.95410464]]

A sequential array with steps of 5:
[ 0 5 10 15 20 25]

A sequential array with 10 values between 0 and 5:
[ 0. 0.55555556 1.11111111 1.66666667 2.22222222 2.77777778
3.33333333 3.88888889 4.44444444 5. ]

Original array:
[[1 2 3 4]
[5 2 4 2]
[1 2 0 1]]
Reshaped array:
[[[1 2 3]
[4 5 2]]
[[4 2 1]
[2 0 1]]]

Original array:
[[1 2 3]
[4 5 6]]
Flattened array:
[1 2 3 4 5 6]

3. Basic operations:

Operations on single array: We can use overloaded arithmetic operators to do element-wise
operations on an array, which creates a new array. With the +=, -= and *= operators, the
existing array is modified instead.
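The difference between creating a new array and modifying the existing one is easiest to see through a second name bound to the same array. A minimal sketch, with illustrative variable names:

```python
import numpy as np

a = np.array([1, 2, 5, 3])
alias = a                # second name for the same array object

b = a * 10               # new array; a is unchanged
a += 1                   # modifies the existing array in place

print(b)                 # [10 20 50 30]
print(alias)             # [2 3 6 4], the change is visible through the alias
print(alias is a)        # True
```

Because a += 1 changes the array in place, every name that refers to it sees the update, whereas b is an independent result.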

Program 3:
import numpy as np

a = np.array([1, 2, 5, 3])

# add 1 to every element
print("Adding 1 to every element:", a+1)

# subtract 3 from each element
print("Subtracting 3 from each element:", a-3)

# multiply each element by 10
print("Multiplying each element by 10:", a*10)

# square each element
print("Squaring each element:", a**2)

# modify existing array
a *= 2
print("Doubled each element of original array:", a)

# transpose of array
a = np.array([[1, 2, 3], [3, 4, 5], [9, 6, 0]])

print("\nOriginal array:\n", a)
print("Transpose of array:\n", a.T)

Output
Adding 1 to every element: [2 3 6 4]
Subtracting 3 from each element: [-2 -1 2 0]
Multiplying each element by 10: [10 20 50 30]
Squaring each element: [ 1 4 25 9]
Doubled each element of original array: [ 2 4 10 6]
Original array:
[[1 2 3]
[3 4 5]
[9 6 0]]
Transpose of array:
[[1 3 9]
[2 4 6]
[3 5 0]]

4. Sorting array: NumPy provides the np.sort function for sorting arrays. Let's explore it a
bit.
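Before the full program, a short sketch of three related calls may help: np.sort returns a sorted copy, np.argsort returns the indices that would sort the array, and the ndarray.sort method sorts in place. The three-element array is illustrative.

```python
import numpy as np

a = np.array([3, 1, 2])

sorted_copy = np.sort(a)     # returns a sorted copy; a is untouched
order = np.argsort(a)        # indices that would sort a
print(sorted_copy, order, a) # [1 2 3] [1 2 0] [3 1 2]

result = a.sort()            # ndarray.sort sorts in place and returns None
print(result, a)             # None [1 2 3]
```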
Program 4:
import numpy as np

a = np.array([[1, 4, 2],
              [3, 4, 6],
              [0, -1, 5]])

# sorted array
print("Array elements in sorted order:\n", np.sort(a, axis = None))

# sort array row-wise
print("Row-wise sorted array:\n", np.sort(a, axis = 1))

# specify sort algorithm
print("Column wise sort by applying merge-sort:\n",
      np.sort(a, axis = 0, kind = 'mergesort'))

# Example to show sorting of structured array
# set alias names for dtypes
dtypes = [('name', 'S10'), ('grad_year', int), ('cgpa', float)]

# Values to be put in array
values = [('Hrithik', 2009, 8.5), ('Ajay', 2008, 8.7),
          ('Pankaj', 2008, 7.9), ('Aakash', 2009, 9.0)]

# Creating array
arr = np.array(values, dtype = dtypes)
print("\nArray sorted by names:\n", np.sort(arr, order = 'name'))

print("Array sorted by graduation year and then cgpa:\n",
      np.sort(arr, order = ['grad_year', 'cgpa']))

OUTPUT
Array elements in sorted order:
[-1 0 1 2 3 4 4 5 6]
Row-wise sorted array:
[[ 1 2 4]
[ 3 4 6]
[-1 0 5]]
Column wise sort by applying merge-sort:
[[ 0 -1 2]
[ 1 4 5]
[ 3 4 6]]

Array sorted by names:


[('Aakash', 2009, 9.0) ('Ajay', 2008, 8.7) ('Hrithik', 2009, 8.5)
('Pankaj', 2008, 7.9)]
Array sorted by graduation year and then cgpa:
[('Pankaj', 2008, 7.9) ('Ajay', 2008, 8.7) ('Hrithik', 2009, 8.5)
('Aakash', 2009, 9.0)]

Result:

Thus the Python program showing the working of NumPy arrays was executed successfully.
Ex.No.3 WORKING WITH PANDAS DATA FRAMES

Aim:
Write a Python program to work with Pandas data frames.
Pandas
Pandas is an open-source library built on top of the NumPy library. It is a Python
package that offers data structures and operations for manipulating numerical data and
time series. It is popular mainly because it makes importing and analyzing data much easier.
Pandas is fast and offers high performance and productivity for users.
Pandas DataFrame
In practice, a Pandas DataFrame is usually created by loading a dataset from existing
storage, such as an SQL database, a CSV file or an Excel file. A DataFrame can also
be created from lists, a dictionary, a list of dictionaries, and so on.
A Pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data
structure with labeled axes (rows and columns); that is, data is aligned in a tabular fashion
in rows and columns. A DataFrame consists of three principal components: the data, the rows
and the columns.

Creating a Pandas DataFrame

A pandas DataFrame can be created using the following constructor:
pandas.DataFrame(data, index, columns, dtype, copy)
Creating an empty dataframe :
A basic DataFrame, which can be created is an Empty Dataframe. An Empty Dataframe is
created just by calling a dataframe constructor.

Creating a dataframe using List:

DataFrame can be created using a single list or a list of lists.


Creating dataframe from dict of ndarray/lists:
To create a dataframe from a dict of ndarrays or lists, all the arrays must be of the same length.
If an index is passed, its length must equal the length of the arrays. If no index is passed,
the index defaults to range(n), where n is the array length.
Iterating over rows :

To iterate over rows, we can use iterrows() or itertuples(). The related items() method
(called iteritems() before it was removed in pandas 2.0) iterates over columns rather than rows.
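A minimal sketch of these iterators on a small hand-made frame follows; the two-row data set is illustrative. Note that items() walks the columns, while iterrows() and itertuples() walk the rows.

```python
import pandas as pd

df = pd.DataFrame({"name": ["aparna", "pankaj"], "Score": [90, 40]})

# iterrows(): (index, Series) pairs, one per row.
rows = []
for idx, row in df.iterrows():
    rows.append((idx, row["name"], row["Score"]))

# itertuples(): one namedtuple per row, usually faster than iterrows().
records = []
for rec in df.itertuples(index=True):
    records.append((rec.Index, rec.name, rec.Score))

# items(): (column label, Series) pairs, i.e. iteration over columns.
column_labels = [label for label, _ in df.items()]

print(rows)
print(records)
print(column_labels)
```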

Program

import pandas as pd

# Calling DataFrame constructor
print("Empty dataframe")
df = pd.DataFrame()
print(df)

print("Dataframe creation using List")
# list of strings
lst = ['Geeks', 'For', 'Geeks', 'is', 'portal', 'for', 'Geeks']

# Calling DataFrame constructor on list
df = pd.DataFrame(lst)
print(df)

# initialise data of lists.
Data = {'Name':['Tom', 'nick', 'krish', 'jack'], 'Age':[20, 21, 19, 18]}

# Create dataframe
df = pd.DataFrame(Data)

# Print the output.
print(df)

print("Create dataframe from dictionoary of lists")
# dictionary of lists (named data_dict to avoid shadowing the built-in dict)
data_dict = {'name':["aparna", "pankaj", "sudhir", "Geeku"],
             'Degree': ["MBA", "BCA", "M.Tech", "MBA"],
             'Score':[90, 40, 80, 98]}

# creating a dataframe from a dictionary
df = pd.DataFrame(data_dict)
print(df)

# iterating over rows using iterrows() function
for i, j in df.iterrows():
    print(i, j)
    print()

OUTPUT
Empty dataframe
Empty DataFrame
Columns: []
Index: []

Dataframe creation using List


0
0 Geeks
1 For
2 Geeks
3 is
4 portal
5 for
6 Geeks
Name Age
0 Tom 20
1 nick 21
2 krish 19
3 jack 18

Create dataframe from dictionoary of lists


name Degree Score
0 aparna MBA 90
1 pankaj BCA 40
2 sudhir M.Tech 80
3 Geeku MBA 98

0 name aparna
Degree MBA
Score 90
Name: 0, dtype: object

1 name pankaj
Degree BCA
Score 40
Name: 1, dtype: object

2 name sudhir
Degree M.Tech
Score 80
Name: 2, dtype: object

3 name Geeku
Degree MBA
Score 98
Name: 3, dtype: object

Result: Thus the Python program to work with Pandas data frames was executed.
Ex.No.4 READING DATA FROM TEXT FILES, EXCEL AND THE WEB AND EXPLORING VARIOUS COMMANDS FOR DOING DESCRIPTIVE ANALYTICS ON THE IRIS DATA SET

1a.Aim:
Reading data from text files and exploring various commands for doing descriptive
analytics on the Iris data set.

What is Exploratory Data Analysis?


Exploratory Data Analysis (EDA) is a technique for analyzing data, largely through visual methods.
With it we can obtain a statistical summary of the data, deal with duplicate values and
outliers, and spot trends or patterns present in the dataset.
Now let's briefly look at the Iris dataset.
Iris Dataset
The Iris dataset is familiar to most people with a data science background and is often
described as the Hello World of data science. It contains five columns: Petal Length,
Petal Width, Sepal Length, Sepal Width and Species Type. Iris is a flowering plant;
researchers measured these features for different iris flowers and recorded them digitally.
Source: https://www.geeksforgeeks.org/exploratory-data-analysis-on-iris-dataset/

Program 1
To read a CSV file
import pandas as pd

# Reading the CSV file
df = pd.read_csv("Iris.csv")

# Printing top 5 rows
df.head()

OUTPUT

Checking Duplicates

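Let's see whether our dataset contains any duplicates. The pandas drop_duplicates() method
removes duplicate rows from the data frame. With subset="Species", rows count as duplicates
when they share a Species value, so only the first row of each species is kept.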
Example:
data = df.drop_duplicates(subset ="Species",)
data
Output
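To make the effect of the subset argument concrete, here is a minimal sketch on a small hand-made frame. The rows and the column name SepalLengthCm are invented for illustration and are not read from the Iris file.

```python
import pandas as pd

# Hypothetical four-row frame; row 2 repeats row 0 exactly.
df = pd.DataFrame({
    "SepalLengthCm": [5.1, 4.9, 5.1, 7.0],
    "Species": ["Iris-setosa", "Iris-setosa", "Iris-setosa", "Iris-versicolor"],
})

# Rows that are identical to an earlier row in every column.
full_duplicates = int(df.duplicated().sum())

# Drop only fully identical rows.
deduped = df.drop_duplicates()

# Keep the first row for each distinct Species value.
per_species = df.drop_duplicates(subset="Species")

print(full_duplicates)          # 1
print(len(deduped))             # 3
print(per_species.index.tolist())  # [0, 3]
```

Calling drop_duplicates() without subset is the usual way to remove repeated records, while the subset form is better read as "one representative row per species".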

Result:
Thus data was read from text files and various commands for descriptive analytics on the
Iris data set were explored.
Ex.No.5 USE THE DIABETES DATA SET FROM UCI AND PIMA INDIANS DIABETES

5 a) Aim:

To perform Univariate analysis like Frequency, Mean, Median, Mode, Variance,
Standard Deviation, Skewness and Kurtosis on the Pima Indians dataset.

Program:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from scipy.stats import skew
from scipy.stats import kurtosis
import statistics

df = pd.read_csv('C:/Users/HP/OneDrive/Desktop/RESEARCH/Data/Pima2.csv')
print('THE SHAPE OF THE DATASET IS \n',df.shape)
print('THE DATA TYPES ARE :\n',df.dtypes)
print('THE DESCRIPTION OF THE DATASET IS:\n',df.describe().T)
df.plot(kind='density', subplots=True, layout=(3,3), sharex=False)

# Univariate statistics for the number of pregnancies
print('THE FREQUENCY OF PREGNANCIES IS:\n',df['npreg'].value_counts())
preg = np.array(df['npreg'])
print('MEAN :',statistics.mean(preg))
print('MEDIAN :',statistics.median(preg))
print('MODE :',statistics.multimode(preg))
print('VARIANCE :',statistics.variance(preg))
print('sTANDARS DEVIATION :',statistics.stdev(preg))
if (skew(preg)>0) :
    print("POSITIVE SKEWNESS \n")
elif (skew(preg)<0) :
    print("NEGATIVE SKEWNESS \n")
else :
    print("NO SKEWNESS \n")
print('KURTOSIS',kurtosis(preg))

# Univariate statistics for the glucose level
print('THE FREQUENCY OF GLOCOSE IS:\n',df['glu'].value_counts())
G = np.array(df['glu'])
print('MEAN :',statistics.mean(G))
print('MEDIAN :',statistics.median(G))
print('MODE :',statistics.multimode(G))
print('VARIANCE :',statistics.variance(G))
print('sTANDARS DEVIATION :',statistics.stdev(G))
if (skew(G)>0) :
    print("POSITIVE SKEWNESS \n")
elif (skew(G)<0) :
    print("NEGATIVE SKEWNESS \n")
else :
    print("NO SKEWNESS \n")
print('KURTOSIS',kurtosis(G))

Output of above coding:

THE SHAPE OF THE DATASET IS
(300, 9)
THE DATA TYPES ARE :
Unnamed: 0      int64
npreg           int64
glu             int64
bp            float64
skin          float64
bmi           float64
ped           float64
age             int64
type           object
dtype: object

THE DESCRIPTION OF THE DATASET IS:
            count        mean        std  ...      50%        75%      max
Unnamed: 0  300.0  150.500000  86.746758  ...  150.500  225.25000  300.000
npreg       300.0    3.786667   3.306195  ...    3.000    6.00000   14.000
glu         300.0  123.743333  30.011549  ...  121.000  142.00000  199.000
bp          287.0   72.320557  11.704477  ...   72.000   80.00000  114.000
skin        202.0   29.153465  11.682415  ...   29.000   36.00000   99.000
bmi         297.0   32.052862   6.492933  ...   32.000   36.50000   52.900
ped         300.0    0.435657   0.294600  ...    0.336    0.58675    2.288
age         300.0   33.096667  11.578176  ...   29.000   40.00000   72.000

[8 rows x 8 columns]

THE FREQUENCY OF PREGNANCIES IS:
1 51
0 44
2 41
4 35
3 26
5 22
6 19
7 17
8 15
9 9
10 8
12 6
13 3
14 2
11 2
Name: npreg, dtype: int64
MEAN :3
MEDIAN : 3.0

MODE : [1]
VARIANCE : 10
sTANDARS DEVIATION : 3.1622776601683795
POSITIVE SKEWNESS

KURTOSIS 0.20897562301271444
THE FREQUENCY OF GLOCOSE IS:
100 8
125 8
111 6
95 6
139 6
..
152 1
149 1
135 1
198 1
56 1
Name: glu, Length: 108, dtype: int64
MEAN : 123
MEDIAN : 121.0
MODE : [100, 125]
VARIANCE : 900
sTANDARS DEVIATION : 30.0
POSITIVE SKEWNESS

KURTOSIS -0.19604758469298522
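The whole-number MEAN and VARIANCE values printed above (3 and 10 for npreg, 123 and 900 for glu) do not agree with the means and standard deviations in the describe() table (3.786667 and 3.306195 for npreg). A likely explanation, offered here as an assumption rather than something the manual states, is that the statistics module converts its result back to the NumPy integer type of the input array. The sketch below shows two ways to get untruncated values, using an invented integer column.

```python
import statistics
import numpy as np

counts = np.array([1, 0, 2, 4, 3, 5, 1, 9])   # hypothetical integer column

# Option 1: convert to plain Python ints before using the statistics module.
values = counts.tolist()
mean_py = statistics.mean(values)
var_py = statistics.variance(values)          # sample variance (divides by n - 1)

# Option 2: use NumPy's own methods; ddof=1 matches statistics.variance.
mean_np = counts.mean()
var_np = counts.var(ddof=1)

print(mean_py, mean_np)
print(var_py, var_np)
```

Both routes give a mean of 3.125 for this column, and the same sample variance.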

Result:

Thus Univariate analysis (Frequency, Mean, Median, Mode, Variance, Standard Deviation,
Skewness and Kurtosis) on the Pima Indians dataset was performed.
5 b) Aim:

To perform Bivariate analysis with Linear regression modeling using Pima Indians
dataset .
Program:
# Linear Regression
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# load the dataset
diabetes_dataset = pd.read_csv('C:/Users/HP/OneDrive/Desktop/RESEARCH/Data/diabetes.csv')
print('THE SHAPE OF THE DATASET IS \n',diabetes_dataset.shape)
print('THE DATA TYPES ARE :\n',diabetes_dataset.dtypes)
print('THE DESCRIPTION OF THE DATASET IS:\n',diabetes_dataset.describe().T)

# find the pair of feature columns with the highest correlation
# (max_corr is used instead of max to avoid shadowing the built-in)
max_corr=0
ind=0
col=0
c=[0 for x in range(diabetes_dataset.columns.size)]
for i in range (0,8):
    for j in range (0,8):
        if (i==j):
            continue
        c[i]=diabetes_dataset.iloc[:,i].corr(diabetes_dataset.iloc[:,j])
        if (max_corr<c[i]):
            max_corr=c[i]
            ind=i
            col=j
print('maxindex=,col ',ind,col)

X = diabetes_dataset.iloc[:,ind].values.reshape(-1, 1) #independent variable array
Y = diabetes_dataset.iloc[:,col].values.reshape(-1, 1) #dependent variable vector
#splitting
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2,random_state=0)
#fitting the model
regressor = LinearRegression()
regressor.fit(X_train, y_train)
print('Regression intercept',regressor.intercept_)
print('Regression coefficient',regressor.coef_)

# prediction from the fitted line, computed by hand
def calc(slope, intercept, preg):
    return slope*preg+intercept

age_Pred_regplot = calc(regressor.coef_, regressor.intercept_, 10)
print('predicted age plot', age_Pred_regplot )
age_Pred = regressor.predict([[10]])
print('predicted age model pred', age_Pred)
y_pred = regressor.predict(X_test)
df_preds = pd.DataFrame({'Actual': y_test.squeeze(), 'Predicted': y_pred.squeeze()})
print(df_preds)

from sklearn.metrics import mean_absolute_error, mean_squared_error

mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
print(f'Mean absolute error: {mae:.2f}')
print(f'Mean squared error: {mse:.2f}')
print(f'Root mean squared error: {rmse:.2f}')

plt.scatter(X_test, y_test, color = "red")
plt.plot(X_train, regressor.predict(X_train), color = "green")
plt.title("Preg vs Age (Testing set)")
plt.xlabel("Preg")
plt.ylabel("Age")
plt.show()

# logistic regression
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split

# load the dataset
diabetes_dataset = pd.read_csv('C:/Users/HP/OneDrive/Desktop/RESEARCH/Data/diabetes.csv')
print('THE SHAPE OF THE DATASET IS \n',diabetes_dataset.shape)
print('THE DATA TYPES ARE :\n',diabetes_dataset.dtypes)
print('THE DESCRIPTION OF THE DATASET IS:\n',diabetes_dataset.describe().T)

# pick the feature most correlated with the outcome (column 8)
max_corr=0
ind=0
c=[0 for x in range(diabetes_dataset.columns.size)]
for i in range (0,8):
    c[i]=diabetes_dataset.iloc[:,i].corr(diabetes_dataset.iloc[:,8])
    if (max_corr<c[i]):
        max_corr=c[i]
        ind=i
X = diabetes_dataset.iloc[:,ind].values.reshape(-1, 1) #independent variable array
Y = diabetes_dataset.iloc[:,8].values #dependent variable vector
#splitting
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2,random_state=0)
#fitting the model
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
model.fit(X_train,y_train)
train_acc = model.score(X_train,y_train)
print("The Accuracy for Training Set is {}".format(train_acc*100))
y_pred = model.predict(X_test)

print('Accuracy of logistic regression classifier on test set: {:.2f}'.format(model.score(X_test, y_test)))
from sklearn.metrics import confusion_matrix
# cm is used for the result so the imported function is not overwritten
cm = confusion_matrix(y_test, y_pred)
print(cm)
out = model.predict([[150]])
print('The outcome for glucose level is :', out)

Output of above coding:


THE SHAPE OF THE DATASET IS
(768, 9)
THE DATA TYPES ARE :
Pregnancies int64
Glucose int64
BloodPressure int64
SkinThickness int64
Insulin int64
BMI float64
DiabetesPedigreeFunction float64
Age int64
Outcome int64
dtype: object
THE DESCRIPTION OF THE DATASET IS:
count mean ... 75% max
Pregnancies 768.0 3.845052 ... 6.00000 17.00
Glucose 768.0 120.894531 ... 140.25000 199.00
BloodPressure 768.0 69.105469 ... 80.00000 122.00

SkinThickness 768.0 20.536458 ... 32.00000 99.00
Insulin 768.0 79.799479 ... 127.25000 846.00
BMI 768.0 31.992578 ... 36.60000 67.10
DiabetesPedigreeFunction 768.0 0.471876 ... 0.62625 2.42
Age 768.0 33.240885 ... 41.00000 81.00
Outcome 768.0 0.348958 ... 1.00000 1.00

[9 rows x 8 columns]
maxindex=,col 0 7
Regression intercept [26.29935069]
Regression coefficient [[1.88280735]]
predicted age plot [[45.12742415]]
predicted age model pred [[45.12742415]]
Actual Predicted
0 22 28.182158
1 23 30.064965
2 25 33.830580
3 51 35.713387
4 31 26.299351
.. ... ...
149 29 30.064965
150 28 33.830580
151 22 33.830580
152 24 31.947773
153 24 28.182158
[154 rows x 2 columns]
Mean absolute error: 6.77
Mean squared error: 77.77

Root mean squared error: 8.82

THE SHAPE OF THE DATASET IS


(768, 9)
THE DATA TYPES ARE :
Pregnancies int64
Glucose int64
BloodPressure int64
SkinThickness int64
Insulin int64
BMI float64
DiabetesPedigreeFunction float64
Age int64
Outcome int64
dtype: object
THE DESCRIPTION OF THE DATASET IS:
count mean ... 75% max
Pregnancies 768.0 3.845052 ... 6.00000 17.00
Glucose 768.0 120.894531 ... 140.25000 199.00
BloodPressure 768.0 69.105469 ... 80.00000 122.00
SkinThickness 768.0 20.536458 ... 32.00000 99.00

Insulin 768.0 79.799479 ... 127.25000 846.00
BMI 768.0 31.992578 ... 36.60000 67.10
DiabetesPedigreeFunction 768.0 0.471876 ... 0.62625 2.42
Age 768.0 33.240885 ... 41.00000 81.00
Outcome 768.0 0.348958 ... 1.00000 1.00

[9 rows x 8 columns]
The Accuracy for Training Set is 73.61563517915309
Accuracy of logistic regression classifier on test set: 0.79
[[96 11]
[22 25]]
The outcome for glucose level is : [1]

Result:
Thus bivariate analysis with linear regression modeling using the Pima Indians
dataset was implemented.

5 c) Aim:

To perform multiple regression analysis using multivariate logistic regression on the UCI
diabetes dataset.
Program:
# multivariate logistic regression
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

# load the dataset (adjust the path to where diabetes.csv is stored)
diabetes_dataset = pd.read_csv('C:/Users/HP/OneDrive/Desktop/RESEARCH/Data/diabetes.csv')
print('THE SHAPE OF THE DATASET IS \n', diabetes_dataset.shape)
print('THE DATA TYPES ARE :\n', diabetes_dataset.dtypes)
print('THE DESCRIPTION OF THE DATASET IS:\n', diabetes_dataset.describe().T)

# keep the predictors whose correlation with Outcome (column 8) exceeds 0.25
c1 = []
for i in range(0, 8):
    c = diabetes_dataset.iloc[:, i].corr(diabetes_dataset.iloc[:, 8])
    if c > 0.25:
        c1.append(i)

X = diabetes_dataset.iloc[:, c1].values  # independent variable array
Y = diabetes_dataset.iloc[:, 8].values   # dependent variable vector

# splitting
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=0)

# fitting the model
model = LogisticRegression()
model.fit(X_train, y_train)
train_acc = model.score(X_train, y_train)
print("The Accuracy for Training Set is {}".format(train_acc*100))
y_pred = model.predict(X_test)
print('Accuracy of logistic regression classifier on test set: {:.2f}'.format(model.score(X_test, y_test)))

# confusion matrix, stored under a new name so the function is not overwritten
cm = confusion_matrix(y_test, y_pred)
print(cm)

# sample prediction for glucose = 150 and BMI = 34 (the two selected columns)
out = model.predict([[150, 34]])
print('The outcome of glucose and bmi :', out)
Output of the above code:
THE SHAPE OF THE DATASET IS
(768, 9)
THE DATA TYPES ARE :
Pregnancies int64
Glucose int64
BloodPressure int64
SkinThickness int64
Insulin int64
BMI float64
DiabetesPedigreeFunction float64
Age int64
Outcome int64
dtype: object
THE DESCRIPTION OF THE DATASET IS:
count mean ... 75% max
Pregnancies 768.0 3.845052 ... 6.00000 17.00
Glucose 768.0 120.894531 ... 140.25000 199.00
BloodPressure 768.0 69.105469 ... 80.00000 122.00
SkinThickness 768.0 20.536458 ... 32.00000 99.00
Insulin 768.0 79.799479 ... 127.25000 846.00
BMI 768.0 31.992578 ... 36.60000 67.10

DiabetesPedigreeFunction 768.0 0.471876 ... 0.62625 2.42
Age 768.0 33.240885 ... 41.00000 81.00
Outcome 768.0 0.348958 ... 1.00000 1.00

[9 rows x 8 columns]
The Accuracy for Training Set is 76.2214983713355
Accuracy of logistic regression classifier on test set: 0.79
[[95 12]
[21 26]]
The outcome of glucose and bmi : [1]
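
The test accuracy printed above can be recovered from the confusion matrix itself. The following is a minimal sketch, assuming the usual scikit-learn layout in which rows are actual classes and columns are predicted classes; the variable names are illustrative only.

```python
import numpy as np

# Confusion matrix copied from the output above
# (rows: actual class, columns: predicted class)
cm = np.array([[95, 12],
               [21, 26]])

# For a binary problem the flattened order is TN, FP, FN, TP
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()   # share of all test rows classified correctly
precision = tp / (tp + fp)        # share of predicted positives that are real
recall = tp / (tp + fn)           # share of real positives that were found

print('accuracy : {:.2f}'.format(accuracy))
print('precision: {:.2f}'.format(precision))
print('recall   : {:.2f}'.format(recall))
```

The accuracy works out to 121/154, which rounds to the 0.79 reported for the test set. Precision and recall are lower, which suggests the model misses a fair share of the positive cases.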

Result:
Thus multiple regression analysis using multivariate logistic
regression on the UCI diabetes dataset was executed.

Ex.No.6 APPLY AND EXPLORE VARIOUS PLOTTING FUNCTIONS ON UCI DATA SETS
A. NORMAL CURVES
B. DENSITY AND CONTOUR PLOTS
C. CORRELATION AND SCATTER PLOTS
D. HISTOGRAMS
E. THREE DIMENSIONAL PLOTTING

Aim:
To apply and explore various plotting functions on UCI data sets.
A. Normal curves
B. Density and contour plots
C. Correlation and scatter plots
D. Histograms
E. Three-dimensional plotting

A. Normal curves
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
import statistics

# Plot between -20 and 20 with 0.01 steps.
x_axis = np.arange(-20, 20, 0.01)

# Calculating mean and standard deviation
mean = statistics.mean(x_axis)
sd = statistics.stdev(x_axis)

# Plot the normal density with that mean and standard deviation
plt.plot(x_axis, norm.pdf(x_axis, mean, sd))
plt.show()
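
The curve drawn by norm.pdf is the normal density 1 / (sd * sqrt(2 * pi)) * exp(-(x - mean)^2 / (2 * sd^2)). As a rough sketch of what that call evaluates, the helper below (normal_pdf is a name made up for this illustration, not a library function) computes the same formula with NumPy alone.

```python
import numpy as np

def normal_pdf(x, mean, sd):
    """Density of the normal distribution N(mean, sd**2) evaluated at x."""
    z = (x - mean) / sd
    return np.exp(-0.5 * z ** 2) / (sd * np.sqrt(2 * np.pi))

x_axis = np.arange(-20, 20, 0.01)
mean = x_axis.mean()
sd = x_axis.std(ddof=1)  # sample standard deviation, like statistics.stdev

density = normal_pdf(x_axis, mean, sd)
peak = density.max()     # the highest point of the curve, reached near the mean
print(mean, sd, peak)
```

Because the standard deviation of this grid is large compared with the plotted range, only the central part of the bell shape is visible in the figure.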

Output

B)Density and contour plots:

%matplotlib inline
import matplotlib.pyplot as plt
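# Matplotlib 3.6 and later name this style 'seaborn-v0_8-white';
# on older versions use 'seaborn-white' instead
plt.style.use('seaborn-v0_8-white')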
import numpy as np

Visualizing a Three-Dimensional Function

def f(x, y):
    return np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)


x = np.linspace(0, 5, 50)
y = np.linspace(0, 5, 40)

X, Y = np.meshgrid(x, y)
Z = f(X, Y)
plt.contour(X, Y, Z, colors='black');
Output

Notice that by default when a single color is used, negative values are represented by dashed
lines, and positive values by solid lines. Alternatively, the lines can be color-coded by specifying
a colormap with the cmap argument. Here, we'll also specify that we want more lines to be drawn,
using 20 equally spaced intervals within the data range:
plt.contour(X, Y, Z, 20, cmap='RdGy');

plt.imshow(Z, extent=[0, 5, 0, 5], origin='lower',
cmap='RdGy')
plt.colorbar()
plt.axis('image');

contours = plt.contour(X, Y, Z, 3, colors='black')
plt.clabel(contours, inline=True, fontsize=8)

plt.imshow(Z, extent=[0, 5, 0, 5], origin='lower',
           cmap='RdGy', alpha=0.5)
plt.colorbar();

C.Correlation and Scatterplots

1. Preliminaries

import pandas as pd

con = pd.read_csv('Data/ConcreteStrength.csv')
con

2. Renaming columns
list(con.columns)

['No',
'Cement',
'Slag',
'Fly ash',
'Water',
'SP',
'Coarse Aggr.',
'Fine Aggr.',
'Air Entrainment',
'Compressive Strength (28-day)(Mpa)']

con.rename(columns={'Fly ash': 'FlyAsh', 'Coarse Aggr.': "CoarseAgg",
'Fine Aggr.': 'FineAgg', 'Air Entrainment': 'AirEntrain',
'Compressive Strength (28-day)(Mpa)': 'Strength'}, inplace=True)
con.head()

As before, we should convert any obvious categorical variables to categories:

con['AirEntrain'] = con['AirEntrain'].astype('category')

con.describe(include='category')

3. Scatterplots
Scatterplots are a fundamental graph type, much less complicated than histograms and
boxplots. As such, we might use the Matplotlib library instead of the Seaborn library. But
since we have already used Seaborn, I will stick with it here. Just know that there are many
ways to create scatterplots and other basic graphs in Python.
To create a bare-bones scatterplot, we must do four things:

1. Load the seaborn library
2. Specify the source data frame
3. Set the x axis, which is generally the name of a predictor/independent variable
4. Set the y axis, which is generally the name of a response/dependent variable

import seaborn as sns

sns.scatterplot(x="FlyAsh", y="Strength", data=con);

4. Adding labels

ax = sns.scatterplot(x="FlyAsh", y="Strength", data=con)
ax.set_title("Concrete Strength vs. Fly ash")
ax.set_xlabel("Fly ash");

5.Adding a best fit line


sns.lmplot(x="FlyAsh", y="Strength", data=con);
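
The line that lmplot overlays is a straight-line regression fit. As a minimal sketch of that calculation, the example below fits a least-squares line with np.polyfit. The two arrays are made-up toy values standing in for the FlyAsh and Strength columns, not figures from the concrete data set.

```python
import numpy as np

# Hypothetical toy data standing in for con['FlyAsh'] and con['Strength']
fly_ash = np.array([0.0, 50.0, 100.0, 150.0, 200.0])
strength = np.array([30.0, 32.0, 33.5, 36.5, 38.0])

# A degree-1 polynomial fit is the ordinary least-squares straight line
slope, intercept = np.polyfit(fly_ash, strength, 1)

# Points on the fitted line, one per observation
fitted = slope * fly_ash + intercept
print('slope = {:.4f}, intercept = {:.2f}'.format(slope, intercept))
```

With the real data frame, the same call could be made on con['FlyAsh'] and con['Strength'] to obtain the slope and intercept numerically rather than only as a drawn line.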

6.Adding color as a third dimension

A graphics "party trick" made fashionable by tools like Tableau is to use color, size, or
some other visual cue to add a third dimension to a two-dimensional scatterplot. In the case of
color (or "hue" in Seaborn terminology), this third dimension needs to be a non-continuous
variable. This is because the palette of colors available has a finite number of options.

sns.lmplot(x="FlyAsh", y="Strength", hue="AirEntrain", data=con);
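
When the variable of interest is continuous, one option is to bin it into a few categories first. The sketch below uses pd.cut for that; the six water values are hypothetical, and the labels 'low', 'medium' and 'high' are chosen only for illustration.

```python
import pandas as pd

# Hypothetical water contents; with the real data this would be con['Water']
water = pd.Series([160.0, 175.0, 190.0, 205.0, 220.0, 240.0])

# Split the continuous range into three labelled bins of equal width
water_level = pd.cut(water, bins=3, labels=['low', 'medium', 'high'])

# Number of observations that fall into each category
print(water_level.value_counts().sort_index())
```

The resulting categorical column could then be added to the data frame and passed as the hue argument in the same way as AirEntrain.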

D)Histogram
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import colors
from matplotlib.ticker import PercentFormatter

# Creating dataset
np.random.seed(23685752)
N_points = 10000
n_bins = 20

# Creating distribution
x = np.random.randn(N_points)
y = .8 ** x + np.random.randn(10000) + 25
legend = ['distribution']

# Creating the figure and axes
fig, axs = plt.subplots(1, 1,
                        figsize=(10, 7),
                        tight_layout=True)

# Remove axes spines
for s in ['top', 'bottom', 'left', 'right']:
    axs.spines[s].set_visible(False)

# Remove x, y ticks
axs.xaxis.set_ticks_position('none')
axs.yaxis.set_ticks_position('none')

# Add padding between axes and labels
axs.xaxis.set_tick_params(pad=5)
axs.yaxis.set_tick_params(pad=10)

# Add x, y gridlines (the on/off flag is passed positionally because
# newer Matplotlib versions no longer accept the keyword b)
axs.grid(True, color='grey',
         linestyle='-.', linewidth=0.5,
         alpha=0.6)

# Add text watermark
fig.text(0.9, 0.15, 'Jeeteshgavande30',
         fontsize=12,
         color='red',
         ha='right',
         va='bottom',
         alpha=0.7)

# Creating histogram
N, bins, patches = axs.hist(x, bins=n_bins)

# Setting color of each bar according to its height
fracs = ((N ** (1 / 5)) / N.max())
norm = colors.Normalize(fracs.min(), fracs.max())

for thisfrac, thispatch in zip(fracs, patches):
    color = plt.cm.viridis(norm(thisfrac))
    thispatch.set_facecolor(color)

# Adding extra features
plt.xlabel("X-axis")
plt.ylabel("y-axis")
plt.legend(legend)
plt.title('Customized histogram')

# Show plot
plt.show()
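
The counts and bin edges that the hist call returns can also be computed without drawing anything. The following is a small sketch using np.histogram; the seed and sample are arbitrary and are not the ones used in the program above.

```python
import numpy as np

rng = np.random.default_rng(0)   # seed chosen only for reproducibility
x = rng.standard_normal(10000)

# Same binning idea as axs.hist(x, bins=20), but without plotting
counts, edges = np.histogram(x, bins=20)

# 20 bins are delimited by 21 edges, and every sample lands in exactly one bin
print(len(counts), len(edges), counts.sum())
```

This can be handy for checking bar heights numerically before customizing the colors of the plot.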

Output:

E)Three dimensional plotting
Three-dimensional plots are enabled by importing the mplot3d toolkit, included with the main
Matplotlib installation:
from mpl_toolkits import mplot3d
Once this submodule is imported, a three-dimensional axes can be created by passing the
keyword projection='3d' to any of the normal axes creation routines:

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
fig = plt.figure()
ax = plt.axes(projection='3d')

Three-dimensional Points and Lines

The most basic three-dimensional plot is a line or a scatter plot created from
sets of (x, y, z) triples. In analogy with the more common two-dimensional plots discussed
earlier, these can be created using the ax.plot3D and ax.scatter3D functions. The call signature
for these is nearly identical to that of their two-dimensional counterparts, so you can refer to
Simple Line Plots and Simple Scatter Plots for more information on controlling the output.
Here we'll plot a trigonometric spiral, along with some points drawn randomly near the line:

ax = plt.axes(projection='3d')

# Data for a three-dimensional line
zline = np.linspace(0, 15, 1000)
xline = np.sin(zline)
yline = np.cos(zline)
ax.plot3D(xline, yline, zline, 'gray')

# Data for three-dimensional scattered points
zdata = 15 * np.random.random(100)
xdata = np.sin(zdata) + 0.1 * np.random.randn(100)
ydata = np.cos(zdata) + 0.1 * np.random.randn(100)
ax.scatter3D(xdata, ydata, zdata, c=zdata, cmap='Greens');

Three-dimensional Contour Plots


def f(x, y):
    return np.sin(np.sqrt(x ** 2 + y ** 2))

x = np.linspace(-6, 6, 30)
y = np.linspace(-6, 6, 30)

X, Y = np.meshgrid(x, y)
Z = f(X, Y)
fig = plt.figure()
ax = plt.axes(projection='3d')
ax.contour3D(X, Y, Z, 50, cmap='binary')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z');

Output:

ax.view_init(60, 35)

fig

OUTPUT

Wireframes and Surface Plots


Two other types of three-dimensional plots that work on gridded data are wireframes and surface
plots. These take a grid of values and project it onto the specified three-dimensional surface, and
can make the resulting three-dimensional forms quite easy to visualize. Here's an example of
using a wireframe:
fig = plt.figure()
ax = plt.axes(projection='3d')
ax.plot_wireframe(X, Y, Z, color='black')
ax.set_title('wireframe');

OUTPUT

A surface plot is like a wireframe plot, but each face of the wireframe is a filled polygon. Adding
a colormap to the filled polygons can aid perception of the topology of the surface being
visualized:
ax = plt.axes(projection='3d')
ax.plot_surface(X, Y, Z, rstride=1, cstride=1,
                cmap='viridis', edgecolor='none')
ax.set_title('surface');

OUTPUT

Surface Triangulations
For some applications, the evenly sampled grids required by the above routines are overly
restrictive and inconvenient. In these situations, the triangulation-based plots can be very useful.
What if rather than an even draw from a Cartesian or a polar grid, we instead have a set of
random draws?
theta = 2 * np.pi * np.random.random(1000)
r = 6 * np.random.random(1000)
x = np.ravel(r * np.sin(theta))
y = np.ravel(r * np.cos(theta))
z = f(x, y)

We could create a scatter plot of the points to get an idea of the surface we're sampling from:
ax = plt.axes(projection='3d')
ax.scatter(x, y, z, c=z, cmap='viridis', linewidth=0.5);

OUTPUT

This leaves a lot to be desired. The function that will help us in this case is ax.plot_trisurf,
which creates a surface by first finding a set of triangles formed between adjacent points
(remember that x, y, and z here are one-dimensional arrays):
ax = plt.axes(projection='3d')
ax.plot_trisurf(x, y, z,
                cmap='viridis', edgecolor='none');
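
The triangle-finding step can be looked at on its own. The sketch below builds a triangulation of randomly drawn (x, y) points with matplotlib.tri.Triangulation, which computes a Delaunay triangulation when no triangles are supplied. It is an illustration of the idea under that assumption, not a trace of what plot_trisurf does internally, and the seed is arbitrary.

```python
import numpy as np
from matplotlib.tri import Triangulation

rng = np.random.default_rng(0)   # seed chosen only for reproducibility
theta = 2 * np.pi * rng.random(1000)
r = 6 * rng.random(1000)
x = r * np.sin(theta)
y = r * np.cos(theta)

# With no triangles supplied, a Delaunay triangulation of (x, y) is computed
tri = Triangulation(x, y)

# Each row holds the indices of the three points that form one triangle
print(tri.triangles.shape)
```

Each triangle is then lifted to the z value of its three corner points, which is what produces the surface seen in the plot.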

OUTPUT

Result:
Thus three-dimensional plotting on UCI data sets was executed.

Ex.No.7 VISUALIZING GEOGRAPHIC DATA WITH BASEMAP

Aim:

To visualize geographic data with Basemap.


Basemap
Basemap is a great tool for creating maps using Python in a simple way. It is a Matplotlib
extension, so it offers all of Matplotlib's features for creating data visualizations, and adds
geographical projections and some datasets so that coast lines, countries, and so on can be
plotted directly from the library.

• Physical boundaries and bodies of water
    o drawcoastlines(): Draw continental coast lines
    o drawlsmask(): Draw a mask between the land and sea, for use with projecting images on one or the other
    o drawmapboundary(): Draw the map boundary, including the fill color for oceans
    o drawrivers(): Draw rivers on the map
    o fillcontinents(): Fill the continents with a given color; optionally fill lakes with another color

• Political boundaries
    o drawcountries(): Draw country boundaries
    o drawstates(): Draw US state boundaries
    o drawcounties(): Draw US county boundaries

• Map features
    o drawgreatcircle(): Draw a great circle between two points
    o drawparallels(): Draw lines of constant latitude
    o drawmeridians(): Draw lines of constant longitude
    o drawmapscale(): Draw a linear scale on the map

• Whole-globe images
    o bluemarble(): Project NASA's blue marble image onto the map
    o shadedrelief(): Project a shaded relief image onto the map
    o etopo(): Draw an etopo relief image onto the map
    o warpimage(): Project a user-provided image onto the map

Installation

Step 1: Use the Anaconda Navigator to install basemap. Go to start and click Anaconda
command prompt.

Step 2: Before installing Basemap, be sure to install pillow package. Install the pillow package
using the command line pip install pillow

Step 3: Next step is to install the Basemap using the following command
pip install basemap
The anaconda command prompt will look like

Step 4: After successfully installing the basemap package, navigate to Jupyter Notebook using the
following command
jupyter notebook

Step 5: The above command will open a new webpage with the address http://localhost:8888/tree.

Step 6: Click New > Python 3 (ipykernel).

CS3362-Data Science Lab Manual
Arunachala College of Engineering for Women

Step 7: Start the program for visualizing geographical data using basemap. Click Run to run the
program.
Some of these map-specific methods are:

• contour()/contourf() : Draw contour lines or filled contours

• imshow(): Draw an image

• pcolor()/pcolormesh() : Draw a pseudocolor plot for irregular/regular meshes

• plot(): Draw lines and/or markers.

• scatter(): Draw points with markers.

• quiver(): Draw vectors.

• barbs(): Draw wind barbs.

• drawgreatcircle(): Draw a great circle.

PROGRAM
1. Simple Maps and color it
To start, import Basemap as well as matplotlib and numpy:

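from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
import numpy as np

%matplotlib inline
import warnings
import matplotlib.cbook
# silence Matplotlib deprecation warnings; on newer Matplotlib versions the
# warning class is matplotlib.MatplotlibDeprecationWarning
warnings.filterwarnings("ignore",category=matplotlib.cbook.mplDeprecation)

# show the Basemap help text (IPython/Jupyter only)
Basemap?

fig = plt.figure(num=None, figsize=(12, 8) )
# the map spans longitudes -180 to 180 and latitudes -80 to 80
m = Basemap(projection='merc',llcrnrlat=-80,urcrnrlat=80,llcrnrlon=-180,urcrnrlon=180,resolution='c')
m.drawcoastlines()
plt.title("Mercator Projection")
plt.show()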

OUTPUT:

1a. Coding for Coloring

fig = plt.figure(num=None, figsize=(12, 8) )
m = Basemap(projection='merc',llcrnrlat=-80,urcrnrlat=80,llcrnrlon=-180,urcrnrlon=180,resolution='c')
m.drawcoastlines()
m.fillcontinents(color='tan',lake_color='lightblue')
# draw parallels and meridians.
m.drawparallels(np.arange(-90.,91.,30.),labels=[True,True,False,False],dashes=[2,2])
m.drawmeridians(np.arange(-180.,181.,60.),labels=[False,False,False,True],dashes=[2,2])
m.drawmapboundary(fill_color='lightblue')
plt.title("Mercator Projection")
Output

1b. Same sequence of commands but with a different projection:

fig = plt.figure(num=None, figsize=(12, 8) )
m = Basemap(projection='moll',lon_0=0,resolution='c')
m.drawcoastlines()
m.fillcontinents(color='purple',lake_color='lightblue')
# draw parallels and meridians.
m.drawparallels(np.arange(-90.,91.,30.),labels=[True,True,False,False],dashes=[2,2])
m.drawmeridians(np.arange(-180.,181.,60.),labels=[False,False,False,False],dashes=[2,2])
m.drawmapboundary(fill_color='lightblue')
plt.title("Mollweide Projection");
Output

2 a. Create a map centered on North America with lines showing the country and state
boundaries as well as rivers:

fig = plt.figure(num=None, figsize=(12, 8) )
m = Basemap(width=6000000,height=4500000,resolution='c',projection='aea',
            lat_1=35.,lat_2=45,lon_0=-100,lat_0=40)
m.drawcoastlines(linewidth=0.5)
m.fillcontinents(color='tan',lake_color='lightblue')
# draw parallels and meridians.
m.drawparallels(np.arange(-90.,91.,15.),labels=[True,True,False,False],dashes=[2,2])
m.drawmeridians(np.arange(-180.,181.,15.),labels=[False,False,False,True],dashes=[2,2])
m.drawmapboundary(fill_color='lightblue')

m.drawcountries(linewidth=2, linestyle='solid', color='k' )


m.drawstates(linewidth=0.5, linestyle='solid', color='k')
m.drawrivers(linewidth=0.5, linestyle='solid', color='blue')

2b. Use a different map projection, zoom in to North America and plot the location
of Seattle

fig = plt.figure(figsize=(8, 8))
m = Basemap(projection='lcc', resolution=None,
            width=8E6, height=8E6,
            lat_0=45, lon_0=-100,)
m.etopo(scale=0.5, alpha=0.5)

# Map (long, lat) to (x, y) for plotting
x, y = m(-122.3, 47.6)
plt.plot(x, y, 'ok', markersize=5)
plt.text(x, y, ' Seattle', fontsize=12);

Output

2. Map Projections
The Basemap package implements several dozen such projections, all referenced by a
short format code. Here we'll briefly demonstrate some of the more common ones.
We'll start by defining a convenience routine to draw our world map along with the
longitude and latitude lines:

from itertools import chain

def draw_map(m, scale=0.2):
    # draw a shaded-relief image
    m.shadedrelief(scale=scale)

    # lats and longs are returned as a dictionary
    lats = m.drawparallels(np.linspace(-90, 90, 13))
    lons = m.drawmeridians(np.linspace(-180, 180, 13))

    # keys contain the plt.Line2D instances
    lat_lines = chain(*(tup[1][0] for tup in lats.items()))
    lon_lines = chain(*(tup[1][0] for tup in lons.items()))
    all_lines = chain(lat_lines, lon_lines)

    # cycle through these lines and set the desired style
    for line in all_lines:
        line.set(linestyle='-', alpha=0.3, color='w')

Cylindrical projections
The simplest of map projections are cylindrical projections, in which lines of constant
latitude and longitude are mapped to horizontal and vertical lines, respectively. This type
of mapping represents equatorial regions quite well, but results in extreme distortions
near the poles. The spacing of latitude lines varies between different cylindrical
projections, leading to different conservation properties, and different distortion near the
poles. In the following figure we show an example of the equidistant cylindrical
projection, which chooses a latitude scaling that preserves distances along meridians.
Other cylindrical projections are the Mercator (projection='merc') and the cylindrical
equal area (projection='cea') projections.

fig = plt.figure(figsize=(8, 6), edgecolor='w')
m = Basemap(projection='cyl', resolution=None,
            llcrnrlat=-90, urcrnrlat=90,
            llcrnrlon=-180, urcrnrlon=180, )
draw_map(m)
OUTPUT
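
The way cylindrical projections differ in their spacing of latitude lines can be seen from the formulas directly. The sketch below compares the vertical coordinate of the equidistant cylindrical and Mercator projections on a unit sphere. The two function names are invented for this illustration and are not part of Basemap.

```python
import numpy as np

def equidistant_y(lat_deg):
    """Equidistant cylindrical: y is proportional to latitude (unit sphere)."""
    return np.radians(lat_deg)

def mercator_y(lat_deg):
    """Mercator: y = ln(tan(pi/4 + lat/2)) on a unit sphere."""
    lat = np.radians(lat_deg)
    return np.log(np.tan(np.pi / 4 + lat / 2))

# Compare how far from the equator each projection places a given latitude
for lat in [0, 30, 60, 80]:
    print(lat,
          round(float(equidistant_y(lat)), 3),
          round(float(mercator_y(lat)), 3))
```

The two agree near the equator, while the Mercator coordinate grows much faster toward the poles, which is the stretching that makes high-latitude regions look oversized on Mercator maps.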

orthographic projection or Perspective projections


Perspective projections are constructed using a particular choice of perspective point, similar
to if you photographed the Earth from a particular point in space (a point which, for some
projections, technically lies within the Earth!). One common example is the orthographic
projection (projection='ortho'), which shows one side of the globe as seen from a viewer at a
very long distance. As such, it can show only half the globe at a time. Other perspective-based
projections include the gnomonic projection (projection='gnom') and stereographic projection
(projection='stere'). These are often the most useful for showing small portions of the map.

fig = plt.figure(figsize=(8, 8))
m = Basemap(projection='ortho', resolution=None,
            lat_0=50, lon_0=0)
draw_map(m);

OUTPUT
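
Why the orthographic view shows only half the globe can be illustrated with the standard orthographic projection formulas on a unit sphere. The function below is a hypothetical helper written for this explanation, not a Basemap call. Its defaults reuse the centre lat_0=50, lon_0=0 from the program above.

```python
import numpy as np

def ortho(lat_deg, lon_deg, lat_0=50.0, lon_0=0.0):
    """Orthographic projection on a unit sphere centred at (lat_0, lon_0).

    Returns (x, y, visible); visible is False for points on the far hemisphere.
    """
    lat, lon = np.radians(lat_deg), np.radians(lon_deg)
    phi0, lam0 = np.radians(lat_0), np.radians(lon_0)
    x = np.cos(lat) * np.sin(lon - lam0)
    y = (np.cos(phi0) * np.sin(lat)
         - np.sin(phi0) * np.cos(lat) * np.cos(lon - lam0))
    # cosine of the angular distance from the projection centre
    cos_c = (np.sin(phi0) * np.sin(lat)
             + np.cos(phi0) * np.cos(lat) * np.cos(lon - lam0))
    return float(x), float(y), bool(cos_c >= 0)

print(ortho(50, 0))      # the centre of the view
print(ortho(-50, 180))   # the antipode of the centre
```

Points whose angular distance from the centre exceeds 90 degrees fall on the hidden hemisphere, so a plotting routine has to leave them out.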

Conic projections
A Conic projection projects the map onto a single cone, which is then unrolled. This can lead to
very good local properties, but regions far from the focus point of the cone may become very
distorted. One example of this is the Lambert Conformal Conic projection (projection='lcc'),
which we saw earlier in the map of North America. It projects the map onto a cone arranged in
such a way that two standard parallels (specified in Basemap by lat_1 and lat_2) have well
represented distances, with scale decreasing between them and increasing outside of them. Other
useful conic projections are the equidistant conic projection (projection='eqdc') and the Albers
equal-area projection (projection='aea'). Conic projections, like perspective projections, tend to
be good choices for representing small to medium patches of the globe.

fig = plt.figure(figsize=(8, 8))
m = Basemap(projection='lcc', resolution=None,
            lon_0=0, lat_0=50, lat_1=45, lat_2=55,
            width=1.6E7, height=1.2E7)
draw_map(m)

OUTPUT

Result:
Thus visualizing geographic data with basemap is implemented.
