Foundation of Data Science Lab Manual
AIM
To download and install the NumPy, SciPy, Jupyter, Statsmodels and Pandas packages.
INTRODUCTION
Python is a high-level, general-purpose programming language. One of the important features that makes Python a strong programming language is its vast collection of packages, which includes data science and machine learning packages. A lot of external packages are written in Python and can be installed and used depending upon the requirement.
Some important packages are:
1. NumPy
2. SciPy
3. Jupyter
4. Statsmodels
5. Pandas
1. NUMPY
NumPy is the fundamental package for array computing with Python. It can also be
used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be
defined. This allows it to seamlessly and speedily integrate with a wide variety of databases.
It provides:
a powerful N-dimensional array object
sophisticated (broadcasting) functions
tools for integrating C/C++ and Fortran code
useful linear algebra, Fourier transform, and random number capabilities
Installing Numpy on Windows:
a. For Conda Users:
conda install numpy
b. For PIP Users:
pip install numpy
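A quick check (a minimal sketch) to confirm the installation from the Python shell:
import numpy as np
print(np.__version__)
print(np.arange(6).reshape(2, 3))   # small test array to confirm NumPy works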
2. SCIPY
SciPy (pronounced “Sigh Pie”) is open-source software for mathematics, science, and
engineering. The SciPy library depends on NumPy, which provides convenient and fast N-
dimensional array manipulation. The SciPy library is built to work with NumPy arrays, and
provides many user-friendly and efficient numerical routines such as routines for numerical
integration and optimization.
It is designed on top of the NumPy library and extends it with routines for scientific and mathematical computations such as matrix rank, matrix inverse, polynomial equations, LU decomposition, etc. Using its high-level functions significantly reduces the complexity of the code and helps in analysing the data better.
Installing Scipy on Windows:
a. For Conda Users:
conda install scipy
b. For PIP Users:
pip install scipy
Verifying the SciPy installation through the Python shell:
import scipy
scipy.__version__
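A short illustrative sketch of the linear algebra routines mentioned above (matrix rank, inverse and LU decomposition); the matrix used here is only an example:
import numpy as np
from scipy import linalg
A = np.array([[4.0, 3.0], [6.0, 3.0]])
print(np.linalg.matrix_rank(A))   # rank of the matrix
print(linalg.inv(A))              # matrix inverse
P, L, U = linalg.lu(A)            # LU decomposition
print(L)
print(U)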
3. JUPYTER
Jupyter Notebook is an open-source web application that allows you to create and share
documents that contain live code, equations, visualizations, and narrative text. Uses include
data cleaning and transformation, numerical simulation, statistical modeling, data
visualization, machine learning, and much more.
Advantages of a Jupyter Notebook
Notebook has the ability to re-run individual code snippets, and it provides you the
flexibility of modifying them before re-running.
You can deploy a Jupyter Notebook on a remote server and access it from your local
web browser.
You can add notes and documentation to your code in a Jupyter Notebook in various formats like Markdown, LaTeX, and HTML.
Installing Jupyter Notebook using pip:
pip install jupyter
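After installation, the notebook server can be started from the command prompt; by default it opens in the web browser at http://localhost:8888:
jupyter notebook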
4. STATSMODELS
Statsmodels is a popular library in Python that enables us to estimate and analyze
various statistical models. It is built on numeric and scientific libraries like NumPy and SciPy.
Features
It includes various models of linear regression like ordinary least squares, generalized
least squares, weighted least squares, etc.
It provides some efficient functions for time series analysis.
It also has some datasets for examples and testing.
Models based on survival analysis are also available.
A wide range of statistical tests for large-scale data is also available.
Installing Statsmodels using Anaconda:
open the Anaconda Prompt and type the following command-
conda install -c conda-forge statsmodels
Installing Statsmodels using pip:
To obtain the latest released version of statsmodels using pip:
python -m pip install statsmodels
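A minimal sketch of the ordinary least squares model mentioned in the features above; the data points here are made up purely for illustration:
import numpy as np
import statsmodels.api as sm
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([2.1, 4.0, 6.2, 7.9, 10.1, 11.8, 14.2, 15.9])
X = sm.add_constant(x)        # add the intercept term
model = sm.OLS(y, X).fit()    # ordinary least squares fit
print(model.params)           # intercept and slope
print(model.summary())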
5. PANDAS
Pandas is a Python package written for data analysis and manipulation. Pandas offers various operations and data structures to perform numerical data manipulation and time series analysis. Pandas is an open-source library built on top of NumPy. The Pandas library is known for its high productivity and high performance, and it is popular because it makes importing and analysing data much easier.
Main Features
Easy handling of missing data (represented as NaN, NA or NaT ) in floating point as
well as non-floating point data
Intuitive merging and joining data sets
Flexible reshaping and pivoting of data sets
Size mutability: columns can be inserted and deleted from DataFrame and higher
dimensional objects
Time series-specific functionality: date range generation and frequency conversion,
moving window statistics, date shifting and lagging
Install Pandas using pip
pip install pandas
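A quick check of the installation together with a small example of the time series features listed above (date range generation and shifting); the values are made up:
import pandas as pd
print(pd.__version__)
dates = pd.date_range('2023-01-01', periods=4, freq='D')   # date range generation
s = pd.Series([10, 20, 30, 40], index=dates)
print(s.shift(1))   # shifting / lagging the series by one day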
RESULT
Thus the NumPy, SciPy, Jupyter, Statsmodels and Pandas packages are downloaded
and installed.
AIM
To work with different features provided by Numpy arrays.
Arrays
1. Creating Arrays
import numpy as np
a = np.array(42) #0-D
b = np.array([1, 2, 3, 4, 5]) #1-D
c = np.array([[1, 2, 3], [4, 5, 6]]) # 2-D
d = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]]) #3-D
print(a.ndim)
print(b.ndim)
print(c.ndim)
print(d.ndim)
2. Access Array Elements
Access 2-D Arrays
To access elements from 2-D arrays we can use comma separated integers
representing the dimension and the index of the element.
import numpy as np
arr = np.array([[1,2,3,4,5], [6,7,8,9,10]])
print('2nd element on 1st row: ', arr[0, 1])
Access 3-D Arrays
To access elements from 3-D arrays we can use comma separated integers
representing the dimensions and the index of the element.
import numpy as np
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
print(arr[0, 1, 2])
3. Array Slicing
Slicing in Python means taking elements from one given index to another given index, optionally with a step value.
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[1:5:2])
4. Data Types
NumPy has some extra data types and refers to data types with one-character codes. Below is a list of all data types in NumPy and the characters used to represent them.
i - integer
b - boolean
u - unsigned integer
f - float
c - complex float
m - timedelta
M - datetime
O - object
S - string
U - unicode string
V - fixed chunk of memory for other type (void)
import numpy as np
arr = np.array([1, 2, 3, 4], dtype='S')
print(arr)
print(arr.dtype)
5. Array Shape
The shape attribute returns a tuple with the number of elements in each dimension.
import numpy as np
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(arr.shape)
6. Array Reshaping
Reshaping means changing the shape of an array.
The shape of an array is the number of elements in each dimension.
By reshaping we can add or remove dimensions or change number of elements in
each dimension.
Convert the following 1-D array with 12 elements into a 3-D array.
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
newarr = arr.reshape(2, 3, 2)
print(newarr)
7. Array Iterating
Iterating means going through elements one by one.
As we deal with multi-dimensional arrays in NumPy, we can do this using the basic for loop of Python.
import numpy as np
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
for x in arr:
    print(x)
8. Joining Array
Joining means putting contents of two or more arrays in a single array.
import numpy as np
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr = np.concatenate((arr1, arr2))
print(arr)
9. Splitting Array
Splitting breaks one array into multiple.
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6])
newarr = np.array_split(arr, 3)
print(newarr)
RESULT
Thus the programs using NumPy have been successfully executed and verified.
EX.NO: 3 PANDAS DATAFRAME
AIM
To create Pandas DataFrames from different inputs and to work with various DataFrame methods.
INTRODUCTION
A Pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). A DataFrame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in rows and columns. A Pandas DataFrame consists of three principal components: the data, the rows, and the columns. In the real world, a Pandas DataFrame is usually created by loading a dataset from existing storage, such as an SQL database, a CSV file, or an Excel file. A Pandas DataFrame can also be created from lists, a dictionary, a list of dictionaries, etc.
Features of DataFrame
Potentially columns are of different types
Size – Mutable
Labeled axes (rows and columns)
Can perform Arithmetic operations on rows and columns
A DataFrame can be created with the pandas.DataFrame() constructor, which takes the following parameters:
Name Description
data Takes various forms like ndarray, Series, map, lists, dict, constants and also another DataFrame.
index For the row labels, the index to be used for the resulting frame. Optional; the default is np.arange(n) if no index is passed.
columns For column labels, the optional default is np.arange(n). This is only used if no columns are passed.
dtype Data type of each column.
copy Used for copying the data; the default is False.
Create DataFrame
A pandas DataFrame can be created using various inputs.
Lists
import pandas as pd
nested_list = [[1,2,3],[10,20,30],[100,200,300]]
df = pd.DataFrame(nested_list, columns= ['A','B','C'])
dictionary
import pandas as pd
d = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data=d)
Series
import pandas as pd
letter = pd.Series(['A', 'B', 'C', 'D', 'E'],name='Name')
df = letter.to_frame()
Numpy ndarrays
import pandas as pd
import numpy as np
df2 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),columns=['a', 'b', 'c'])
Another DataFrame
old_df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B']})
new_df = old_df[['team']].copy()
print(new_df)
DataFrame Methods:
Function Description
index Attribute that returns the index (row labels) of the DataFrame
insert() Insert column into DataFrame at specified location.
set_index() Set the DataFrame index (row labels) using one or more existing
columns or arrays (of the correct length).
drop() Drop specified labels from rows or columns.
rename() Rename index labels or column names
loc[] Method retrieves rows based on index label
iloc[] Method retrieves rows based on index position
sort_values() Sort by the values along either axis.
PROGRAM
import pandas as pd
dictionaryData = {'Scoville': [50, 5000, 500000],
                  'Name': ["Bell pepper", "Espelette pepper", "Chocolate habanero"],
                  'Feeling': ["Not even spicy", "Uncomfortable", "Practically ate pepper spray"]}
dataFrame = pd.DataFrame(dictionaryData)
dataFrame
#change the indexing
dataFrame2 = dataFrame.set_index('Scoville')
dataFrame2
# Location by label
print(dataFrame.loc[2])
# Location by index
print(dataFrame.iloc[1])
#removing rows
dataFrame.drop(1, inplace=True)
dataFrame
#rename rows
dataFrame.rename({0:"First", 2:"Second"},
inplace=True)
dataFrame
#add a column (one value per remaining row)
dataFrame['Color'] = ['Green', 'Brown']
dataFrame
#sort values
newdf = dataFrame.sort_values(by='Name')
newdf
RESULT
Thus the programs using Pandas DataFrames have been successfully executed and verified.
AIM
Reading data from text files, Excel and the web and exploring various commands for
doing descriptive analytics on the Iris data set.
INTRODUCTION
Exploratory Data Analysis (EDA) is a technique to analyze data using some visual techniques. With this technique, we can get detailed information about the statistical summary of the data. We will also be able to deal with duplicate values and outliers, and also see some trends or patterns present in the dataset.
IRIS DATASET
It includes three iris species with 50 samples each, as well as some properties about
each flower. One flower species is linearly separable from the other two, but the other two are
not linearly separable from each other.
The columns in this dataset are:
Id
SepalLengthCm
SepalWidthCm
PetalLengthCm
PetalWidthCm
Species
shape The shape attribute gives the shape (number of rows and columns) of the dataset.
info() To get the columns and their data types.
describe() Gives a quick statistical summary of the dataset. The describe() function applies basic statistical computations on the dataset like extreme values, count of data points, standard deviation, etc. Any missing value or NaN value is automatically skipped. describe() gives a good picture of the distribution of data.
isnull() Checks whether our data contains any missing values or not. Missing values can occur when no information is provided for one or more items or for a whole unit.
drop_duplicates() Shows whether our dataset contains any duplicates or not and helps in removing duplicates from the data frame.
head() Use the head() method of the data frame to show the first five rows of the data.
The pandas module allows us to load DataFrames from external files and work on them.
The dataset can be a text file, excel file, web reference or a CSV file.
1. Reading data from text files, and exploring various commands for doing descriptive
analytics on the Iris data set.
The following functions can be used to read delimited or fixed-width text files into a DataFrame. The file is returned as a two-dimensional data structure with labeled axes.
a. read_csv()
b. read_table()
c. read_fwf()
import pandas as pd
from pandas.api.types import is_numeric_dtype
df = pd.read_table("E:/data/IRISTestdata.txt",delimiter = ',')
#df = pd.read_csv("E:/data/IRISTestdata.txt")
#df = pd.read_fwf("E:/data/IRISTestdata.txt", delimiter=",")
print(type(df))
print(df)
#set column names of DataFrame
df.columns = ["sepal_length","sepal_width","petal_length","petal_width","target"]
print(df)
df.head()
print(df.shape)
print(df.info())
df.target.replace({"Iris-setosa":"setosa","Iris-versicolor":"versicolor","Iris-
virginica":"virginica"},inplace=True)
print(df)
#the unique values of a column
df.target.unique()
print("EDA")
print(df.describe())
#find the pairwise correlation of all numeric columns
df.corr(numeric_only=True)
#return the count of unique occurrences in this column
df.target.value_counts()
for col in df.columns:
    if is_numeric_dtype(df[col]):
        print('%s:' % (col))
        print('\t Mean = %.2f' % df[col].mean())
        print('\t Standard deviation = %.2f' % df[col].std())
        print('\t Minimum = %.2f' % df[col].min())
        print('\t Maximum = %.2f' % df[col].max())
2. Reading data from web, and exploring various commands for doing descriptive
analytics on the Iris data set.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# download iris data and read it into a dataframe
url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
df = pd.read_csv(url, names=['sepal_length', 'sepal_width', 'petal_length', 'petal_width',
'class'])
print(df)
print(df.columns)
#checks for any missing values in the parameters.
print(df.isnull().values.any())
print("visually by plotting a graph by no. of data points
of each class label.")
# First plot
plt.plot(df["class"])
plt.xlabel("No. of data points")
plt.show()
# Second plot
plt.hist(df["class"],color="green")
plt.show()
#Describing numeric columns
print(df.describe())
#This shows the actual duplicate rows
df[df.duplicated()]
# Total no of duplicates in the dataset
df.duplicated().sum()
#calculate median of each species
print(df.groupby('class').median())
3. Reading data from Excel files, and exploring various commands for doing descriptive
analytics on the Iris data set.
import pandas as pd
df = pd.read_excel('E:/data/IrisData.xls')
print (df)
print(df.head())
# it will print the rows from 10 to 20.
print(df[10:21])
print(type(df))
#print the total number of rows and columns of that particular dataset.
print(df.shape)
print(df.info())
print(df.info(verbose = False))
print(df.describe())
print(df.isnull().sum())
print(df.drop_duplicates(subset ="Species_name"))
#it will count number of times a particular species has occurred
print(df.value_counts("Species_name"))
print(df.sample(10))
#Displaying the number of columns and names of the columns.
print(df.columns)
#Extracting minimum and maximum from a column.
min_data=df["Sepal_length"].min()
max_data=df["Sepal_length"].max()
print("Minimum:",min_data, "\nMaximum:", max_data)
#Displaying only specific columns.
specific_data=df[["Id","Species_name"]]
print(specific_data.head(60))
print("Calculating sum, mean and mode of a particular column.")
sum_data = df["Sepal_length"].sum()
mean_data = df["Sepal_length"].mean()
median_data =
df["Sepal_length"].median()
print("Sum:",sum_data, "\nMean:", mean_data, "\nMedian:",median_data)
RESULT
Thus the various commands for doing descriptive analytics on the Iris data set
are explored.
EX. NO: 5a UNIVARIATE ANALYSIS
AIM
To analyse the various univariate functions like Frequency, Mean, Median, Mode, Variance, Standard Deviation, Skewness and Kurtosis on a dataset like the Pima Indians Diabetes dataset.
Univariate analysis:
The main objective of univariate analysis is to describe the data in order to find patterns in it. It is the simplest form of data analysis. 'Uni' means 'one', and this means that the data has only one kind of variable.
That is, the data contains just one variable and does not deal with a cause-and-effect relationship. Its major purpose is to describe: it takes data, summarizes that data and finds patterns in the data. Patterns found in univariate data include measures of central tendency (mean, mode and median) and of dispersion: range, variance, maximum, minimum, quartiles (including the interquartile range), standard deviation, skewness and kurtosis.
The mean() function can be used to calculate mean/average of a given list of numbers.
The median() method calculates the median (middle value) of the given data set.
The mode of a set of data values is the value that appears most often.
The var() method calculates the variance for each column.
Standard deviation std() is a number that describes how spread out the values are.
The skew() method calculates the skew for each column. Skewness refers to a
distortion or asymmetry that deviates from the symmetrical bell curve, or normal
distribution, in a set of data.
PROGRAM
import pandas as pd
from scipy.stats import kurtosis
df = pd.read_csv("https://raw.githubusercontent.com/npradaschnor/Pima-Indians-Diabetes-
Dataset/master/diabetes.csv")
print (df)
df.dtypes # Get data type for each attribute
df.isnull().sum() # Check for missing values
# Check the average of features grouped by Outcome (Diabetes)
df.groupby('Outcome').mean()
# Getting only the women with Glucose value > 0
df_glucose = df.loc[df['Glucose'] != 0]
#Calculating Mean
df_glucose['Glucose'].mean()
df_glucose.groupby('Outcome').mean()
#Calculating Median
df_glucose['Glucose'].median()
#Calculating Mode
df_glucose = df.loc[df['BloodPressure'] != 0]
df_glucose['BloodPressure'].mode()
df_glucose = df.loc[df['Glucose'] != 0]
df_glucose['Glucose'].mode()
#Calculating Variance
df_glucose['Glucose'].var()
#Calculating Standard Deviation
df_glucose['Glucose'].std()
#Calculating Skew
print("Mean Age: ",df['Age'].mean())
print("Age Skewness: ",df['Age'].skew())
#Calculating Kurtosis
kurtosis(df['Age'],axis=0,bias=True)
OUTPUT
Result
Thus the various univariate functions like Frequency, Mean, Median, Mode, Variance,
Standard Deviation, Skewness and Kurtosis on dataset Pima Indian diabetes are successfully
executed.
Ex. No: 5b BIVARIATE ANALYSIS
Aim
To perform bivariate analysis both linear and logistic regression modeling on UCI Pima
Indian diabetes dataset.
Bivariate Analysis
1. Scatterplot
A scatterplot is a type of data display that shows the relationship between two numerical
variables
import pandas as pd
import matplotlib.pyplot as plt
pima = pd.read_csv("https://raw.githubusercontent.com/npradaschnor/Pima-Indians-
Diabetes-Dataset/master/diabetes.csv")
# Diabetes Outcome
res = pima.loc[pima.Outcome==1,:]
# Pregnancies, Age and Diabetes relation
res.plot.scatter('Pregnancies', 'Age')
plt.show()
2. Correlation Coefficient
The correlation coefficient is a statistical measure of the strength of the
relationship between the relative movements of two variables. The values range between -
1.0 and 1.0. Correlation of -1.0 shows a perfect negative correlation, while a correlation of
1.0 shows a perfect positive correlation. A correlation of 0.0 shows no linear relationship
between the movement of the two variables.
import numpy as np
# Check only the women that have all the values of BMI, Glucose, Insulin & Blood Pressure
pima_all = pima.loc[(pima['BMI'] != 0) & (pima['Insulin'] != 0) & (pima['BloodPressure'] !=
0) & (pima['Glucose'] != 0)]
age = pima_all['Age']
preg = pima_all['Pregnancies']
# Correlation between the different characteristics; the closer to 1, the stronger the correlation
corr = np.corrcoef(age,preg)
print(corr)
3. Simple Linear Regression
Simple linear regression is a statistical method that we can use to find a relationship
between two variables and make predictions. The two variables used are typically denoted as
y and x. The independent variable, or the variable used to predict the dependent variable is
denoted as x. The dependent variable, or the outcome/output, is denoted as y. A simple linear
regression model will produce a line of best fit, or the regression line.
Program
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
pima = pd.read_csv("https://raw.githubusercontent.com/npradaschnor/Pima-Indians-
Diabetes-Dataset/master/diabetes.csv")
x=pima['Age']
y=pima['Pregnancies']
x_train=x[0:700];x_test=x[700:]
y_train=y[0:700];y_test=y[700:]
# x must have one column
x_train = np.array(x_train).reshape(-1,1)
x_test = np.array(x_test).reshape(-1,1)
# create a linear regression model and fit it
model = LinearRegression().fit(x_train, y_train)
# obtain the coefficient of determination
r_sq = model.score(x_train, y_train)
print("Correlation Coeff : ",r_sq)
# pass the regressor as the argument and get the corresponding predicted response
y_pred = model.predict(x_test)
# Plot outputs
plt.scatter(x_test, y_test, color="black")
plt.plot(x_test, y_pred, color="blue", linewidth=2)
plt.show()
Output
4. Logistic Regression
It is a Machine Learning classification algorithm that is used to predict the probability
of a categorical dependent variable. In logistic regression, the dependent variable is a binary
variable that contains data coded as 1 (yes, success, etc.) or 0 (no, failure, etc.). In other words,
the logistic regression model predicts P(Y=1) as a function of X. Logistic regression requires
quite large sample sizes.
Program
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import linear_model
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
pima = pd.read_csv("https://raw.githubusercontent.com/npradaschnor/Pima-Indians-Diabetes-Dataset/master/diabetes.csv")
x = pima.drop(['Outcome'], axis = 1)
y = pima.loc[:,"Outcome"].values
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.33, random_state = 123)
logreg = linear_model.LogisticRegression(max_iter=150)
# Fit
logreg.fit(x_train,y_train)
# Predict
predicted = logreg.predict(x_test)
print("Test accuracy: {} ".format(logreg.score(x_test, y_test)))
cf_matrix = confusion_matrix(y_test,predicted)
sns.heatmap(cf_matrix/np.sum(cf_matrix), annot=True,fmt='.2%', cmap='winter_r')
plt.show()
Output
RESULT
Thus the bivariate analysis on UCI Pima Indian diabetes dataset was performed and
verified successfully.
AIM
To perform multiple regression analysis on UCI Pima Indian diabetes dataset.
Multiple Regression
Multiple regression is like linear regression, but with more than one independent variable. When one variable/column in a dataset is not sufficient to create a good model and make accurate predictions, we use a multiple linear regression model instead of a simple linear regression model.
The line equation for the multiple linear regression model is:
y = β0 + β1X1 + β2X2 + β3X3 + ... + βpXp + e
Adding more variables isn't always helpful because the model may 'over-fit' and become too complicated. An over-fitted model doesn't generalize to new data; it only works on the training data. We have to select the appropriate variables to build the best model. This process of selecting variables is called feature selection, illustrated in the sketch below.
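As a simple illustration of feature selection (a minimal sketch only), one heuristic is to rank the candidate columns by the strength of their correlation with the chosen target; the dataset URL is the one used in the earlier exercises and the target column matches the program below:
import pandas as pd
url = "https://raw.githubusercontent.com/npradaschnor/Pima-Indians-Diabetes-Dataset/master/diabetes.csv"
df = pd.read_csv(url)
target = 'Pregnancies'   # same target column as the program below
# absolute correlation of every other column with the target,
# sorted so the strongest candidate features appear first
corr = df.corr(numeric_only=True)[target].drop(target)
print(corr.abs().sort_values(ascending=False))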
PROGRAM
import pandas as pd
from sklearn import linear_model
df = pd.read_csv ('E:\data\diabetes.csv')
feature_columns = ['Glucose', 'Age']
target_column = 'Pregnancies'
x = df[feature_columns]
y = df[target_column]
regr = linear_model.LinearRegression()
regr.fit(x, y)
print('Intercept: \n', regr.intercept_)
print('Coefficients: \n', regr.coef_)
predicted_preg = regr.predict([[150, 20]])   # Glucose=150, Age=20
print("Predicted Pregnancies : ", predicted_preg)
OUTPUT
Intercept:
-1.185667850875595
Coefficients:
[-0.00158363 0.15710088]
Predicted Pregnancies : [1.71880494]
RESULT
Thus the multiple regression analysis on UCI Pima Indian diabetes dataset was
performed and verified.
AIM
To explore the plotting of the normal curve on the UCI Iris dataset.
NORMAL DISTRIBUTION
It is a probability function used in statistics that tells how the data values are distributed. It is the most important probability distribution function used in statistics because of its advantages in real case scenarios (e.g., the height of a population, shoe sizes, etc.).
It is generally observed that a data distribution is normal when there is a random collection of data from independent sources. The graph produced by plotting the values of the variable on the x-axis and the count of the values on the y-axis is a bell-shaped curve. The peak of the graph is the mean of the data set; half of the values of the data set lie to the left of the mean and the other half to the right, describing how the values are distributed. The distribution is symmetric.
The probability density function of the normal or Gaussian distribution is given by:
f(x) = (1 / (σ√(2π))) · e^(−(x − μ)² / (2σ²))
where μ is the mean and σ is the standard deviation of the distribution.
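PROGRAM
A minimal sketch, assuming the Iris CSV at E:/data/iris.csv with a SepalLengthCm column (the file and column names used in the later exercises); it overlays the normal curve, computed from the sample mean and standard deviation, on a histogram of sepal length:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import norm
df = pd.read_csv("E:/data/iris.csv")
data = df['SepalLengthCm']
mu, sigma = data.mean(), data.std()
# histogram of the observed values, normalised to compare with the density curve
plt.hist(data, bins=15, density=True, alpha=0.5, color='skyblue')
# normal curve with the same mean and standard deviation as the data
x = np.linspace(data.min(), data.max(), 200)
plt.plot(x, norm.pdf(x, mu, sigma), color='red')
plt.xlabel('SepalLengthCm')
plt.ylabel('Density')
plt.title('Normal curve on sepal length')
plt.show()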
OUTPUT
RESULT
Thus the plotting of normal curve on Iris dataset was successfully performed and
verified.
Aim
To explore the plotting of density and contour plot on UCI Iris dataset.
Density Plotting
A density plot is a type of data visualization tool. It is a variation of the histogram that uses 'kernel smoothing' while plotting the values. It is a continuous and smooth version of a histogram inferred from the data.
Density plots use Kernel Density Estimation (so they are also known as kernel density estimation or KDE plots), which estimates a probability density function. The region of the plot with a higher peak is the region with the maximum number of data points between those values.
PROGRAM
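A minimal sketch, assuming the Iris CSV at E:/data/iris.csv with the column names used in the other exercises and a recent seaborn (0.11 or later); it draws a kernel density estimate of petal length, one curve per species:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.read_csv("E:/data/iris.csv")
# kernel density estimate (KDE) of petal length for each species
sns.kdeplot(data=df, x='PetalLengthCm', hue='Species', fill=True)
plt.show()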
OUTPUT
Contour Plots
Contour plots, also called level plots, are a tool for doing multivariate analysis and visualizing 3-D plots in 2-D space using contours or color-coded regions. They are used to display the relationship between two independent variables and a dependent variable.
There are 3 Matplotlib functions:
plt.contour for contour plots
plt.contourf for filled contour plots
plt.imshow for showing images
matplotlib.pyplot.contour() is usually useful when Z = f(X, Y), i.e., Z changes as a function of the inputs X and Y. Contour plots are widely used to visualize density, altitude or the heights of mountains, as well as in meteorology. Contour plots require three continuous variables.
Syntax:
X, Y: 2-D numpy arrays with same shape as Z or 1-D arrays such that len(X)==M and
len(Y)==N (where M and N are rows and columns of Z)
Z: The height values over which the contour is drawn. Shape is (M, N)
levels: Determines the number and positions of the contour lines / regions.
A contour plot typically contains the following elements:
1. X and Y-axes denoting values of two continuous independent variables.
2. Coloured bands representing ranges of the continuous dependent (Z) variable.
3. Contour lines connecting points that have the same dependent value.
The x and y values represent positions on the plot, and the z values will be represented by the contour levels. The way to prepare such data is to use the np.meshgrid function, which builds two-dimensional grids from one-dimensional arrays.
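For example, a small illustrative sketch (the input arrays are arbitrary):
import numpy as np
x = np.array([1, 2, 3])
y = np.array([10, 20])
X, Y = np.meshgrid(x, y)
print(X)         # [[1 2 3] [1 2 3]]
print(Y)         # [[10 10 10] [20 20 20]]
print(X.shape)   # (2, 3): len(y) rows, len(x) columns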
PROGRAM
import numpy as np
import matplotlib.pyplot as plt
def f(x, y):
    return np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)
x = np.linspace(0, 5, 50)
y = np.linspace(0, 4, 40)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)
# to fill the contour plot use the plt.contourf()
plt.contourf(X, Y, Z, 20, cmap= plt.cm.RdGy)
plt.colorbar()
# combining contour plots and image plots
contours = plt.contour(X, Y, Z, 3, colors='black')
plt.clabel(contours, inline=True, fontsize=8)
plt.imshow(Z, extent=[0, 5, 0, 4], origin='lower',
cmap=plt.cm.cividis, alpha=0.2)
plt.colorbar()
RESULT
Thus the plotting of density and contour plot was explored and verified successfully.
Ex.No: 6c CORRELATION AND SCATTER PLOTS
AIM
To explore the plotting of correlation and scatter plots on UCI Iris data set.
INTRODUCTION
Scatter plots and correlation matrices are both tools used in statistics and data analysis
to visually and quantitatively understand the relationships between variables in a dataset. They
help to explore patterns, associations, and potential dependencies between different variables.
Scatter Plot
A scatter plot is a graphical representation that displays individual data points as dots on a two-
dimensional plane. Each data point is represented by a dot at the intersection of its
corresponding values on the two axes. Scatter plots are particularly useful when you want to
visualize the relationship between two continuous variables. One variable is plotted on the
horizontal (x) axis, and the other variable is plotted on the vertical (y) axis.
The pattern of the dots in a scatter plot can indicate the nature of the relationship
between the two variables:
Positive Correlation: If the dots roughly form a line that slopes upwards from left to right, this
indicates a positive correlation. It means that as one variable increases, the other tends to
increase as well.
Negative Correlation: If the dots roughly form a line that slopes downwards from left to right,
this indicates a negative correlation. It means that as one variable increases, the other tends to
decrease.
No Correlation: If the dots are scattered randomly without any clear pattern, this suggests that
there is little to no correlation between the variables.
Correlation Matrix
A correlation matrix is a tabular representation of the correlation coefficients between
multiple variables in a dataset. It provides a numerical measure of the strength and direction of
the linear relationship between pairs of variables. The correlation coefficient, often denoted as
"r," ranges from -1 to 1.
- A correlation coefficient of 1 indicates a perfect positive correlation.
- A correlation coefficient of -1 indicates a perfect negative correlation.
- A correlation coefficient close to 0 indicates little to no correlation.
Both scatter plots and correlation matrices play a crucial role in exploratory data
analysis, hypothesis testing, and model building, as they provide insights into the relationships
between variables, which can inform further analysis and decision-making.
PROGRAM
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Reading the CSV file
df = pd.read_csv("E:\data\iris.csv")
# Check for missing values
df.isnull().sum()
# Scatter plot to find relationship between variables
sns.scatterplot(x='PetalLengthCm', y='PetalWidthCm',hue='Species', data=df)
# Placing Legend outside the Figure
plt.legend(bbox_to_anchor=(1, 1), loc=2)
plt.show()
# to find & visualize the pairwise correlation of all columns
sns.heatmap(df.corr(method='pearson', numeric_only=True).drop(['Id'], axis=1).drop(['Id'], axis=0), annot=True)
plt.show()
OUTPUT
RESULT
Thus the plotting of correlation and scatter plots on UCI Iris data set was explored and
verified successfully.
Ex.No: 6d HISTOGRAM
AIM
To explore the plotting of histogram on UCI Iris dataset.
INTRODUCTION
A histogram is a great tool for quickly assessing a probability distribution that is
intuitively understood by almost any audience. It is a graph showing frequency distributions.
It is a graph showing the number of observations within each given interval. A histogram is a
mapping of bins (intervals) to frequencies. Histograms allow us to see the distribution of data for various columns. They can be used for univariate as well as bivariate analysis.
PROGRAM
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Reading the CSV file
df = pd.read_csv("E:\data\iris.csv")
fig, axes = plt.subplots(2, 2, figsize=(10,10))
axes[0,0].set_title("Sepal Length")
axes[0,0].hist(df['SepalLengthCm'], bins=7)
axes[0,1].set_title("Sepal Width")
axes[0,1].hist(df['SepalWidthCm'], bins=5)
axes[1,0].set_title("Petal Length")
axes[1,0].hist(df['PetalLengthCm'], bins=6)
axes[1,1].set_title("Petal Width")
axes[1,1].hist(df['PetalWidthCm'], bins=6)
plt.show()
RESULT
Thus the plotting of histogram on UCI Iris dataset was explored and verified
successfully.
AIM
To explore three dimensional plotting on the UCI Iris dataset.
PROGRAM
import pandas as pd
import plotly.express as px
df = pd.read_csv("E:\data\iris.csv")
px.scatter_3d(df, x="PetalLengthCm", y="PetalWidthCm", z="SepalLengthCm",\
size="SepalWidthCm",color="Species", color_discrete_map = {"Iris-setosa": "skyblue",\
"Iris-virginica": "violet", "Iris-versicolor":"pink"}).show()
OUTPUT
RESULT
Thus the three dimensional plotting on UCI Iris dataset was verified successfully.
Ex. No: 7 VISUALIZING GEOGRAPHIC DATA WITH BASEMAP
AIM
To get the insight of basemap in visualizing geographic data.
INTRODUCTION
Basemap is a toolkit under the Python visualization library Matplotlib under the
namespace mpl_toolkits. Its main function is to draw 2D maps, which are important for
visualizing spatial data.
Installation
Using Conda type,
$ conda install basemap
or in Jupyter Notebook type,
!python -m pip install basemap
PROGRAM
import os
os.environ["PROJ_LIB"] = "C:\\Utilities\\Python\\Anaconda\\Library\\share";
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
import requests
from csv import DictReader
import pandas as pd
DATA_URL = 'http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/4.5_month.csv'
print("Downloading", DATA_URL)
resp = requests.get(DATA_URL)
quakes = list(DictReader(resp.text.splitlines()))
print(quakes[:2])
quakes1=pd.DataFrame(quakes)
print(quakes1.head(2))
# extract coordinates and magnitudes from the downloaded CSV
# (the USGS feed provides 'latitude', 'longitude' and 'mag' columns)
qLats = quakes1['latitude'].astype(float)
qLngs = quakes1['longitude'].astype(float)
qMags = quakes1['mag'].astype(float)
earth = Basemap()
earth.drawmapboundary(fill_color='skyblue')
earth.fillcontinents(color='coral', lake_color='skyblue')
earth.drawcoastlines(color='#555566', linewidth=1)
plt.scatter(qLngs, qLats, qMags, c='red', alpha=0.5, zorder=10)
plt.xlabel("HISTORY OF EARTHQUAKES")
plt.show()
OUTPUT
RESULT
Thus visualizing of geographic data with basemap was verified successfully.