Machine Learning
OF INDIA)
Department of CSE
(Emerging Technologies)
(DATA SCIENCE,IOT,CYBER SECURITY)
B.TECH(R-22 Regulation)
(III YEAR – II SEM)
(2024-2025)
“To be at the forefront of Emerging Technologies and to evolve as a Centre of Excellence in Research,
Learning and Consultancy to foster the students into globally competent professionals useful to the
Society.”
Mission
The department of CSE (Emerging Technologies) is committed to:
To offer the highest professional and academic standards in terms of personal growth and satisfaction.
To make the society a hub of emerging technologies and thereby capture opportunities
in new-age technologies.
To create a benchmark in the areas of Research, Education and Public Outreach.
To provide students a platform where independent learning and scientific study are encouraged
with emphasis on latest engineering techniques.
QUALITY POLICY
To pursue continual improvement of the teaching-learning process of Undergraduate and
Postgraduate programs in Engineering & Management vigorously.
To provide state-of-the-art infrastructure and expertise to impart quality education and a research
environment to students for a complete learning experience.
To offer quality, relevant and cost-effective programmes to produce engineers as per the requirements
of the industry.
9 Build KNN Classification model for a given dataset. Vary the number of k
values as follows and compare the results:
i. 1
ii. 3
iii. 5
iv. 7
v. 11
10 Implement Support Vector Machine for a dataset and compare the accuracy
by applying the following kernel functions:
i. Linear
ii. Polynomial
iii. RBF
Week 1:
a) Implementation of Python Basic Libraries such as Math, NumPy and SciPy
Theory/Description:
Python Libraries
There are many reasons why Python is popular among developers, and one of them is its
very large collection of libraries. In this section, we discuss the Python Standard Library and
other libraries available for the Python programming language, such as SciPy and NumPy.
A module is a file containing Python code, and a package is a directory of sub-packages
and modules. A Python library is a reusable chunk of code that you may want to
include in your programs or projects. Here, a 'library' loosely describes a collection of core
modules. Essentially, then, a library is a collection of modules. A package is a library that can
be installed using a package manager like pip.
To display a list of all available modules, use the following command in the Python console:
>>> help('modules')
List of important Python Libraries
o Python Libraries for Data Collection
Beautiful Soup
Scrapy
Selenium
o Python Libraries for Data Cleaning and Manipulation
Pandas
PyOD
NumPy
Scipy
Spacy
o Python Libraries for Data Visualization
Matplotlib
Seaborn
Bokeh
o Python Libraries for Modeling
Scikit-learn
TensorFlow
PyTorch
The math module is a standard module in Python and is always available. To use
mathematical functions under this module, you have to import the module using import
math. It gives access to the underlying C library functions. This module does not support
complex datatypes. The cmath module is the complex counterpart.
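The difference between math and cmath described above can be sketched as follows. The particular values are chosen only for illustration.

```python
import cmath
import math

# Real-valued functions from the math module
root = math.sqrt(16)        # square root of a non-negative real number
fact = math.factorial(5)    # 5! = 120
hyp = math.hypot(3, 4)      # length of the hypotenuse of a 3-4-5 triangle

# math.sqrt rejects negative input, because math has no complex support
try:
    math.sqrt(-4)
    math_accepts_negative = True
except ValueError:
    math_accepts_negative = False

# cmath.sqrt returns a complex result for the same input
complex_root = cmath.sqrt(-4)

print(root, fact, hyp, math_accepts_negative, complex_root)
```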
Program-1
Program-2
Program-3
Program-4
Program-5
Program-6
NumPy is an open source library available in Python that aids in mathematical, scientific,
engineering, and data science programming. It is well suited to mathematical and statistical
operations, and it works especially well for multi-dimensional arrays and matrix multiplication.
For any scientific project, NumPy is the tool to know. It has been built to work with the
N-dimensional array, linear algebra, random numbers, Fourier transforms, etc. It can be
integrated with C/C++ and Fortran.
NumPy is a Python library (not a separate programming language) that deals with
multi-dimensional arrays and matrices. On top of the arrays and matrices, NumPy supports
a large number of mathematical operations.
NumPy is memory efficient, meaning it can handle large amounts of data more easily
than plain Python lists. Besides, NumPy is very convenient to work with, especially for matrix
multiplication and reshaping. On top of that, NumPy is fast. In fact, libraries such as TensorFlow
and scikit-learn work with NumPy arrays for their matrix computations.
◻ A NumPy array is a table of elements (usually numbers), all of the same type, indexed by a tuple of
non-negative integers.
◻ In NumPy dimensions are called axes. The number of axes is rank.
◻ NumPy’s array class is called ndarray. It is also known by the alias array.
We use a NumPy array instead of a Python list for the following three reasons:
1. Less Memory
2. Fast
3. Convenient
NumPy Functions
NumPy arrays carry attributes around with them. The most important ones are:
ndim: The number of axes or rank of the array
shape: A tuple containing the length in each dimension
size: The total number of elements
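A minimal sketch of these three attributes, using a small array made up for illustration:

```python
import numpy as np

# Reshape the integers 0..11 into a two-dimensional array with 3 rows and 4 columns
arr = np.arange(12).reshape(3, 4)

print(arr.ndim)   # number of axes (rank)
print(arr.shape)  # length along each axis
print(arr.size)   # total number of elements
```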
Program-1
Program-2
Built-in Methods
Many standard numerical functions are available as methods out of the box:
Program-3
◻ SciPy contains a variety of sub-packages that help to solve the most common problems
in scientific computation.
◻ SciPy is among the most used scientific libraries, second only to the GNU Scientific Library for
C/C++ or MATLAB's.
◻ It is easy to use and understand, and offers fast computation.
◻ It operates on NumPy arrays.
SciPy:
1. SciPy is built on top of NumPy.
2. SciPy provides a fully featured linear algebra module, while NumPy contains only a few of these features.
3. Most new data science features are available in SciPy rather than NumPy.
Linear algebra routines accept a two-dimensional array object, and in most cases the output is also
a two-dimensional array.
Now let's try scipy.linalg by calculating the determinant of a two-dimensional matrix.
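A minimal sketch of the determinant calculation, assuming a 2 x 2 matrix chosen only for illustration:

```python
import numpy as np
from scipy import linalg

# Example 2 x 2 matrix (values are illustrative)
two_d_array = np.array([[4, 5], [3, 2]])

# Determinant: 4*2 - 5*3
det = linalg.det(two_d_array)
print(det)
```

Note that the determinant is a single number rather than an array.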
Program-1
◻ A very common problem in linear algebra is finding eigenvalues and eigenvectors, which can
be easily solved using the eig() function.
◻ Now let us find the eigenvalues and the corresponding eigenvectors of a
two-dimensional square matrix X.
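A short sketch of eig() from scipy.linalg follows. The matrix X is an assumed example, and the loop only confirms that each returned pair satisfies X v = lambda v.

```python
import numpy as np
from scipy import linalg

# Example square matrix (values are illustrative)
X = np.array([[5, 4], [6, 3]])

# eig returns the eigenvalues and a matrix whose columns are the eigenvectors
eigenvalues, eigenvectors = linalg.eig(X)

# Check the defining property X v = lambda v for every pair
pairs_ok = all(
    np.allclose(X @ eigenvectors[:, i], eigenvalues[i] * eigenvectors[:, i])
    for i in range(len(eigenvalues))
)
print(eigenvalues, pairs_ok)
```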
Program-2
Exercise programs:
1. Consider a list datatype, then reshape it into 2D and 3D matrices using NumPy
2. Generate random matrices using NumPy
3. Find the determinant of a matrix using SciPy
4. Find the eigenvalues and eigenvectors of a matrix using SciPy
Week 2:
Implementation of Python Libraries for ML application such as Pandas and Matplotlib.
Pandas Library
The primary two components of pandas are the Series and DataFrame.
A Series is essentially a column, and a DataFrame is a multi-dimensional table made
up of a collection of Series.
DataFrames and Series are quite similar in that many operations that you can do
with one you can do with the other, such as filling in null values and calculating
the mean.
With CSV files all you need is a single line to load in the data:
import pandas as pd
df = pd.read_csv('purchases.csv')
df
Another fast and useful attribute is .shape, which outputs just a tuple of (rows, columns):
movies_df.shape
Note that .shape has no parentheses and is a simple tuple of format (rows, columns). So
we have 1000 rows and 11 columns in our movies DataFrame.
You'll be going to .shape a lot when cleaning and transforming data. For example, you
might filter some rows based on some criteria and then want to know quickly how many
rows were removed.
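The filtering use case can be sketched as below. The small movies_df here is a hypothetical stand-in with made-up columns, not the 1000-row dataset referred to above.

```python
import pandas as pd

# Hypothetical stand-in for movies_df (column names and values are assumptions)
movies_df = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "rating": [8.1, 6.4, 7.9, 5.2],
})

rows_before = movies_df.shape[0]

# Keep only the rows whose rating is at least 7.0
filtered_df = movies_df[movies_df["rating"] >= 7.0]

# Compare the row counts to see how many rows the filter removed
rows_removed = rows_before - filtered_df.shape[0]
print(movies_df.shape, filtered_df.shape, rows_removed)
```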
Program-1
We haven't defined an index in our example, but we see two columns in our output: The right column
contains our data, whereas the left column contains the index. Pandas created a default index starting with 0
going to 5, which is the length of the data minus 1.
Program-2
We can directly access the index and the values of our Series S:
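A minimal sketch of a Series with the default index, together with its index and values attributes. The six data values are assumptions chosen only so that the index runs from 0 to 5.

```python
import pandas as pd

# Six illustrative values; no index is given, so pandas creates 0..5
S = pd.Series([11, 28, 72, 3, 5, 8])

print(S)
print(S.index)   # the default index
print(S.values)  # the data as a NumPy array
```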
Program-3
So far our Series have not been very different from the ndarrays of NumPy. This changes as soon as we start
defining Series objects with individual indices:
Program-4
Program-5
A big advantage over NumPy arrays is obvious from the previous example: we can use arbitrary indices.
If we add two Series with the same indices, we get a new Series with the same index, and the corresponding
values are added:
OUTPUT:
apples 37
oranges 46
cherries 83
pears 42
dtype: int64
sum of S: 115
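The addition of two Series with the same index can be sketched as follows. The input values are assumptions, chosen to be consistent with the output listed above.

```python
import pandas as pd

fruits = ['apples', 'oranges', 'cherries', 'pears']

# Assumed values; both Series share the same index
S = pd.Series([20, 33, 52, 10], index=fruits)
S2 = pd.Series([17, 13, 31, 32], index=fruits)

# Values with the same index label are added together
total = S + S2
print(total)
print("sum of S:", sum(S))
```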
Program-6
The indices do not have to be the same for the Series addition. The index will be the "union" of both indices.
If an index doesn't occur in both Series, the value for this Series will be NaN:
OUTPUT:
cherries 83.0
oranges 46.0
peaches NaN
pears 42.0
raspberries NaN
dtype: float64
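A sketch of addition with partly different indices. The index labels and values are assumptions, chosen to be consistent with the output listed above.

```python
import pandas as pd

fruits = ['peaches', 'oranges', 'cherries', 'pears']
fruits2 = ['raspberries', 'oranges', 'cherries', 'pears']

# Assumed values; 'peaches' and 'raspberries' each occur in only one Series
S = pd.Series([20, 33, 52, 10], index=fruits)
S2 = pd.Series([17, 13, 31, 32], index=fruits2)

# The result is indexed by the union of both indices;
# labels missing from either Series give NaN
result = S + S2
print(result)
```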
Program-7
In principle, the indices can be completely different, as in the following example. We have two indices. One is
the Turkish translation of the English fruit names:
fruits = ['apples', 'oranges', 'cherries', 'pears']
OUTPUT:
apples NaN
armut NaN
cherries NaN
elma NaN
kiraz NaN
oranges NaN
pears NaN
portakal NaN
dtype: float64
Program-8
Indexing
It's possible to access single values of a Series.
print(S['apples'])
OUTPUT:
20
Matplotlib Library
◻ plot(x-axis values, y-axis values) — plots a simple line graph with x-axis values
against y-axis values
◻ show() — displays the graph
◻ title("string") — sets the title of the plot as specified by the string
◻ xlabel("string") — sets the label for the x-axis as specified by the string
◻ ylabel("string") — sets the label for the y-axis as specified by the string
◻ figure() — used to control figure-level attributes
◻ subplot(nrows, ncols, index) — adds a subplot to the current figure
◻ suptitle("string") — adds a common title to the figure as specified by the string
◻ subplots(nrows, ncols, figsize) — a convenient way to create subplots in a single call.
It returns a tuple of a figure and its axes (a single Axes or an array of Axes).
◻ set_title("string") — an axes-level method used to set the title of subplots in a figure
◻ bar(categorical variables, values, color) — used to create vertical bar graphs
◻ barh(categorical variables, values, color) — used to create horizontal bar graphs
◻ legend(loc) — used to make legend of the graph
◻ xticks(index, categorical variables) — Get or set the current tick locations and labels
of the x-axis
◻ pie(value, categorical variables) — used to create a pie chart
◻ hist(values, number of bins) — used to create a histogram
◻ xlim(start value, end value) — used to set the limit of values of the x-axis
◻ ylim(start value, end value) — used to set the limit of values of the y-axis
◻ scatter(x-axis values, y-axis values) — plots a scatter plot with x-axis values against
y-axis values
◻ axes() — adds an axes to the current figure
◻ set_xlabel("string") — axes-level method used to set the x-label of the plot specified
as a string
◻ set_ylabel("string") — axes-level method used to set the y-label of the plot specified
as a string
◻ scatter3D(x-axis values, y-axis values, z-axis values) — plots a three-dimensional scatter plot
of the given x, y and z values
◻ plot3D(x-axis values, y-axis values, z-axis values) — plots a three-dimensional line graph
of the given x, y and z values
Here we import Matplotlib's pyplot module and the NumPy library, as most of the data that we
will be working with will be in the form of arrays.
Program-1
Program-2
We pass two arrays as our input arguments to pyplot's plot() method and use the show() method to
invoke the required plot. Note that the first array appears on the x-axis and the second array appears
on the y-axis of the plot. Now that our first plot is ready, let us add the title and name the x-axis and
y-axis using the methods title(), xlabel() and ylabel() respectively.
Program-3
We can also specify the size of the figure using the figure() method, passing the width and height
in inches as a tuple to the figsize argument.
Program-4
With every X and Y argument, you can also pass an optional third argument in the form of a string which
indicates the colour and line type of the plot. The default format is b- which means a solid blue line. In the
figure below we use go which means green circles. Likewise, we can make many such combinations to format
our plot.
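The plotting steps described so far (figure size, format string, title and axis labels) can be combined in one short sketch. The data are made up, and the non-interactive Agg backend with savefig() is an assumption made so the sketch runs without a display. In an interactive session, plt.show() would be used instead.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

# Illustrative data: first array goes on the x-axis, second on the y-axis
x = np.array([1, 2, 3, 4])
y = x ** 2

fig = plt.figure(figsize=(6, 4))   # width and height in inches
lines = plt.plot(x, y, "go")       # "go" = green circles
plt.title("Squares")
plt.xlabel("x")
plt.ylabel("y")

# Render the figure into an in-memory PNG instead of opening a window
buffer = io.BytesIO()
plt.savefig(buffer, format="png")
```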
Program-1
Method-I
Program-2
Method-II:
Method-III:
b) Write a Python program to compute Mean, Median, Mode, Variance and Standard Deviation using
Datasets
Measures of spread
These functions, as named in Python's statistics module, calculate a measure of how much the
population or sample tends to deviate from the typical or average values.
pstdev() Population standard deviation of data.
pvariance() Population variance of data.
stdev() Sample standard deviation of data.
variance() Sample variance of data.
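A minimal sketch using Python's standard statistics module on a small made-up dataset. It covers the measures of spread listed above as well as the mean, median and mode named in the task.

```python
import statistics

# Illustrative dataset
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Population measures divide by n; sample measures divide by n - 1
population_variance = statistics.pvariance(data)
population_stdev = statistics.pstdev(data)
sample_variance = statistics.variance(data)
sample_stdev = statistics.stdev(data)

print(mean, median, mode)
print(population_variance, population_stdev)
print(sample_variance, sample_stdev)
```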
Program-1
Program-2
Program-3
Program-4
Program-5
c) Write a Python program to reshape the data, filter the data, merge the data and
handle the missing values in datasets.
Program-2
Method:II
Assigning the data:
Program-3
Program-1
Program-2
Program-3
Merge data:
The merge operation combines two DataFrames on a common field to bring raw data into the desired format.
Syntax:
pd.merge(data_frame1, data_frame2, on="field")
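A small sketch of the syntax above. The two tables and the key column "id" are hypothetical.

```python
import pandas as pd

# Hypothetical tables sharing the key column "id"
data_frame1 = pd.DataFrame({"id": [1, 2, 3], "name": ["Asha", "Ravi", "Meena"]})
data_frame2 = pd.DataFrame({"id": [2, 3, 4], "marks": [78, 91, 65]})

# The default is an inner join: only ids present in both tables are kept
merged = pd.merge(data_frame1, data_frame2, on="id")
print(merged)
```

Passing how="left", how="right" or how="outer" keeps the unmatched rows and fills their missing fields with NaN.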
Program-4
Program-5
Program-6
Program-2
To check for null values in a pandas DataFrame, we use the isnull() function. It returns a DataFrame of
Boolean values which are True for NaN values.
Program-7
Program-3
To check for non-null values in a pandas DataFrame, we use the notnull() function. It returns a DataFrame of
Boolean values which are False for NaN values.
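The two checks can be sketched on a small DataFrame whose column names and values are made up for illustration:

```python
import numpy as np
import pandas as pd

# Illustrative DataFrame with one missing value in each column
df = pd.DataFrame({"first": [100, 90, np.nan], "second": [30, np.nan, 56]})

null_mask = df.isnull()        # True where the value is NaN
not_null_mask = df.notnull()   # False where the value is NaN

# Summing the Boolean mask counts the missing values per column
missing_per_column = null_mask.sum()
print(null_mask)
print(not_null_mask)
print(missing_per_column)
```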
Program-4
Program-5
Program-6
Program-7
Method-I
Drop Columns with Missing Values
Program-8
Method-II
fillna() lets the user replace NaN values with a value of their own choosing
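Both methods can be sketched on a small made-up DataFrame. The column names and the fill value 0 are assumptions.

```python
import numpy as np
import pandas as pd

# Illustrative DataFrame: column "b" has one missing value
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, np.nan, 6.0]})

# Method-I: drop every column that contains at least one missing value
without_missing_columns = df.dropna(axis=1)

# Method-II: replace the missing values with a chosen value
filled = df.fillna(0)

print(without_missing_columns)
print(filled)
```

Both calls return new DataFrames and leave df unchanged.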
Program-9
Program-10
Program-11
Program-12
Program-13
Code:
import pandas as pd
# Strings that should be treated as missing values when reading the file
missing_value = ["n/a", "na", "--"]
data1 = pd.read_csv(r'E:\mldatasets\Machine_Learning_Data_Preprocessing_Python-master\Sample_real_estate_data.csv', na_values=missing_value)
df = data1
Program-1
Reshaping the data:
Method-I
Program:
Write a Python program to load CSV dataset files using pandas library functions.
Program:
a. Importing data(CSV)
b. Importing data(EXCEL)
Exercise:
Demonstrate various data pre-processing techniques for a given dataset.
Program:
Week 4:
Implement Dimensionality reduction using the Principal Component Analysis (PCA) method.
Program:
Observations:
- x1 and x2 do not seem correlated
- x1 seems very correlated with both x3 and x4
- x2 seems somewhat correlated with both x3 and x4
- x3 and x4 seem very correlated
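A minimal PCA sketch using only NumPy. It runs on synthetic data, which is an assumption: the four features loosely mimic the correlation pattern in the observations above and are not the dataset behind them. It uses the eigen-decomposition of the covariance matrix, which is one common way to compute PCA.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumption): x3 and x4 are built mostly from x1, partly from x2
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)
x3 = x1 + 0.5 * x2 + 0.1 * rng.normal(size=100)
x4 = x1 + 0.3 * x2 + 0.1 * rng.normal(size=100)
X = np.column_stack([x1, x2, x3, x4])

# Step 1: centre each feature on zero
X_centered = X - X.mean(axis=0)

# Step 2: covariance matrix of the features
cov = np.cov(X_centered, rowvar=False)

# Step 3: eigen-decomposition (eigh is meant for symmetric matrices)
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Step 4: sort the components by decreasing eigenvalue (variance explained)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Step 5: project onto the first two principal components
components = eigenvectors[:, :2]
X_reduced = X_centered @ components

explained_ratio = eigenvalues[:2].sum() / eigenvalues.sum()
print(X_reduced.shape, explained_ratio)
```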
Week 5:
Develop Decision Tree Classification model for a given dataset and use it to classify a new sample.
Program:
outlook
    overcast: b'yes'
    rain: wind
        b'strong': b'no'
        b'weak': b'yes'
    sunny: humidity
        b'high': b'no'
        b'normal': b'yes'
Week 6:
Consider a dataset and use Random Forest to predict the output class. Vary the number of trees as follows
and compare the results: i) 20 ii) 50 iii) 100 iv) 200 v) 500
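One possible sketch of this comparison using scikit-learn. The Iris dataset, the 70/30 split and the fixed random_state are assumptions made for illustration; the manual does not specify a dataset.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Assumed dataset and split
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Train one forest per tree count and record its test accuracy
accuracy_by_trees = {}
for n_trees in [20, 50, 100, 200, 500]:
    model = RandomForestClassifier(n_estimators=n_trees, random_state=42)
    model.fit(X_train, y_train)
    accuracy_by_trees[n_trees] = model.score(X_test, y_test)

for n_trees, accuracy in accuracy_by_trees.items():
    print(n_trees, accuracy)
```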
Week 7:
Write a python program to implement Simple Linear Regression Models and plot the graph.
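A minimal sketch of simple linear regression, fitted by least squares with np.polyfit. The data points are made up, and the Agg backend with savefig() is an assumption so that the sketch runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

# Illustrative data, roughly following y = 2x + 1
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.0])

# A degree-1 polynomial fit returns the least-squares slope and intercept
slope, intercept = np.polyfit(x, y, 1)
y_pred = slope * x + intercept

# Plot the observations and the fitted line
plt.figure()
plt.scatter(x, y, label="data")
plt.plot(x, y_pred, "r-", label="fitted line")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()

buffer = io.BytesIO()
plt.savefig(buffer, format="png")
print(slope, intercept)
```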
Program:
Week 8:
Write a python program to implement Logistic Regression Model for a given dataset.
Program:
Exercise:
Implement Naive Bayes classification in python.
Program:
Week 9:
Build KNN Classification model for a given dataset.
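One possible sketch of a KNN comparison using scikit-learn, with the k values 1, 3, 5, 7 and 11 from the list of experiments. The Iris dataset and the 70/30 split are assumptions; the manual does not specify a dataset.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumed dataset and split
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Train one classifier per value of k and record its test accuracy
accuracy_by_k = {}
for k in [1, 3, 5, 7, 11]:
    model = KNeighborsClassifier(n_neighbors=k)
    model.fit(X_train, y_train)
    accuracy_by_k[k] = model.score(X_test, y_test)

for k, accuracy in accuracy_by_k.items():
    print(k, accuracy)
```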
Program:
Week 10:
Implement Support Vector Machine for a dataset.
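One possible sketch of an SVM kernel comparison using scikit-learn's SVC, with the linear, polynomial and RBF kernels from the list of experiments. The Iris dataset and the 70/30 split are assumptions, and all other SVC parameters are left at their defaults.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Assumed dataset and split
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Train one classifier per kernel and record its test accuracy
accuracy_by_kernel = {}
for kernel in ["linear", "poly", "rbf"]:
    model = SVC(kernel=kernel)
    model.fit(X_train, y_train)
    accuracy_by_kernel[kernel] = model.score(X_test, y_test)

for kernel, accuracy in accuracy_by_kernel.items():
    print(kernel, accuracy)
```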
Week 11:
Program: