Unit 4. Machine learning with Python

CUAUHTÉMOC UNIVERSITY

AGUASCALIENTES CAMPUS
DISTANCE EDUCATION

Course
Programming in Python
Instructor
Dr. Iván Castillo Zúñiga.

EXCELLENT PROFESSIONALS, BETTER HUMAN BEINGS


Introduction.
For a successful journey into Machine Learning, it is necessary to choose a
suitable programming language from the beginning, as your choice will shape your future work.
At this step, you must think strategically, set priorities correctly, and not waste time
on unnecessary things.

Python is an excellent choice for programmers who want to move into the field of
machine learning and data science. It is a minimalist and intuitive language with a complete
line of libraries and frameworks that significantly reduces the time required to
obtain your first results.

Machine learning is learning based on experience. For example, it is like a
person who learns to play chess by observing others play. In the same way,
computers can be programmed by providing them with the information on which they are
trained, so that they gain the ability to identify elements or their characteristics with high
probability.

First of all, you should know that there are several stages of machine learning:

• Data collection.
• Data classification.
• Data analysis.
• Development of algorithms.
• Verification of the generated algorithm.
• The use of an algorithm to draw conclusions.

To search for patterns, various algorithms are used, which are divided into two groups:

1. Unsupervised learning.
2. Supervised learning.

With unsupervised learning, the machine receives only a set of input data.
It must then determine the relationships between that input data and
any other hypothetical data. Unlike supervised learning, where the machine is given
verification data (labels) to learn from, unsupervised learning
implies that the computer itself finds patterns and relationships between different sets of
data. Unsupervised learning can be divided into clustering and association.

Supervised learning involves the computer's ability to recognize elements
based on the provided samples. The computer studies them and develops the ability to
recognize new data based on those samples. For example, you can train your computer to
filter spam messages based on previously received messages.
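
As a minimal sketch of the difference between the two groups, the example below runs a clustering algorithm (unsupervised) and a classifier (supervised) from scikit-learn on the same toy points. The points, the "ham"/"spam" labels and the choice of KMeans and KNeighborsClassifier are assumptions made only for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

# Toy data (assumed for illustration): two well-separated groups of points
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])

# Unsupervised: only X is given; the algorithm groups the points by itself
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
clusters = kmeans.labels_
print("Clusters found:", clusters)

# Supervised: each sample comes with a known label (the verification data)
y = np.array(["ham", "ham", "ham", "spam", "spam", "spam"])
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
prediction = knn.predict([[7.5, 8.3]])
print("Predicted label:", prediction[0])
```

The clustering step only says which points belong together; the cluster numbers carry no meaning of their own. The classifier, having seen labels, can name the class of a new point.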

Some supervised learning algorithms include:

a. Decision trees.
b. Support Vector Machines (SVM).
c. Bayes Classifier.
d. K-nearest neighbors.
e. Linear regression.
f. Neural networks.

Dr. Iván Castillo Zúñiga 1


Programming in the Python language.
Unit 4.
Machine learning with Python.
4.1 Datasets and machine learning.
4.1.1 Dataset.
4.1.2 Relationship of datasets with machine learning.
4.2 Machine learning (supervised learning).
4.2.1 Multilayer Perceptron Neural Network.
4.2.2 Support Vector Machines (SVM).
4.2.3 Decision trees.
4.2.4 Random forests.
4.2.5 AdaBoost (Boosting).
4.2.6 KNN - nearest neighbor.
4.2.7 Bayes.

4.1 Datasets and machine learning.

4.1.1 Dataset.

Dataset.

The term dataset is an anglicism that Spanish-speaking countries have adopted as
one more term in the language. Its Spanish translation is "conjunto de datos" (set of data),
and it refers to a collection of data, usually in tabular form.

A dataset corresponds to the contents of a single database table or a single
statistical data matrix, where each column of the table represents
a particular variable, and each row represents a specific member of the set of
data we are dealing with. The dataset holds the values that
each of the variables takes, such as the height and weight of an object,
for each member of the dataset. Each of these values is known
as a datum. The dataset may include data for one or more members,
according to its number of rows.

The dataset also includes the relationships between the tables that contain the data.
In the context of Big Data, a dataset is understood as a set of data
so large that traditional data processing applications cannot
process it, because of the amount of data contained in the table or matrix.

We could define a dataset as a collection or representation of data residing in
memory with a coherent relational programming model, regardless of
the origin of the data it contains.

One of the main characteristics of datasets is that they already have a structure.

Dataframe.

This type of data organization is usually used when conducting a statistical study
of the objects in a sample: the information and data of the sample are
organized into a dataframe, a data sheet in which each row
corresponds to an object in the sample and each column to a variable. This way of
organizing data is the same as in datasets.

The structure of a dataframe is very similar to that of a matrix. However, whereas a
matrix accepts only numeric values, a dataframe
can also hold alphanumeric data.

A DataFrame is a two-dimensional data structure in which
data of different types (such as characters, integers, floating-point values, factors and more)
can be stored in columns. A DataFrame always has an index (starting at 0 by default). The index refers to the
position of an element in the data structure.
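
The following short sketch, using the pandas library, illustrates a DataFrame that mixes text, integer and floating-point columns and shows its default index. The column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical sample: each row is an object, each column a variable
df = pd.DataFrame({
    "name": ["Ana", "Luis", "Eva"],      # text (alphanumeric) column
    "age": [23, 35, 41],                 # integer column
    "height_m": [1.62, 1.80, 1.70],      # floating-point column
})

print(df.dtypes)    # one data type per column
print(df.index)     # default index: positions 0, 1, 2
print(df.iloc[0])   # the row at position 0
```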

4.1.2 Relationship of datasets with machine learning.

One of the most difficult problems to solve in machine learning
has nothing to do with the learning algorithms (neural networks, decision
trees, support vector machines, Bayes, k-nearest neighbors): it is the problem
of obtaining the correct data in the correct format (dataset).

Obtaining the correct data means gathering or identifying the data that correlates with
the results you want to predict; that is, data that contains a signal about the events
of interest. The data must be aligned with the problem you are trying to
solve. Cat pictures are not very useful when you are building a
facial recognition system. A data scientist must verify that the data is aligned with the
problem to be solved. If you do not have the correct data, your effort to build an
artificial intelligence (AI) solution must return to the data collection stage.

The correct final format for machine learning or deep learning is
generally a tensor, or multidimensional matrix. Therefore, the data pipelines
created for learning generally convert all data, whether images,
video, sound, voice, text or time series, into vectors and tensors to which
linear algebra operations can be applied. That data often needs to be normalized,
standardized and cleaned to increase its usefulness; these are the steps of
machine learning where Extraction, Transformation and Load (ETL) processes
are applied.
Deeplearning4j offers the DataVec ETL tool to perform those
data preprocessing tasks.
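
As an illustration of the normalization and standardization steps mentioned above, the following NumPy sketch rescales a small numeric matrix. The feature values (heights and weights) are made up for the example.

```python
import numpy as np

# Hypothetical raw features: height in cm and weight in kg
X = np.array([[150.0, 50.0],
              [160.0, 60.0],
              [170.0, 70.0],
              [180.0, 80.0]])

# Min-max normalization: rescale each column to the interval [0, 1]
X_norm = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Standardization: each column gets mean 0 and standard deviation 1
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_norm)
print(X_std)
```

Both transformations put columns measured in different units on a comparable scale, which many learning algorithms (neural networks and SVMs among them) tend to benefit from.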

Deep learning, and machine learning in general, need a good training set
to work correctly. Gathering and building the
training set, a considerable body of known data, requires time and
domain-specific knowledge of where and how to collect relevant information. The
training set acts as the benchmark against which the learning networks are
trained. It is what they learn to reconstruct when they later face data
they have not seen before.

At this stage, knowledgeable humans need to find the correct raw data
and transform it into a numerical representation that the learning algorithm can
understand: a dataset (matrix) for supervised learning or a tensor
(multidimensional matrix) for deep learning. Building a training
set is, in a sense, pre-pre-training.

Training sets that require a lot of time or expertise to build can serve
as a proprietary advantage in the world of data science and problem solving.
The nature of the expertise lies mainly in telling your algorithm what
matters, by selecting what is included in the training set.

It is about telling a story, through the initial data that you select, that will guide your
deep learning networks as they extract important features, both in
the training set and in the raw data they were created
to study.

To create a useful training set, you must understand the problem you are
solving; that is, what you want your supervised learning algorithms
or deep learning networks to pay attention to, and what results you want to predict.

Training and testing datasets.

Machine learning generally works with two datasets:
training and testing. Both should be sampled randomly from a larger
body of data.

The first set used is the training set, the larger of the two.
Running a training set through a supervised learning algorithm
or a deep learning neural network teaches the network how to weigh different
features, adjusting the coefficients according to their likelihood of minimizing the
errors in its results.

These coefficients, also known as parameters, are contained in one-dimensional
arrays or tensors (multidimensional matrices) and together they are called a model,
because they encode a model of the data on which they are trained. They are the most
important result that you obtain by training a supervised learning algorithm or a
deep learning neural network.

The second set is the test set. It works as a seal of approval and is not
used until the end. After training and optimizing with your data, you test your
supervised algorithm or neural network (deep learning) against this final random sample. The
results produced must validate that your algorithm accurately recognizes the solution
of the problem, or at least recognizes a percentage of the cases.

If you do not obtain accurate predictions, go back to the training set and review the
hyperparameters that were used to tune the network or the machine learning algorithms,
as well as the quality of the data and the preprocessing techniques.
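
A minimal sketch of this workflow with scikit-learn is shown below: a random 70/30 split, training on the larger part, and a final check on the held-out part. The synthetic data and the choice of LogisticRegression as the learner are assumptions for illustration only.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Synthetic data (assumed): 100 samples, 2 features, label depends on their sum
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Randomly sample 70% for training and hold out 30% for the final test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Training adjusts the coefficients (parameters) that make up the model
model = LogisticRegression().fit(X_train, y_train)
print("Coefficients:", model.coef_)

# The test set is used only at the end, as the seal of approval
test_accuracy = model.score(X_test, y_test)
print("Test accuracy:", test_accuracy)
```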

4.1.3 Dataset in Python.

We can automate the process of manipulating data with Python. It is worth spending time
writing the code that does these tasks because, once it is written, we can use it
over and over again on different datasets that share a similar format. This makes
our methods easily reproducible. It is also easy to share our code
with our colleagues so that they can replicate the same analysis.

It is recommended that the dataset be in the same directory as
the program.

Pandas in Python.

One of the best options for working with tabular data in Python is the Python
Data Analysis Library (alias Pandas). The Pandas library provides data structures,
produces high-quality graphics with matplotlib, and integrates well with other
libraries that use NumPy arrays (NumPy is another Python library).

Python does not load all available libraries by default. We have to add an
import statement to our code to use the functions of a library. A library
is imported with the syntax import libraryName. If we also want
a nickname to shorten the commands, we can add as nicknameToUse.
An example is importing the pandas library under its common alias pd, as in the
next example:

import pandas as pd

Reading data in CSV using Pandas

CSV stands for Comma-Separated Values, and it is a
common way to store data. Other separators can be used: you may encounter
values separated by tabs, by semicolons or by blank spaces. It is easy
to replace one separator with another to suit your application. The first line of the file
usually contains the headers that indicate what is in each column. CSV (and other
delimited formats) make it easy to share data and can be imported and exported from
various programs, including Microsoft Excel.
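
The sketch below shows how pandas reads delimited text with different separators. To keep it self-contained it reads from in-memory strings instead of files; the column names and values are hypothetical.

```python
import io
import pandas as pd

# In-memory stand-ins for two files with the same (hypothetical) content
comma_text = "word,frequency\nvirus,12\nfraud,7\n"
semicolon_text = "word;frequency\nvirus;12\nfraud;7\n"

# The first line is used as the column headers
df_comma = pd.read_csv(io.StringIO(comma_text))               # default separator: comma
df_semicolon = pd.read_csv(io.StringIO(semicolon_text), sep=";")

print(df_comma)
print(df_comma.equals(df_semicolon))   # same table, different separator
```

With a real file, the first argument would be the file name, for example pd.read_csv("cibercrimen.csv").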

Instruction to read a dataset
In the following example, we will use the cybercrime dataset, which was obtained through
Extraction, Transformation, and Loading (ETL) techniques with the ADVI architecture for
data preprocessing, using 1326 pages downloaded from the Web, with a vocabulary
of 107 words. The pandas library is imported and bound to the name pd; then
the variable data is defined, which loads the DataFrame and is used in the instructions
that manipulate it. Fig. 1 shows the cybercrime DataFrame.

Figure 1. Loading a dataframe in Python.

The dataset file is located in the resources area, with the name
"cibercrimen.csv", which you can download from the UCA platform. This dataset will be
used in neural network algorithms and support vector machines.

Exploring the data


The built-in type function is used to see what kind of object data is. As expected,
it is a DataFrame object.

The size attribute gives the total number of elements in the DataFrame (rows times columns).

The shape attribute shows how many rows and columns the DataFrame has.

The dataset contains 1326 rows and 107 columns.

The ndim attribute shows the number of dimensions of the DataFrame.

The columns attribute lists the columns (attributes) of the DataFrame.

The axes attribute returns the row labels and the column labels of the DataFrame.
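
The calls described above can be tried on any DataFrame. The following sketch uses a small stand-in table with hypothetical column names rather than cibercrimen.csv, so the numbers differ from those of the course dataset.

```python
import pandas as pd

# Small stand-in DataFrame (the course examples use cibercrimen.csv instead)
data = pd.DataFrame({
    "word_a": [1, 0, 2],
    "word_b": [0, 3, 1],
    "label": ["Yes", "No", "Yes"],
})

print(type(data))     # the DataFrame class
print(data.size)      # total number of cells: 3 rows * 3 columns = 9
print(data.shape)     # (rows, columns) = (3, 3)
print(data.ndim)      # number of dimensions: 2
print(data.columns)   # column labels
print(data.axes)      # list with the row index and the column index
```

Note that type is a built-in function, while size, shape, ndim, columns and axes are attributes of the DataFrame and are written without parentheses.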

Table 1 describes the functions used to obtain statistical data about how a DataFrame
is formed, and Table 2 describes the methods for manipulating a DataFrame.

Table 1. Functions to obtain statistics from the DataFrame.

4.2 Machine learning (supervised learning).

The field of machine learning rests on two main pillars,
called supervised learning and unsupervised learning. Some people also
consider a newer field of study, 'deep learning', to be separate from the
usual question of supervised versus unsupervised learning.

Supervised learning occurs when a computer is presented with examples of
inputs and their desired outputs. The goal is to learn general rules that map
inputs to outputs.

In contrast, unsupervised learning occurs when no labels are given and it is left to
the algorithm to find the structure in its input. Unsupervised learning
can be a goal in itself when we only need to uncover hidden patterns.

Deep learning is a newer field of study inspired by the structure
and function of the human brain; it is based on artificial neural networks
instead of only on statistical concepts. Deep learning can be used in both
approaches, supervised and unsupervised.

In the following objectives, we will analyze some supervised machine learning
algorithms (Neural Networks, Support Vector Machines (SVM), Bayes,
KNN - Nearest Neighbor, Decision Trees, Random Forests, and AdaBoost), which
will be used to calculate the accuracy percentage for detecting cybercrime
vocabulary on websites.

4.2.1 Neural network.

Neural Networks (commonly referred to as NN) are a paradigm of
learning and automatic processing inspired by the way the human nervous
system works. It is a system of interconnected neurons that work together
to produce an output stimulus.

The first models of neural networks date back to 1943, with the work of the neurophysiologist Warren
McCulloch and the logician Walter Pitts. Years later, in 1949, Donald Hebb developed his ideas about
neural learning, reflected in the 'Hebb rule'. In 1958, Rosenblatt
developed the simple Perceptron, and in 1960, Widrow and Hoff developed the Adaline, which was
the first real industrial application.

In the following years, research declined due to the lack of learning models
and the study by Minsky and Papert on the limitations of the Perceptron. However, in the
1980s, neural networks resurged thanks to the development of the Hopfield network and, especially,
to the back-propagation learning algorithm presented by Rumelhart
and McClelland in 1986, which was applied in the development of multilayer perceptrons
(Hernández, Ramírez & Ferri, 2004).

Operation of a Neural Network.

A neural network is composed of units called neurons. Each neuron receives
a series of inputs through interconnections and outputs a result. This output is given by
three functions:

1. A propagation function (also known as an excitation function), which
in general consists of the sum of each input multiplied by the weight of its
interconnection (net value). If the weight is positive, the connection is called excitatory;
if it is negative, it is called inhibitory.
2. An activation function, which modifies the previous one. It may not exist, in which
case the output is the propagation function itself.
3. A transfer function, which is applied to the value returned by the
activation function. It is used to bound the output of the neuron and is generally chosen
according to the interpretation we want to give to the outputs. Some of the most
used are the sigmoid function (to obtain values in the interval (0,1)) and the
hyperbolic tangent (to obtain values in the interval (-1,1)).
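
The three functions can be sketched for a single neuron in a few lines of plain Python. The input values, the weights and the function names below are hypothetical, and the sketch assumes there is no separate activation function, so the net value goes straight to the transfer function.

```python
import math

def neuron_output(inputs, weights, transfer):
    # Propagation function: weighted sum of the inputs (net value)
    net = sum(x * w for x, w in zip(inputs, weights))
    # No separate activation function here, so the net value passes through unchanged
    return transfer(net)

def sigmoid(z):
    # Transfer function with values in the interval (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

inputs = [1.0, 0.5]      # hypothetical input values
weights = [0.8, -0.4]    # positive weight: excitatory; negative weight: inhibitory

out_sigmoid = neuron_output(inputs, weights, sigmoid)
out_tanh = neuron_output(inputs, weights, math.tanh)   # values in (-1, 1)
print(out_sigmoid, out_tanh)
```

Here the net value is 1.0 * 0.8 + 0.5 * (-0.4) = 0.6, and each transfer function squeezes it into its own interval.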

Figure 2 shows the structure of a neural network (simple Perceptron). It is composed
of the input values, the synaptic weights (dendrites), the adder, the
activation function and the output. Training a neural network means discovering the
weights of the neurons.

Figure 2. Neural network (Simple Perceptron).


Simple perceptron.

The simple Perceptron is the algorithm used in the research, and it applies to linearly
separable cases. The main characteristic of the simple Perceptron is that the weighted
sum of the input signals is compared with a reference value (an average value of the targets
in the simple Perceptron, or a threshold in the Adaline) to determine the output of the neuron. When the
sum is equal to or greater than this value, the output is 1; otherwise it is 0.

Neural network process (simple Perceptron).

1. The network starts in a random state. The weights between neurons have small
random values (between -1 and 1).
2. Select an input vector, x, from the set of training
examples.
3. The activation propagates forward through the weights in the network to calculate the
output O = w · x.
4. If O = t (that is, if the network's output is correct), return to step 2.
5. Otherwise, the weights are updated according to the following expression:
$\Delta w_i = \eta \, x_i \, (t - O)$, where $\eta$ is a small positive number known as the learning
coefficient. Go back to step 2.
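
The five steps can be sketched in plain Python as follows. This is an illustrative sketch, not the course's program: it assumes a threshold of 0, a constant first input of 1 that plays the role of the bias, and the logical AND as a linearly separable training set. The names train_perceptron, step and eta are hypothetical.

```python
import random

def step(value):
    # Output is 1 when the weighted sum reaches the threshold (0 here), else 0
    return 1 if value >= 0 else 0

def train_perceptron(samples, eta=0.1, max_epochs=1000, seed=0):
    rng = random.Random(seed)
    n_inputs = len(samples[0][0])
    # Step 1: small random weights between -1 and 1
    w = [rng.uniform(-1, 1) for _ in range(n_inputs)]
    for _ in range(max_epochs):
        errors = 0
        for x, t in samples:                                 # Step 2: take a training example
            o = step(sum(wi * xi for wi, xi in zip(w, x)))   # Step 3: forward pass
            if o != t:                                       # Steps 4-5: update only when wrong
                w = [wi + eta * xi * (t - o) for wi, xi in zip(w, x)]
                errors += 1
        if errors == 0:        # every example is classified correctly
            break
    return w

# Logical AND; the first input is a constant 1 acting as the bias (assumption of this sketch)
and_samples = [([1, 0, 0], 0), ([1, 0, 1], 0), ([1, 1, 0], 0), ([1, 1, 1], 1)]
weights = train_perceptron(and_samples)
predictions = [step(sum(wi * xi for wi, xi in zip(weights, x))) for x, _ in and_samples]
print(predictions)
```

Because AND is linearly separable, the update rule stops changing the weights after a finite number of corrections, at which point the predictions match the targets.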

Multilayer perceptron in Python.
The neural network algorithm used in Python is the Multilayer Perceptron.
The Multi-Layer Perceptron (MLP) (Scikit-learn.org, 2019) is a supervised algorithm that
learns a function $f: \mathbb{R}^m \rightarrow \mathbb{R}^o$ by training on a dataset, where m is the number
of dimensions of the input and o is the number of dimensions of the output. Given a
set of features and a target, it can learn a nonlinear function
approximator for classification or regression. It differs from logistic regression in
that between the input layer and the output layer there may be one or more nonlinear layers, called
hidden layers. Fig. 4 shows an MLP with one hidden layer and scalar output; Fig. 5
shows the program.

Figure 4. Multilayer perceptron with a hidden layer with scalar output.


Neural Network Algorithm in Python.
# Libraries
# ********************************************************
# Library for timing
import time
# We import train_test_split to divide the training and testing data
from sklearn.model_selection import train_test_split
# Multilayer Perceptron Neural Network classifier
from sklearn.neural_network import MLPClassifier
# Library for building the confusion matrix
from sklearn.metrics import confusion_matrix
# Library to ignore warnings
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)
# Library to load the dataset or dataframe
import pandas as pd
# Libraries for plotting the confusion matrix
import seaborn as sns
import matplotlib.pyplot as plt

# Algorithm specification to be used
print("-" * 90)
print("MODEL OF A NEURAL NETWORK CLASSIFICATION (for detecting Cybercrime vocabulary)")
print("-" * 90)

# Process of the Multilayer Perceptron Neural Network Algorithm
# ************************************************************
# Start of the timer
inicio_tiempo = time.time()
# Load the dataset
dataset = 'cibercrimen.csv'
df = pd.read_csv(dataset)
# Show the dimension of the dataset
print("Dataset Dimension:")
print("Records:", df.shape[0])
print("Columns:", df.shape[1])

print("Predictor variables")
arrayx = df[df.columns[:-1]].to_numpy()  # Takes the predictor variables (all columns but the last)
print(arrayx)

print("Variable to predict")
arrayy = df[df.columns[-1]].to_numpy()  # Takes the variable to predict (last column)
print(arrayy)
# Data model specification for training
X_train, X_test, y_train, y_test = train_test_split(arrayx, arrayy, test_size=0.3)
print("Training:", len(X_train), "records (70%)")
print("Tests:", len(X_test), "records (30%)")
# Variable red holds the Neural Network model, parameters (iterations and hidden layers in the network)
# The model is adjusted until an optimal solution is reached
red = MLPClassifier(max_iter=10000, hidden_layer_sizes=(4,))
# Fit the model to the training records
red.fit(X_train, y_train)
# Check the learning percentage of the algorithm (measured on the test records)
print("Learning of the algorithm: {0:.2f}".format(red.score(X_test, y_test) * 100), "%")

# Data model specification for classification (tests)
model = MLPClassifier()
# Adjust the data model to the training data
print(model.fit(X_train, y_train))
# Confusion matrix of the default model on the test data
y_predicted = model.predict(X_test)
cm = confusion_matrix(y_test, y_predicted)
# Confusion matrix of tests for the network red (already fitted above)
y_pred = red.predict(X_test)
cnf_matrix = confusion_matrix(y_test, y_pred)
# The timing of the process ends
tiempo = time.time() - inicio_tiempo
# Plot the confusion matrix of tests
print("CONFUSION MATRIX")
sns.heatmap(cnf_matrix.T, square=True, annot=True, fmt='d', cbar=True)
plt.xlabel('True class')
plt.ylabel('Predicted class')
plt.title('Confusion Matrix')
plt.show()
print("Description")
# In cnf_matrix, rows are true classes and columns are predicted classes
print("True Negatives:", cnf_matrix[0, 0])
print("True Positives:", cnf_matrix[1, 1])
print("False Positives:", cnf_matrix[0, 1])
print("False Negatives:", cnf_matrix[1, 0])
# Visualizing the results
print("CLASSIFICATION RESULTS")
print("-" * 66)
percentage = 1 - (y_test != y_pred).sum() / y_test.shape[0]
print("Accuracy percentage: {0:.1f}".format(percentage * 100), "%")
print("Classification errors: {1} errors, out of a total of {0} cases".format(y_test.shape[0], (y_test != y_pred).sum()))
print("Execution time is: {0:.2f}".format(tiempo), "sec")
print("-" * 66)

Figure 5. Results of the Neural Network algorithm in Python.
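
In scikit-learn's confusion_matrix, rows correspond to the true classes and columns to the predicted classes, so for a two-class problem the cells can be unpacked as shown in the sketch below. The labels are made up for illustration (1 stands for cybercrime vocabulary, 0 for its absence).

```python
from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted labels
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_hat = [0, 1, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix(y_true, y_hat)
# Row = true class, column = predicted class
tn, fp, fn, tp = cm.ravel()
accuracy = (tp + tn) / cm.sum()

print(cm)
print("TN:", tn, "FP:", fp, "FN:", fn, "TP:", tp)
print("Accuracy:", accuracy)
```

In this example there are 3 true negatives, 1 false positive, 1 false negative and 3 true positives, which gives an accuracy of 6/8 = 0.75.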

4.2.2 Support Vector Machine.

Support Vector Machines (SVM) are a set of supervised learning
algorithms developed by Vladimir Vapnik and his team at AT&T labs.

These methods are aimed at classification and regression problems.
Given a set of training examples (samples), we can label the classes
and train an SVM to build a model that predicts the class of a new sample.
Intuitively, an SVM is a model that represents the sample points in space,
separating the classes by the widest possible gap. When new samples are
mapped into this model, they are
classified into one class or the other depending on which side of the gap they fall on.

The SVM with linear separation has the
purpose of separating one class from another
by placing a separating line (hyperplane) between them,
seeking the maximum separation between the groups. The examples used
to define the line are known as
support vectors, hence the name
Support Vector Machines. Fig. 6
shows an example of an SVM with linear
separation.

Figure 6. Support Vector Machine with linear separation.

The linear technique uses the linear classifier with the widest margin, as seen in
graph (B) of Fig. 7 (where the maximum margin is the best solution). This reduces the
error to a minimum among the infinite number of possible hyperplanes, such as those
shown in graph (A) of the same figure. The margin is defined as the width that bounds
the data; in this way the positive class is separated from the negative class.

Figure 7. Linear technique with the widest margin (SVM).
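
A minimal sketch of a linear SVM with scikit-learn is given below. It fits SVC with a linear kernel on six made-up points and prints the support vectors and the separating hyperplane; the data and the large value of C (used here to approximate a hard margin) are assumptions for illustration.

```python
import numpy as np
from sklearn.svm import SVC

# Toy 2-D samples (assumed): negative class near the origin, positive class far from it
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
              [3.0, 3.0], [4.0, 3.0], [3.0, 4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

svm = SVC(kernel="linear", C=1000.0).fit(X, y)   # large C approximates a hard margin

# Only the samples closest to the boundary define it: the support vectors
print("Support vectors:\n", svm.support_vectors_)
print("Hyperplane: w =", svm.coef_[0], "b =", svm.intercept_[0])

predictions = svm.predict([[0.5, 0.5], [3.5, 3.5]])
print("Predictions:", predictions)
```

Removing a sample that is not a support vector would leave the fitted line unchanged, which is why only those few samples give the method its name.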

When the problem is not linearly separable, the input data is transformed into another
dimension in which the training set is separable, through the definition of a kernel.
In the transformed space, the separation is no longer limited to straight lines in the original dimensions.

Non-linearly separable techniques.

• Polynomial.
• Gaussian.
• Sigmoidal.
• Inverse multiquadric.
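
The effect of a kernel can be illustrated with a dataset that no straight line can separate. The sketch below uses scikit-learn's make_circles to generate one class as a ring around the other and compares a linear SVC with the Gaussian ("rbf") and polynomial ("poly") kernels that SVC provides. The dataset and its parameters are assumptions for illustration; the inverse multiquadric kernel is not among SVC's built-in options and is not shown.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Synthetic data (assumed): one class forms a ring around the other
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Accuracy on the same points used for fitting, only to compare separability
linear_acc = SVC(kernel="linear").fit(X, y).score(X, y)
rbf_acc = SVC(kernel="rbf").fit(X, y).score(X, y)              # Gaussian kernel
poly_acc = SVC(kernel="poly", degree=2).fit(X, y).score(X, y)  # polynomial kernel

print("linear:", linear_acc)
print("rbf:", rbf_acc)
print("poly (degree 2):", poly_acc)
```

The linear model cannot do much better than guessing on this data, while the Gaussian kernel separates the two rings.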

Support Vector Machines in Python

In Python (scikit-learn), Support Vector Machines are a set of supervised learning
methods used for classification, regression, and outlier detection.

The program is based on the libsvm implementation (sklearn.svm.SVC). The fit time
scales at least quadratically with the number of samples and can be impractical
beyond tens of thousands of samples. For datasets that are too large, consider
using sklearn.svm.LinearSVC or sklearn.linear_model.SGDClassifier
instead, possibly after a
sklearn.kernel_approximation.Nystroem transformer (Scikit-learn.org, 2019).

Multiclass support is handled according to a one-vs-one scheme. For
details about the precise mathematical formulation of the provided kernel functions
and how gamma, coef0, and degree affect each other, refer to the corresponding section of the
narrative documentation: Kernel functions.

Support Vector Machine Algorithm.

# Libraries
# ********************************************************
# Library for timing
import time
# Library for generating models with Support Vector Machines
from sklearn.svm import SVC
# We import train_test_split to divide the training and test data
from sklearn.model_selection import train_test_split
# Library to build the confusion matrix
from sklearn.metrics import confusion_matrix
# Library to ignore warnings
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)
# Library to load the dataset or dataframe
import pandas as pd
# Libraries to plot the confusion matrix
import seaborn as sns
import matplotlib.pyplot as plt

# Algorithm specification to be used
print("-" * 88)
print(" SUPPORT VECTOR MACHINES CLASSIFICATION MODEL")
print("-" * 88)
print(" DETECTION OF CYBERCRIME VOCABULARY")

# Process of the Support Vector Machines algorithm
# ********************************************************
# Starts the timer
inicio_tiempo = time.time()
# Load the dataset
dataset = 'cibercrimen.csv'
df = pd.read_csv(dataset)
print("Dataset Dimension:")
print("Records:", df.shape[0])
print("Columns:", df.shape[1])

print("Predictor variables")
arrayx = df[df.columns[:-1]].to_numpy()  # Takes the predictor variables (all columns but the last)
print(arrayx)

print("Variable to predict")
arrayy = df[df.columns[-1]].to_numpy()  # Takes the variable to predict (last column)
print(arrayy)
# Data model specification for training
X_train, X_test, y_train, y_test = train_test_split(arrayx, arrayy, test_size=0.3)
print("Training:", len(X_train), "records (70%)")
print("Tests:", len(X_test), "records (30%)")
# Variable model holds the Support Vector Machine model
model = SVC()
# Fit the model to the training records
model.fit(X_train, y_train)
# Verify the learning percentage of the algorithm (measured on the test records)
print("Learning of the algorithm: {0:.2f}".format(model.score(X_test, y_test) * 100), "%")

# Test confusion matrix (the model was already fitted above)
y_pred = model.predict(X_test)
cnf_matrix = confusion_matrix(y_test, y_pred)
# The process time ends
tiempo = time.time() - inicio_tiempo
# Plot the confusion matrix of tests
print("CONFUSION MATRIX")
sns.heatmap(cnf_matrix.T, square=True, annot=True, fmt='d', cbar=True)
plt.xlabel('True class')
plt.ylabel('Predicted class')
plt.title('Confusion Matrix')
plt.show()
print("Description")
# In cnf_matrix, rows are true classes and columns are predicted classes
print("True Negatives:", cnf_matrix[0, 0])
print("True Positives:", cnf_matrix[1, 1])
print("False Positives:", cnf_matrix[0, 1])
print("False Negatives:", cnf_matrix[1, 0])
# Optimization (the C parameter indicates how much you want to avoid misclassifying each training point)
# When the value of C is smaller, a separating hyperplane with a larger margin is chosen,
# tolerating more misclassified training points
model_C = SVC(C=1)
model_C.fit(X_train, y_train)
print(model_C.score(X_test, y_test))
# When the value of C is larger, a separating hyperplane with a smaller margin is chosen,
# misclassifying fewer training points
model_C = SVC(C=10)
model_C.fit(X_train, y_train)
score_C = model_C.score(X_test, y_test)
# Visualizing the results
print("CLASSIFICATION RESULTS")
print("-" * 91)
percentage = 1 - (y_test != y_pred).sum() / y_test.shape[0]
print("Accuracy: {0:.1f}".format(percentage * 100), "%")
print("Optimized accuracy (C=10): {0:.1f}%".format(score_C * 100), "(optimizes classification by generating")
print(" a hyperplane of minimal margin)")
print("Classification errors: {1} errors, out of a total of {0} cases.".format(y_test.shape[0], (y_test != y_pred).sum()))
print("Execution time is: {0:.2f} sec".format(tiempo))
print("-" * 91)

Fig. 8 shows the Support Vector Machine program running in Python, specifically in Jupyter from Anaconda.

Figure 8. Results of the Support Vector Machine algorithm in Python.
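
To read a confusion matrix like the one in the results, it helps to count the four cells by hand. The sketch below uses only plain Python. The helper confusion_counts and the two short label lists are hypothetical, made up for illustration. The matrix is laid out with true classes as rows and predicted classes as columns, which is the layout scikit-learn's confusion_matrix uses.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tn, fp, fn, tp) for binary labels."""
    tn = fp = fn = tp = 0
    for truth, guess in zip(y_true, y_pred):
        if truth == positive:
            if guess == positive:
                tp += 1   # positive record predicted as positive
            else:
                fn += 1   # positive record predicted as negative
        else:
            if guess == positive:
                fp += 1   # negative record predicted as positive
            else:
                tn += 1   # negative record predicted as negative
    return tn, fp, fn, tp


# Made-up labels for ten test records (1 = Yes, 0 = No).
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
y_pred = [0, 0, 1, 0, 1, 1, 0, 1, 1, 0]

tn, fp, fn, tp = confusion_counts(y_true, y_pred)
matrix = [[tn, fp],
          [fn, tp]]                      # rows: true class, columns: predicted class
accuracy = (tn + tp) / len(y_true)

print(matrix)                            # [[4, 1], [1, 4]]
print("Accuracy: {0:.1f}%".format(accuracy * 100))   # Accuracy: 80.0%
```

The accuracy is the sum of the diagonal divided by the number of records, which is the same quantity the program computes as 1 minus the fraction of errors.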

Case study of objective 4.2 Machine Learning (supervised learning).

Implement a real or fictitious case study using supervised learning with a neural
network and a support vector machine. The points to consider in your solution and
in the writing of your report are as follows:

1. Define the problem.
• Describe the problem.
• What is to be predicted.
• What data is available.
• What data still needs to be obtained.

2. Define the predictor variables and the variable to predict.
• Explore and understand the data.
• Explain each predictor variable and why it was selected.
• Explain the variable to predict and the basis for assigning the values Yes and No.

3. Success metric.
• Define an appropriate way to quantify how good the obtained results are.

4. Machine Learning.
• Data loading process.
• Implement and adapt the neural network algorithm.
• Implement and adapt the Support Vector Machine algorithm.
• Specify the classification or prediction percentage obtained.

5. Explain the results obtained and how to apply them to the solution.

Note: To develop this activity, you must first understand the material and work
through the examples provided in this unit (pages 1 to 16). This includes
analyzing the provided Cybercrime dataset, studying the provided neural network
and support vector machine code, and understanding the results those
algorithms produce.
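
As one possible starting point for points 3 and 4 of the case study, the sketch below trains a neural network and a Support Vector Machine on the same 70/30 split and reports several success metrics for each. It is an outline under stated assumptions, not the unit's code: the data is synthetic (make_classification), the Yes/No target is assumed to be encoded as 1/0, and the StandardScaler step and the MLPClassifier settings are illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical stand-in data; replace it with the predictor variables and
# the Yes/No target (encoded as 1/0) of your own case study.
X, y = make_classification(n_samples=300, n_features=6, n_informative=4,
                           random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)

# Both algorithms get the same scaled inputs (scaling is an added assumption).
candidates = {
    "Neural network": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=1)),
    "Support Vector Machine": make_pipeline(StandardScaler(), SVC()),
}

# Success metrics computed on the test records only.
report = {}
for name, estimator in candidates.items():
    estimator.fit(X_train, y_train)
    y_pred = estimator.predict(X_test)
    report[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
    }

for name, metrics in report.items():
    print(name, {key: round(value, 2) for key, value in metrics.items()})
```

Accuracy alone can mislead when one class is much more frequent than the other, which is why precision, recall and F1 for the Yes class are reported alongside it.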

