Data Set - Machine Learning
Machine Learning
observations unseen by the machine during training. Huge data collection and storage technologies
have altered the landscape of scientific data analysis in areas such as natural resources, flood
prediction, astronomy and biology. Machine learning is present in all of these examples.
The representation given in Table 1 depicts the input as N instances, s(1), s(2), …, s(N), each of
which is an example of the concept to be learned. Each instance provides an input to the machine
learning algorithm and is categorized by its value y, given in the last column. The data can be
understood in more detail as follows.
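To make the layout of Table 1 concrete, here is a minimal sketch in plain Python. Table 1 itself is not reproduced here, so the feature names, values and labels below are hypothetical; the only point is that each instance is one row and the label y sits in the last column.

```python
# Hypothetical toy dataset: each row is one instance s(i).
# All columns except the last are input features; the last column is the label y.
dataset = [
    # (size_cm, weight_g, y)
    (7.0, 150, "apple"),
    (6.5, 140, "apple"),
    (3.0, 30, "plum"),
    (2.8, 25, "plum"),
]

# Separate the feature values from the label column y.
X = [row[:-1] for row in dataset]
y = [row[-1] for row in dataset]

N = len(dataset)  # number of instances
print(N, X[0], y[0])  # 4 (7.0, 150) apple
```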
Four types of data are explained here, as they often have to be handled during dataset preparation
or preprocessing. The data types are as follows.
Numerical Data
Categorical Data
Time Series Data
Text Data
Numerical Data
Numerical data is data expressed in numbers. It is further classified as continuous and discrete
(discontinuous) data, as in Figure 2.
Categorical Data
Categorical data is a collection of information that is divided into groups. It is further divided
into two types: ordinal and nominal.
Ordinal Data
Ordinal data has a ranking or ordering. Ordinal features can be sorted or ordered, as in Figure 3.
Example: Size of T-shirt: S, M, L, XL.
Convert the string values into integers according to the order XL > L > M > S.
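One way to carry out this conversion in Python is a lookup table built from an explicitly ordered list. Starting the ranks at 0 is an assumption made here; any increasing integers would preserve the order XL > L > M > S.

```python
# Explicit ordering of the T-shirt sizes, smallest first.
size_order = ["S", "M", "L", "XL"]

# Map each size to its rank so that S < M < L < XL as integers.
size_to_int = {size: rank for rank, size in enumerate(size_order)}

sizes = ["M", "XL", "S", "L", "M"]
encoded = [size_to_int[s] for s in sizes]
print(size_to_int)  # {'S': 0, 'M': 1, 'L': 2, 'XL': 3}
print(encoded)      # [1, 3, 0, 2, 1]
```

Because the ranks come from the list position, reordering `size_order` is the only change needed if the category order changes.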
Nominal Data
Nominal data has no ranking or order; nominal features are not ordered, as in Figure 4.
Example: Colour of T-shirt: Red, Green, Blue.
Assign a numeric value to each category:
0 -> Red, 1 -> Green, 2 -> Blue
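The mapping above can be sketched in Python as follows. Because nominal data has no order, the integers 0, 1, 2 are only labels. The sketch also shows one-hot encoding, a common alternative not described in this unit, which gives each colour its own 0/1 position so that no order such as Blue > Green > Red is implied.

```python
colours = ["Red", "Green", "Blue"]

# Label encoding, following the mapping 0 -> Red, 1 -> Green, 2 -> Blue.
colour_to_int = {c: i for i, c in enumerate(colours)}

# One-hot encoding: one 0/1 position per colour, so no order is implied.
def one_hot(colour):
    return [1 if c == colour else 0 for c in colours]

print(colour_to_int["Blue"])  # 2
print(one_hot("Green"))       # [0, 1, 0]
```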
Text Data
Text data usually consists of documents, which can represent words, sentences or even paragraphs.
Digital information can usually be categorized into two classes: structured and unstructured.
Recent studies have reported that more than 70 percent of all the data available to corporations
today is unstructured. Structured data, by contrast, fits into a fixed format or data table, like
the one discussed in Table 1.
Preprocessing is the process of fixing or removing incorrect, corrupted, incorrectly formatted,
duplicated, or incomplete data within a dataset, as depicted in Figure 6. It mainly focuses on
dealing with missing data and handling categorical data.
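Two of the preprocessing steps named above, removing duplicated records and dealing with missing data, can be sketched in plain Python. The records are hypothetical, and filling numeric gaps with the mean and categorical gaps with the most frequent value is only one of several possible filling strategies.

```python
from statistics import mean

# Hypothetical raw records (age, city); None marks a missing value.
raw = [
    (25, "Delhi"),
    (None, "Mumbai"),
    (35, "Delhi"),
    (25, "Delhi"),      # exact duplicate of the first record
    (30, None),
]

# 1. Remove exact duplicates, keeping the first occurrence and the original order.
deduped = list(dict.fromkeys(raw))

# 2. Fill a missing numeric value with the mean of the known values.
known_ages = [age for age, _ in deduped if age is not None]
age_fill = mean(known_ages)

# 3. Fill a missing categorical value with the most frequent category (mode).
known_cities = [city for _, city in deduped if city is not None]
city_fill = max(set(known_cities), key=known_cities.count)

clean = [
    (age if age is not None else age_fill,
     city if city is not None else city_fill)
    for age, city in deduped
]
print(clean)
```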
person is saying. Spell checking, grammar checking and translation are other applications of
NLP.
Fault diagnostic: Preventive maintenance of motors, generators and other electro-mechanical
devices can delay malfunctions; otherwise, the devices will interrupt industrial processes.
Typical defects or flaws include shaft misalignment, mechanical looseness, defective bearings,
and unbalanced pumps. Fault diagnosis is performed using machine learning algorithms, which are
extremely helpful in this field.
Business intelligence: Business intelligence technologies offer not only historical and current
information but also predictive views of business applications. It is essential for businesses to
be able to understand the commercial position of their organization in terms of customer base,
market, supply and resources, and competition. Without data mining, many businesses may be unable
to perform market analysis effectively, compare customer feedback on similar products, find the
strengths and weaknesses of their competitors, retain their most valuable customers, and arrive at
intelligent business decisions.
Summary
In this unit, the concepts of machine learning are discussed along with the different approaches of
machine learning. Each approach is discussed in detail with examples, so the differences between the
approaches should now be clearer. A dataset is very important for machine learning; hence, it is
necessary to understand the basic data types, which are also explored thoroughly. This helps when
converting or processing the obtained data. Processing a dataset also poses many challenges, which
were covered under preprocessing and data cleaning. The major tasks of preprocessing and the
possible ways of data cleaning were also discussed. The term feature engineering was highlighted,
as it is related to data cleaning.
Keywords
Dataset
Preprocessing
Data cleaning
Supervised learning
Unsupervised learning
Reinforcement learning
Self Assessment
1. The machine learning approach which builds a model based on sample data is known as
_______.
A. Supervised
B. Unsupervised
C. Reinforcement
D. None of the above
6. State whether the statement is true or false: “Preprocessing is the process of converting raw
data into data which will be suitable for machine learning”.
A. True
B. False
10. _________ is the process of changing the format, structure or values of data.
A. Data integration
B. Data cleaning
C. Data transformation
D. Data Preprocessing
6. A 7. D 8. A 9. C 10. C
Review Questions
1. Explain the different types of data.
2. Differentiate between nominal and ordinal data types.
3. Give examples for categorical data.
4. List out the methods used for filling the missing values.
5. Identify the machine learning algorithms for each machine learning approach.
Objectives
1. To understand the online tools used for python such as JupyterLab and Google Colab.
2. To understand the fundamentals of programming such as variables, keywords, data types,
expressions, statements, operators and operator precedence.
3. To differentiate the conditional and unconditional statements, such as simple if, if-else, nested
if, for loop, while loop, break and continue.
4. To understand the use of functions and recursion, which will be discussed with examples.
5. To know the packages in python along with their purposes.
Introduction
In this unit, we introduce the very popular programming language called Python. Many programming
languages, such as C and C++, already existed and have been used for decades. Here, we will try to
understand the merits of the Python language over others. Moreover, we will write a simple Python
program and execute it using an online tool. Programs can be experimented with to understand the
conditional and unconditional statements along with functions and recursion. Function declaration,
function calls and parameters can be well understood from the given examples. Let us begin with
what Python is.
At the time of writing, the current version of Python is 3.9.7. There are many reasons for its
popularity. It reads almost like English and has a simple syntax. Python is a general-purpose,
open-source language. Python is portable, so it runs on many Unix variants, including Linux and
macOS, and on Windows. Python is also an interpreted, interactive and object-oriented programming
language. Because no separate compilation step is needed, Python programs can be written and run
quickly, although the code itself usually executes more slowly than compiled C code.
The language offers many ready-made libraries, such as NumPy, SciPy, Matplotlib and scikit-learn,
as well as frameworks that support the initial phase of development. All these reasons have made
Python very popular in the programming community.
Online Tools
Jupyter Notebook and Google Colab are popular online tools for programming in Python. Let us first
discuss Jupyter. You can access it at this link: [Link] Once you have visited the page, it will
look like Figure 2 below.
Now select the first blue button, “Try Classic Notebook”. The page will then look like Figure 3
below.
Python Installation
Python software and the installation manual can be downloaded from the link
[Link] The site shows the appropriate version for your system configuration and operating
system. At the time of writing, the latest version is Python 3.9.7, as shown in the figure.
A single click is enough to download it, and the installation procedure is simple.
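After installation, one way to confirm which version is installed is to ask the interpreter itself. This sketch uses only the standard `sys` module; the exact version printed depends on your system, and the check for Python 3 is an assumption matching this unit.

```python
import sys

# Print the full version string of the interpreter running this code.
print(sys.version)

# version_info can be compared like a tuple, e.g. to require Python 3.
major, minor = sys.version_info[:2]
print(f"Running Python {major}.{minor}")
assert sys.version_info >= (3, 0), "This unit assumes Python 3"
```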
Variables
A variable is a name that refers to a value, which may be changed in the program. There is no
command to declare a variable in Python; a variable is created the moment a value is first assigned
to it.
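A short illustration of the points above: no declaration is needed, the value can be changed later, and the same name can even be rebound to a value of another type. The variable name `count` is chosen only for illustration.

```python
# No declaration is needed: assigning a value creates the variable.
count = 10
print(count)        # 10

# The value can be changed later in the program.
count = count + 5
print(count)        # 15

# The same name can even refer to a value of a different type.
count = "fifteen"
print(type(count))  # <class 'str'>
```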
Datatype
Variables can store different types of data: Numeric, Boolean, Set, Dictionary and Sequence data
types, as shown in the figure. Let us start with the Numeric data types, which involve only numbers
and are divided into three categories: integer (without decimals), float (with decimals) and complex
numbers. Boolean data stores the values True or False. The remaining types will be studied in
detail in the coming units.
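The built-in `type()` function shows which type Python has given to a value. The variable names below are made up for illustration, and `list` and `str` stand in for the sequence types mentioned above.

```python
whole = 42                          # int: integer, no decimal point
price = 19.99                       # float: number with decimals
z = 3 + 4j                          # complex: real part 3, imaginary part 4
passed = True                       # bool: True or False
tags = {"ml", "python"}             # set: unordered, no duplicates
marks = {"math": 90, "physics": 85} # dict: key -> value pairs
scores = [70, 80, 90]               # list: a sequence type
name = "Colab"                      # str: also a sequence type

# Print the type name next to each value.
for value in (whole, price, z, passed, tags, marks, scores, name):
    print(type(value).__name__, value)
```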
Keywords
Some predefined, reserved words have a special meaning in Python. These words are called keywords.
Keywords cannot be used as names for variables or other identifiers, nor as function names. They
are also called reserved words. There are more than 30 keywords in Python. A few of them are and,
or, not, if, elif, else, for, while, break, return, True, False, continue, in, is and import.
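The standard `keyword` module lists the reserved words of whichever Python version is running, so the exact count may differ slightly between versions. The sketch also uses the built-in `compile()` to show that a keyword cannot be used as a variable name.

```python
import keyword

# keyword.kwlist holds every reserved word of the running Python version.
print(len(keyword.kwlist))
print(keyword.kwlist[:5])

# iskeyword() tells whether a name is reserved and therefore unusable as a variable.
print(keyword.iskeyword("while"))   # True
print(keyword.iskeyword("total"))   # False

# Trying to assign to a keyword is rejected before the code even runs.
try:
    compile("if = 5", "<example>", "exec")
except SyntaxError:
    print("'if' cannot be used as a variable name")
```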
Expression
An expression is a combination of values, variables and operators. This can be understood from the
example in the following figure. There are three variables, a, b and c; the variables are known as
operands. Operators are placed between them to perform operations on the operands. In this case,
the plus (+) and multiplication (*) operators are used. The final result is stored in the variable
result.
Let us evaluate the expression with a = 10, b = 5 and c = 3. What will be the output of the above
expression? The output is 25, as shown in the figure.
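The figure is not reproduced here. Assuming it shows `a + b * c`, which is the combination of `+` and `*` that yields 25 for these values, the expression can be checked directly.

```python
a = 10
b = 5
c = 3

# Multiplication is done before addition: 10 + (5 * 3)
result = a + b * c
print(result)   # 25
```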
Statements
Statements are the instructions given in the source code for execution. The outcome of the program
depends on how the statements are arranged for execution. The statements are executed in
sequential order, starting from the first statement in the program. There are three types of
statements in Python: assignment statements, conditional statements and looping statements, which
are discussed below.
Assignment statements
Statements that copy a value into a variable are called assignment statements. The equal sign (=)
is used for copying the value; hence, the operator (=) is called the assignment operator. The target
of an assignment statement is written on the left side of the equal sign (=), and the value to be
assigned is written on the right side.
For example, a = 100 is an assignment statement: the value 100 is assigned to the variable a.
Similarly, x, y = 50, 100 is also an assignment statement, where the value 50 is assigned to the
variable x and the value 100 is assigned to the variable y.
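Both assignment statements above can be run as written. The last two lines add a common use of multiple assignment that is not in the original text: because Python evaluates the whole right side first, the same form swaps two values without a temporary variable.

```python
a = 100          # the value 100 is assigned to a
x, y = 50, 100   # multiple assignment: x gets 50, y gets 100
print(a, x, y)   # 100 50 100

# The right side is evaluated first, so the same form swaps two values.
x, y = y, x
print(x, y)      # 100 50
```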
Conditional statements
Any statement that outputs a Boolean value (True / False) is called a conditional statement, as
given in the figure. Framing the conditions is the key element in controlling the flow of execution.
Let us have an example of a conditional statement.
Example 1: (a < b)
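Example 1 does not give values for a and b; with the hypothetical values 4 and 7, the condition can be evaluated and its Boolean result inspected.

```python
a = 4
b = 7

# A conditional (Boolean) expression evaluates to True or False.
condition = (a < b)
print(condition)        # True
print(type(condition))  # <class 'bool'>
print(a > b)            # False
```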
Looping statements
A looping statement is a statement or a block of statements that is executed repeatedly as long as a
specified condition holds. When the condition is True, the block executes; when the condition
becomes False, the execution stops. If the execution never stops, the looping statement becomes an
infinite loop. Python provides different looping statements, such as the while loop and the for
loop.
Operator
Operators are used to perform operations, such as mathematical operations, on values and variables.
Python provides a few standard symbols for this. Let us look at the list of operators and their
usage. According to their usage, the operators are grouped into categories such as arithmetic
operators, relational operators, logical operators and assignment operators.
Arithmetic Operators
Operator Meaning Example
+ Addition x + y
- Subtraction x - y
* Multiplication x * y
/ Division x / y
// Division (floor) x // y
% Modulus x % y
** Power x ** y
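Running the arithmetic operators on two sample values (chosen here only for illustration) shows how `/`, `//` and `%` differ. The last line shows that floor division rounds down toward minus infinity, which matters for negative numbers.

```python
x = 17
y = 5

print(x + y)    # 22
print(x - y)    # 12
print(x * y)    # 85
print(x / y)    # 3.4      true division always gives a float
print(x // y)   # 3        floor division drops the fractional part
print(x % y)    # 2        remainder of 17 divided by 5
print(x ** y)   # 1419857  17 to the power 5
print(-x // y)  # -4       floor of -3.4 is -4, not -3
```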
Relational Operators
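These operators are used to compare values or variables. The output of a relational operator is
either True or False.
== Equal to x == y
!= Not equal to x != y
> Greater than x > y
< Less than x < y
>= Greater than or equal to x >= y
<= Less than or equal to x <= y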
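A few relational operators applied to sample values. The final lines illustrate a frequent beginner slip: a single `=` assigns a value, while `==` compares two values.

```python
x = 10
y = 20

print(x == y)   # False
print(x != y)   # True
print(x < y)    # True
print(x >= y)   # False

# A single = assigns; == compares.
x = y            # assignment, not comparison
print(x == y)   # True
```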
Logical operators
Logical operators are used to combine two or more conditional statements. The operators are
logical AND, logical OR and logical NOT.
and Logical AND x and y
or Logical OR x or y
not Logical NOT not x
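The three logical operators can be tried on two conditions. The variables `attendance` and `marks` and the thresholds 75, 90 and 40 are hypothetical values for illustration.

```python
attendance = 80
marks = 45

print(attendance >= 75 and marks >= 40)  # True: both conditions hold
print(attendance >= 90 or marks >= 40)   # True: at least one holds
print(not attendance >= 75)              # False: not reverses True
```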
Assignment operators
These operators are used to assign values to variables. The basic assignment operator (=) was
already discussed under assignment statements. Here is the list of the other operators used for
assignment operations.
+= Shorthand for a = a + b a += b
-= Shorthand for a = a - b a -= b
*= Shorthand for a = a * b a *= b
/= Shorthand for a = a / b a /= b
%= Shorthand for a = a % b a %= b
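Applying the shorthand operators in sequence to sample values (chosen for illustration) shows each intermediate result. Note that `/=` turns the integer into a float, because `/` always produces a float.

```python
a = 20
b = 6

a += b    # a = 20 + 6    -> 26
a -= b    # a = 26 - 6    -> 20
a *= b    # a = 20 * 6    -> 120
a /= b    # a = 120 / 6   -> 20.0 (/ always produces a float)
a %= b    # a = 20.0 % 6  -> 2.0
print(a)  # 2.0
```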
Operator Precedence
An expression can contain one or more operators and operands, forming simple or complex
mathematical operations. Most operators need two operands to perform the specified operation. When
an expression has several operators, an order of preference or priority is required to decide which
operator is computed first. This is called operator precedence. It can be understood from a simple
expression, as shown in the figure.
In the above figure, the expression first computes the multiplication, and the result of the
multiplication is then used for the addition. Here, the multiplication operator has a higher
priority than the addition operator. Similarly, there are many other operators, and it is important
to know their precedence so that we can use them correctly in an expression. The following figure
gives a clear picture of precedence: the highest priority is at the top and the lowest priority is
at the bottom.
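The precedence rules can be checked directly in the interpreter. The expressions below are illustrative; the comments state the rule from the Python language reference that explains each result, including the right-to-left grouping of `**`.

```python
print(2 + 3 * 4)     # 14: * is applied before +
print((2 + 3) * 4)   # 20: parentheses override precedence
print(2 ** 3 ** 2)   # 512: ** groups right to left, i.e. 2 ** 9
print(-2 ** 2)       # -4: ** binds tighter than unary minus
print(10 - 4 - 3)    # 3: - groups left to right, i.e. (10 - 4) - 3
print(1 + 2 > 2 and not False)  # True: arithmetic, then comparison, then not, then and
```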
2.3 IF Statement
The flow of execution in a program can be controlled using proper conditions. The simplest
conditional statement is the if statement. It executes a block of statements if the condition is
True; if the condition is False, the block is skipped and execution continues with the next
statement. Hence, the flow of execution depends on the decision made, as in the figure.
attendance = 90
if (attendance >= 75):
    print("Eligible for final examination")
Here, the variable attendance holds 90. Hence, the condition is satisfied and the print statement is
executed. Let us try the same example with an attendance value of 65.
attendance = 65
if (attendance >= 75):
    print("Eligible for final examination")
What will be the output? It doesn’t print anything, as no statement is given for the case when the
condition is False. This is solved by the next type of conditional statement, the if-else statement.
In an if-else statement, if the condition (the if expression) is True, control goes to the True
block and then reaches the end. If the condition is False, control goes to the False (else) block
and then reaches the end. Let us discuss this using an example.
x = 6
y = 8
if (x > y):
    print("x is greater than y")
else:
    print("y is greater than x")
Here, let the variables x and y have the values 6 and 8 respectively. Now, concentrate on the if
statement and find the value of the condition: (x > y) becomes (6 > 8), which is False. The control
goes to the False part and outputs “y is greater than x”. What will the output be if we change the
values to x = 86 and y = 70?
x = 86
y = 70
if (x > y):
    print("x is greater than y")
else:
    print("y is greater than x")
Yes, we get the output “x is greater than y”, as the condition becomes True and the True part is
executed.
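The if-else examples above print “y is greater than x” even when x and y are equal. One way to handle that third case is the `elif` keyword (listed among the keywords earlier). The function name `compare` is hypothetical; wrapping the logic in a function simply makes it easy to try several pairs of values.

```python
def compare(x, y):
    # elif checks a second condition only when the first one is False.
    if x > y:
        return "x is greater than y"
    elif x < y:
        return "y is greater than x"
    else:
        return "x and y are equal"

print(compare(6, 8))    # y is greater than x
print(compare(86, 70))  # x is greater than y
print(compare(5, 5))    # x and y are equal
```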
A for loop executes its statement once for each value in a sequence. After every execution of the
statement, the loop variable takes the next value from the sequence, and the loop stops when no
values are left. Let us have an example.
for t in range(5):
    print(t)
The above example uses the variable t. Its initial value is 0 and its last value is 4. Before
explaining this, let us look at the range function.
This function creates a sequence of numbers in a given range, starting from 0 by default and
stopping just before the given number. Here, range(5) gives the values 0 to 4, which makes
generating a sequence of numbers easy.
Now, let us focus on the output of the for loop given above: print(t) is executed five times.
Output:
0
1
2
3
4
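Besides the single-argument form used above, `range` also accepts a start value and a step. The sketch shows these built-in forms, then a for loop that adds the numbers 1 to 5 (the summing task is an illustration, not from the original text).

```python
print(list(range(5)))         # [0, 1, 2, 3, 4]
print(list(range(2, 6)))      # [2, 3, 4, 5]    start at 2, stop before 6
print(list(range(0, 10, 3)))  # [0, 3, 6, 9]    step of 3
print(list(range(5, 0, -1)))  # [5, 4, 3, 2, 1] counting down

# Summing 1 to 5 with a for loop over range.
total = 0
for t in range(1, 6):
    total += t
print(total)                  # 15
```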
Let me explain the while loop with a simple example. The variable used is number, with an initial
value of 0. The program executes a block of statements as long as the variable number is not equal
to 8. A while loop does not need to know in advance how many times it will execute the block, so we
must manage the condition as per our requirements. Here, the task is simple.
Initially, the value of number is 0. The condition (0 not equal to 8) is True, so the block executes
for the first time. The value of number is then increased by 1, so number = 1, and the condition is
checked again. The condition (1 not equal to 8) is True, so the block executes a second time, and
this goes on while the condition is True. When number reaches 8, the condition becomes False and
the loop stops.
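The code for this example is not shown, so the following is a sketch reconstructed from the description: start `number` at 0, add 1 on each pass, and stop when `number` equals 8. The `runs` counter is an addition for illustration only.

```python
number = 0
runs = 0  # counts how many times the block executes (for illustration)

# Repeat the block as long as number is not equal to 8.
while number != 8:
    print("number =", number)
    number = number + 1   # without this increment the loop would never end
    runs += 1

print("Loop finished after", runs, "runs")  # Loop finished after 8 runs
```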
Break statement
This statement is used to stop the loop that is currently executing. The break statement itself
takes no condition, although it is usually placed inside an if statement. It is used whenever you
need to stop the loop early, for example when you find an exceptional case during execution. The
usage of the break statement can be understood from the example given below.
x = 100
while (x < 600):
    print(x)
    if (x == 300):
        break
    x = x + 100
There is a variable x in the example with the initial value 100. The while loop executes as long as
the value of x is less than 600, and on every pass the value of x is incremented by 100. However,
the break statement is planned to execute exactly when x = 300. Hence, the loop stops at that point
and never continues further, so only 100, 200 and 300 are printed.
Continue Statement
This statement skips the remaining statements in the current iteration and continues with the next
iteration of the loop. Like break, the continue statement itself takes no condition. It is used
whenever you need to skip the remaining statements and move to the next iteration. The usage of the
continue statement can be understood from the example given below.
x = 100
while (x < 600):
    print(x)
    if (x == 300):
        continue
    x = x + 100
There is a variable x in the example with the initial value 100. The while loop executes as long as
the value of x is less than 600, and on every pass the value of x is incremented by 100. However,
the continue statement is planned to execute exactly when x = 300. Hence, the current iteration
stops at that point and the loop continues with the next iteration. Once the continue statement is
executed, the remaining statements (here, the increment) are not executed, so the value of x
remains 300. The loop therefore becomes an infinite loop that prints 300 repeatedly: the condition
never becomes False, because x is never incremented again.
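To use continue without creating an infinite loop, the step that advances the loop must not be skipped. One way, sketched here as an alternative rather than as the original author's code, is a for loop over `range`, which moves to the next value automatically. The `printed` list only records the output for checking.

```python
printed = []
for x in range(100, 600, 100):
    if x == 300:
        continue   # skip the rest of this iteration; the for loop still advances x
    printed.append(x)
    print(x)
# Prints 100, 200, 400 and 500; the value 300 is skipped.
```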
2.8 Functions
A function is a block of statements that is executed only when it is called. A function is given a
specific name, which we use whenever we want to call it. Functions can be divided into two types:
system-defined (built-in) functions and user-defined functions. Let us focus on user-defined
functions; declaring and calling functions are discussed here. The following figure represents
passing a value (x) to the function (f) and getting the output. The function is called by sending
the value of x; it uses the value of x to perform the computation, and the result is sent back as
the final output.
A function that you define yourself in a program is known as a user-defined function. The function
definition can be understood from the example given below. We define a function using the keyword
“def”. The name of the function is “fahr_to_celsius”. This function accepts only one parameter,
temp. The computed value is returned from the function using the keyword “return”, as given in the
figure. This function converts a temperature from Fahrenheit to Celsius.
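The figure is not reproduced here. Assuming it uses the standard conversion formula C = (F - 32) * 5 / 9, the function might look like this; the calls below use the freezing and boiling points of water as easy checks.

```python
def fahr_to_celsius(temp):
    # Convert a temperature in degrees Fahrenheit to degrees Celsius.
    return (temp - 32) * 5 / 9

print(fahr_to_celsius(32))    # 0.0
print(fahr_to_celsius(212))   # 100.0
print(fahr_to_celsius(98.6))  # approximately 37
```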
Here is one more example, which accepts two parameters, a and b. The addition of the two given
numbers is performed in the function, whose name is my_fun. We are not returning anything from the
function, as there is no return statement in it.
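The figure is not reproduced here. Assuming `my_fun` prints the sum, a sketch follows. It also shows what the caller receives from a function without a return statement: Python returns `None` automatically.

```python
def my_fun(a, b):
    # Add the two parameters and print the sum; there is no return statement.
    print(a + b)

result = my_fun(4, 6)   # prints 10
print(result)           # None: a function without return gives back None
```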