
MEAP Edition
Manning Early Access Program
Grokking Machine Learning
Version 7

Copyright 2020 Manning Publications

For more information on this and other Manning titles go to manning.com

©Manning Publications Co. To comment go to liveBook


welcome
Thank you for purchasing the MEAP edition of Grokking Machine Learning.
Machine learning is, without a doubt, one of the hottest topics in the world right now. Most
companies are using it, or planning to use it, for many applications. Some people dub machine
learning as the new electricity, or the new industrial revolution. I would go a bit farther and
call it the new renaissance. Why? Because in the Renaissance, progress was made in the arts,
the sciences, engineering, mathematics, and almost all the fields by the same people. With
machine learning, this is finally possible again. With a strong knowledge of machine learning,
one is able to derive cutting-edge results in almost any field one decides to apply it to, and
this is fascinating. And that is what this book is for: to get you up to speed with the fast-moving world of machine learning!
But what is machine learning? I define it as “common sense, but for a computer.” What
does this mean? It means that machine learning is the practice of getting computers
to make decisions using the decision-making process that we, humans, utilize in our daily life.
Humans make many decisions based on past experiences, and we can teach this decision-making process to the computer, with the difference that computers call their past experiences “data.”
Most approaches to machine learning require a heavy amount of mathematics, in
particular, linear algebra, calculus, and probability. While a solid understanding of
these topics is very useful for learning machine learning, I strongly believe that they
are not absolutely necessary. What is needed to understand machine learning is a
visual mind, an intuition of basic probability, and a strong desire to learn.
In this book, I present machine learning as a series of exercises of increasing difficulty, in
which the final goal is to teach a computer how to take a particular decision. Each chapter is
dedicated to a different machine learning algorithm, and is focused on one use case of this
algorithm, such as spam detection, language analysis, image recognition, and so on. For the
readers who are interested in programming, I also code the algorithms in Python, and teach
some useful packages that are used in industry and research. The code is also shared
on GitHub for easy download.
I really hope that you enjoy this book, and that it is a first step on your journey toward
becoming a machine learning expert!
I encourage you to share questions, comments, or suggestions about this book in
liveBook's Discussion Forum for the book.

—Luis Serrano, PhD



brief contents
1 What is machine learning?
2 Types of machine learning
3 Drawing a line close to our points: linear regression
4 Using lines to split our points: the perceptron algorithm
5 A continuous approach to splitting points: logistic regression
6 Using probability to its maximum: naive Bayes algorithm
7 Splitting data by asking questions: decision trees
8 Combining building blocks to gain more power: neural networks
9 Finding boundaries with style: Support vector machines and the kernel method
10 Combining models to maximize results: Ensemble learning
APPENDIX
The math behind the algorithms


1
What is machine learning?

It is common sense, except done by a computer


This chapter covers:

• What is machine learning?


• Is machine learning hard? (Spoiler: No)
• Why should you read this book?
• What will we learn in this book?
• How do humans think, how do machines think, and what does this have to do with machine
learning?

I am super happy to join you in your learning journey!

Welcome to this book! I’m super happy to be joining you in this journey through
understanding machine learning. At a high level, machine learning is a process in which the
computer solves problems and makes decisions in a similar way that humans do.
In this book, I want to bring one message to you, and it is: Machine learning is easy! You
do not need to have a heavy math knowledge or a heavy programming background to
understand it. What you need is common sense, a good visual intuition, and a desire to learn
and to apply these methods to anything that you are passionate about and where you want to
make an improvement in the world. I’ve had an absolute blast writing this book, as I love
understanding these topics more and more, and I hope you have a blast reading it and diving
deep into machine learning!

Machine learning is everywhere, and you can do it.

Machine learning is everywhere. This statement seems to be more true every day. I have a
hard time imagining a single aspect of life that cannot be improved in some way or another by


machine learning. Anywhere there is a job that requires repetition, or that requires looking at
data and gathering conclusions, machine learning can help, especially in the last few years,
as computing power has grown so fast and data is gathered and processed pretty
much anywhere. Just to name a few applications of machine learning: recommendation
systems, image recognition, text processing, self-driving cars, spam recognition, and more.
Maybe you have a goal or an area in which you are making, or want to make, an impact.
Very likely, machine learning can be applied to this field, and hopefully that is what brought you to
this book. So, let's find out together!

1.1 Why this book?


We play the music of machine learning; the formulas and code come later.

Most of the time, when I read a machine learning book or attend a machine learning lecture,
I see either a sea of complicated formulas, or a sea of lines of code. For a long time, I thought
this was machine learning, and it was only reserved for those who had a very solid knowledge
of both.
I try to compare machine learning with other subjects, such as music. Musical theory and
practice are complicated subjects. But when we think of music, we do not think of scores and
scales, we think of songs and melodies. And then I wondered, is machine learning the same?
Is it really just a bunch of formulas and code, or is there a melody behind that?
With this in mind, I embarked on a journey to understand the melody of machine
learning. I stared at formulas and code for months, drew many diagrams, scribbled drawings
on napkins with my family, friends, and colleagues, trained models on small and large
datasets, experimented, until finally some very pretty mental pictures started appearing. But
it doesn’t have to be that hard for you. You can learn more easily without having to deal with
the math from the start. Especially since the increasing sophistication of ML tools removes
much of the math burden. My goal with this book is to make machine learning fully
understandable to every human, and this book is a step on that journey, that I’m very happy
you’re taking with me!


Figure 1.1. Music is not only about scales and notes. There is a melody behind all the technicalities.

In the same way, machine learning is not about formulas and code.

There is also a melody, and in this book we sing it.

1.2 Is machine learning hard?


No.

Machine learning requires imagination, creativity, and a visual mind. This is all. It helps a lot if
we know mathematics, but the formulas are not required. It helps if we know how to code, but
nowadays, there are many packages and tools that help us use machine learning with minimal
coding. Each day, machine learning is more available to everyone in the world. All you need is
an idea of how to apply it to something, and some knowledge about how to handle data. The
goal of this book is to give you this knowledge.

1.3 But what exactly is machine learning?


Once upon a time, if we wanted to make a computer perform a task, we had to write a
program, namely, a whole set of instructions for the computer to follow. This is good for
simple tasks, but how do we get a computer to, for example, identify what is on an image? For
example, is there a car in it, or a person? For these kinds of tasks, all we can do is
give the computer lots of images, and make it learn attributes about them that will help it
recognize them. This is machine learning: it is teaching computers how to do something by
experience, rather than by instructions. It is the equivalent of when, as humans, we make
decisions based on our intuition, which is based on previous experience. In a way, machine


learning is about teaching the computer how to think like a human. Here is how I define
machine learning in the most concise way:
Machine learning is common sense, except done by a computer.

Figure 1.2. Machine learning is about computers making decisions based on experience.

In the same way that humans make decisions based on previous experiences, computers can make decisions
based on previous data. The rules computers use to make decisions are called models.

Not a huge fan of formulas? You are in the right place

In most machine learning books, each algorithm is explained in a very formulaic way, normally
with an error function, another formula for the derivative of the error function, and a process
that will help us minimize this error function in order to get to the solution. These are the
descriptions of the methods that work well in practice, but explaining them with formulas
is the equivalent of teaching someone how to drive by opening the hood and frantically
pointing at different parts of the car, while reading their descriptions out of a manual. This
doesn’t show what really happens, which is, the car moves forward when we press the gas
pedal, and stops when we hit the brakes. In this book, we study the algorithms in a different
way. We do not use error functions and derivatives. Instead, we look at what is really
happening with our data, and how we are modeling it.
Don’t get me wrong, I think formulas are wonderful, and when needed, we won’t shy away
from them. But I don’t think they form the big picture of machine learning, and thus, we go
over the algorithms in a very conceptual way that will show us what really is happening in
machine learning.


1.3.1 What is the difference between artificial intelligence and machine learning?
First things first, machine learning is a part of artificial intelligence. So anytime we are doing
machine learning, we are also doing artificial intelligence.

Figure 1.3. Machine learning is a part of artificial intelligence.

I think of artificial intelligence in the following way:


Artificial intelligence encompasses all the ways in which a computer can make decisions.
When I think of how to teach the computer to make decisions, I think of how we as humans
make decisions. There are mainly two ways we use to make most decisions:

1. By using reasoning and logic.


2. By using our experience.

Both of these are mirrored by computers, and they have a name: Artificial intelligence.
Artificial intelligence is the name given to the process in which the computer makes decisions,
mimicking a human. So in short, points 1 and 2 form artificial intelligence.
Machine learning, as we stated before, is when we only focus on point 2. Namely, when
the computer makes decisions based on experience. And experience has a fancy term in
computer lingo: data. Thus, machine learning is when the computer makes decisions, based
on previous data. In this book, we focus on point 2, and study many ways in which machines
can learn from data.
A small example would be how Google maps finds a path between point A and point B.
There are several approaches, for example the following:


1. Looking into all the possible roads, measuring the distances, adding them up in all
possible ways, and finding which combination of roads gives us the shortest path
between points A and B.
2. Watching many cars go through the road for days and days, recording which cars get
there in less time, and finding patterns in what their routes were.

As you can see, approach 1 uses logic and reasoning, whereas approach 2 uses previous data.
Therefore, approach 2 is machine learning. Approaches 1 and 2 are both artificial intelligence.

1.3.2 What about deep learning?


Deep learning is arguably the most commonly used type of machine learning. The reason is
simply that it works really well. If you are looking at any of the cutting edge applications, such
as image recognition, language generation, playing Go, or self driving cars, very likely you are
looking at deep learning in some way or another. But what exactly is deep learning? This term
applies to every type of machine learning that uses neural networks. Neural networks are one
type of algorithm, which we learn about in Chapter 8.
So in other words, deep learning is simply a part of machine learning, which in turn is a
part of artificial intelligence. If this book were about vehicles, then AI would be motion, ML
would be cars, and deep learning (DL) would be Ferraris.

Figure 1.4. Deep learning is a part of machine learning.

1.4 Humans use the remember-formulate-predict framework to


make decisions (and so can machines!)
How does the computer make decisions based on previous data? For this, let’s first see the
process of how humans make decisions based on experience. And this is what I call the


remember-formulate-predict framework. The goal of machine learning is to teach computers


how to think in the same way, following the same framework.

1.4.1 How do humans think?


When we humans need to make a decision based on our experience, we normally use the
following framework:

1. We remember past situations that were similar.


2. We formulate a general rule.
3. We use this rule to predict what will happen if we take a certain decision.

For example, if the question is: “Will it rain today?”, the process to make a guess will be the
following:

1. We remember that last week it rained most of the time.


2. We formulate that in this place, it rains most of the time.
3. We predict that today it will rain.

We may be right or wrong, but at least, we are trying to make an accurate prediction.

Figure 1.5. The remember-formulate-predict framework.

Let us put this in practice with an example.

Example 1: An annoying email friend

Here is an example. We have a friend called Bob, who likes to send us a lot of email. In
particular, a lot of his emails are spam, in the form of chain letters, and we are starting to get


a bit annoyed at him. It is Saturday, and we just got a notification of an email from him. Can
we guess if it is spam or not without looking at the email?

SPAM AND HAM Spam is the common term used for junk or unwanted email, such as chain letters,
promotions, and so on. The term comes from a 1970 Monty Python sketch in which every item on the menu of
a restaurant contained Spam as an ingredient. Among software developers, the term ‘ham’ is used to refer to
non-spam emails. I use this terminology in this book.

For this, we use the remember-formulate-predict method.


First let us remember, say, the last 10 emails that we got from Bob. We remember that 4
of them were spam, and the other 6 were ham. From this information, we can formulate the
following rule:
Rule 1: 4 out of every 10 emails that Bob sends us are spam.
This rule will be our model. Note, this rule does not need to be true. It could be
outrageously wrong. But given our data, it is the best that we can come up with, so we’ll live
with it. Later in this book, we learn how to evaluate models and improve them when needed.
But for now, we can live with this.
Now that we have our rule, we can use it to predict if the email is spam or not. If 4 out
of 10 of the emails that Bob sends us are spam, then we can assume that this new email is
40% likely to be spam, and 60% likely to be ham. Therefore, it’s a little safer to think that the
email is ham. Therefore, we predict that the email is not spam.
Again, our prediction may be wrong. We may open the email and realize it is spam. But we
have made the prediction to the best of our knowledge. This is what machine learning is all
about.
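Rule 1 can be sketched in a few lines of Python. The email list below is hypothetical, chosen only to match the counts in the text (4 spam, 6 ham), and the function name is mine, not from the book's code:

```python
# A minimal sketch of Rule 1: predict using the fraction of Bob's past
# emails that were spam.
past_emails = ["spam"] * 4 + ["ham"] * 6  # the last 10 emails from Bob

def predict(emails):
    """Return the more likely label, together with the spam probability."""
    p_spam = emails.count("spam") / len(emails)
    return ("spam" if p_spam > 0.5 else "ham"), p_spam

label, p_spam = predict(past_emails)
print(label, p_spam)  # ham 0.4
```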
But you may be thinking, 6 out of 10 is not enough confidence that the email is ham
rather than spam; can we do better? Let’s try to analyze the emails a little more. Let’s see when Bob sent
the emails to see if we find a pattern.


Figure 1.6. A very simple machine learning model.

Example 2: A seasonal annoying email friend

Let us look more carefully at the emails that Bob sent us in the previous month. Let’s look at
what day he sent them. Here are the emails with dates, and information about being spam or
ham:

• Monday: Ham
• Tuesday: Ham
• Saturday: Spam
• Sunday: Spam
• Sunday: Spam
• Wednesday: Ham
• Friday: Ham
• Saturday: Spam
• Tuesday: Ham
• Thursday: Ham

Now things are different. Can you see a pattern? It seems that every email Bob sent during
the week is ham, and every email he sent during the weekend is spam. This makes sense:
maybe during the week he sends us work email, whereas during the weekend, he has time to
send spam, and decides to roam free. So, we can formulate a more educated rule:


Rule 2: Every email that Bob sends during the week is ham, and during the weekend is
spam.
And now, let’s look at what day it is today. If it is Saturday, and we just got an email from
him, then we can predict with great confidence that the email he sent is spam. So we make
this prediction, and without looking, we send the email to the trash can.
Let’s give things names. In this case, our prediction was based on a feature. The feature
was the day of the week, or more specifically, whether it was a weekday or a weekend day.
You can imagine that there are many more features that could indicate if an email is spam or
ham. Can you think of some more? In the next paragraphs we’ll see a few more features.
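Rule 2 can also be sketched in Python, using the day of the week as the single feature (the function name is mine; the observations are Bob's emails from the list above):

```python
# A sketch of Rule 2: weekday emails are ham, weekend emails are spam.
WEEKEND = {"Saturday", "Sunday"}

def predict_by_day(day):
    return "spam" if day in WEEKEND else "ham"

# Bob's remembered emails, as (day, label) pairs:
observations = [
    ("Monday", "ham"), ("Tuesday", "ham"), ("Saturday", "spam"),
    ("Sunday", "spam"), ("Sunday", "spam"), ("Wednesday", "ham"),
    ("Friday", "ham"), ("Saturday", "spam"), ("Tuesday", "ham"),
    ("Thursday", "ham"),
]

# This rule fits every remembered email:
accuracy = sum(predict_by_day(d) == y for d, y in observations) / len(observations)
print(accuracy)  # 1.0
```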

Figure 1.7. A slightly more complex machine learning model, done by a human.

Example 3: Things are getting complicated!

Now, let’s say we continue with this rule, and one day we see Bob in the street, and he says
“Why didn’t you come to my birthday party?” We have no idea what he is talking about. It
turns out last Sunday he sent us an invitation to his birthday party, and we missed it! Why did
we miss it? Because he sent it on the weekend. It seems that we need a better model. So let’s
go back to look at Bob’s emails, in the following list; this is our remember step. Now let’s
see if you can help me find a pattern.

• 1KB: Ham
• 12KB: Ham
• 16KB: Spam


• 20KB: Spam
• 18KB: Spam
• 3KB: Ham
• 5KB: Ham
• 25KB: Spam
• 1KB: Ham
• 3KB: Ham

What do we see? It seems that the large emails tend to be spam, while the smaller ones tend
to not be spam. This makes sense, since maybe the spam ones have a large attachment.
So, we can formulate the following rule:
Rule 3: Any email of size 10KB or more is spam, and any email of size less than
10KB is ham.
So now that we have our rule, we can make a prediction. We look at the email we
received today, and the size is 19KB. So we conclude that it is spam.
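Rule 3 uses a single numeric feature, the email size in KB, and can be sketched like this (the function name is mine; the 10KB threshold comes from the rule in the text):

```python
# A sketch of Rule 3: emails of 10KB or more are classified as spam.
def predict_by_size(size_kb, threshold_kb=10):
    return "spam" if size_kb >= threshold_kb else "ham"

print(predict_by_size(19))  # spam -- today's 19KB email
print(predict_by_size(3))   # ham
```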

Figure 1.8. Another slightly more complex machine learning model, done by a human.

Is this the end of the story? I don’t know…

Example 4: More?

Our two classifiers were good, since they flag large emails and emails sent on the
weekend as spam. Each one of them uses exactly one of these two features. But what if we wanted a
rule that worked with both features? Rules like the following may work:


Rule 4: If an email is larger than 10KB or it is sent on the weekend, then it is classified as
spam. Otherwise, it is classified as ham.
Rule 5: If the email is sent during the week, then it must be larger than 15KB to be
classified as spam. If it is sent during the weekend, then it must be larger than 5KB to be
classified as spam. Otherwise, it is classified as ham.
Or we can even get much more complicated.
Rule 6: Consider the number of the day, where Monday is 0, Tuesday is 1, Wednesday is
2, Thursday is 3, Friday is 4, Saturday is 5, and Sunday is 6. If we add the number of the day
and the size of the email (in KB), and the result is 12 or more, then the email is classified as
spam. Otherwise, it is classified as ham.

Figure 1.9. An even more complex machine learning model, done by a human.

All of these are valid rules. And we can keep adding layers and layers of complexity. Now the
question is, which is the best rule? This is where we may start needing the help of a computer.
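Rules 4-6 can be sketched as small Python functions that each combine the two features, size in KB and day of the week (the function names are mine; `True` means the email is classified as spam):

```python
# Sketches of Rules 4-6 from the text. True means "spam".
DAY_NUMBER = {"Monday": 0, "Tuesday": 1, "Wednesday": 2, "Thursday": 3,
              "Friday": 4, "Saturday": 5, "Sunday": 6}
WEEKEND = {"Saturday", "Sunday"}

def rule_4(size_kb, day):
    # Spam if larger than 10KB, or sent on the weekend.
    return size_kb > 10 or day in WEEKEND

def rule_5(size_kb, day):
    # Weekday threshold 15KB, weekend threshold 5KB.
    return size_kb > (5 if day in WEEKEND else 15)

def rule_6(size_kb, day):
    # Spam if day number plus size reaches 12.
    return DAY_NUMBER[day] + size_kb >= 12

# The rules can disagree -- for a hypothetical 3KB email sent on a Sunday:
print(rule_4(3, "Sunday"), rule_5(3, "Sunday"), rule_6(3, "Sunday"))  # True False False
```

That the three rules give three different answers for the same email is exactly why we need a principled way to pick the best rule.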

1.4.2 How do machines think?


The goal is to make the computer think the way we think, namely, use the remember-
formulate-predict framework. In a nutshell, here is what the computer does in each of the
steps.
Remember: Look at a huge table of data.
Formulate: Go through many rules and formulas, and check which one fits the data best.
Predict: Use the rule to make predictions about future data.
This is not much different than what we did in the previous section. The great
advancement here is that the computer can try building rules such as rules 4, 5, or 6, trying


different numbers, different boundaries, and so on, until finding one that works best for the
data. It can also do it if we have lots of columns. For example, we can make a spam classifier
with features such as the sender, the date and time of day, the number of words, the number
of spelling mistakes, the appearances of certain words such as “buy”, or similar words. A rule
could easily look as follows:
Rule 7:

• If the email has two or more spelling mistakes, then it is classified as spam.
• Otherwise, if it has an attachment larger than 20KB, it is classified as spam.
• Otherwise, if the sender is not in our contact list, it is classified as spam.
• Otherwise, if it has the words “buy” and “win”, it is classified as spam.
• Otherwise, it is classified as ham.

Or an even more mathematical one, such as:


Rule 8: If
(size) + 10 x (number of spelling mistakes) - (number of appearances of the word ‘mom’)
+ 4 x (number of appearances of the word ‘buy’) > 10,
then we classify the message as spam. Otherwise we do not.
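Rule 8 is a linear score over four features, and can be sketched directly (the function name and the example email below are mine, for illustration):

```python
# A sketch of Rule 8: a linear score over four features of the email.
def rule_8(size_kb, spelling_mistakes, mom_count, buy_count):
    score = size_kb + 10 * spelling_mistakes - mom_count + 4 * buy_count
    return "spam" if score > 10 else "ham"

# A hypothetical 2KB email with one spelling mistake and one 'buy':
print(rule_8(2, 1, 0, 1))  # spam  (score = 2 + 10 - 0 + 4 = 16)
```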

Figure 1.10. A much more complex machine learning model, done by a computer.


Now the question is, which is the best rule? The quick answer is: The one that fits the data
best. Although the real answer is: The one that generalizes best to new data. At the end of the
day, we may end up with a very complicated rule, but the computer can formulate it and use
it to make predictions very quickly. And now the question is: how do we build the best model?
That is exactly what this book is about.

1.5 What is this book about?


Good question. Rules 1-8 above are examples of machine learning models, or classifiers.
As you saw, these are of different types. Some use an equation on the features to make a
prediction. Others use a combination of if statements. Others will return the answer as a
probability. Others may even return the answer as a number! In this book, we study the main
algorithms of what we call predictive machine learning. Each one has its own style, way to
interpret the features, and way to make a prediction. In this book, each chapter is dedicated
to one different type of model.
This book provides you with a solid framework of predictive machine learning. To get the
most out of this book, you should have a visual mind, and a basic knowledge of mathematics, such as
graphs of lines, equations, and probability. It is very helpful (although not mandatory) if you
know how to code, especially in Python, as you will be given the opportunity to implement and
apply several models to real datasets throughout the book. After reading this book, you will be
able to do the following:

• Describe the most important algorithms in predictive machine learning and how they
work, including linear and logistic regression, decision trees, naive Bayes, support
vector machines, and neural networks.
• Identify their strengths and weaknesses, and the parameters they use.
• Identify how these algorithms are used in the real world, and formulate potential ways
to apply machine learning to any particular problem you would like to solve.
• Optimize these algorithms, compare them, and improve them, in order to build
the best machine learning models we can.

If you have a particular dataset or problem in mind, we invite you to think about how to apply
each of the algorithms to your particular dataset or problem, and to use this book as a starting
point to implement and experiment with your own models.
I am super excited to start this journey with you, and I hope you are as excited!

1.6 Summary
• Machine learning is easy! Anyone can do it, regardless of their background. All that is
needed is a desire to learn, and great ideas to implement!
• Machine learning is tremendously useful, and it is used in most disciplines. From
science to technology to social problems and medicine, machine learning is making an
impact, and will continue to do so.
• Machine learning is common sense, done by a computer. It mimics the ways humans
think in order to make decisions fast and accurately.


• Just like humans make decisions based on experience, computers can make decisions
based on previous data. This is what machine learning is all about.
• Machine learning uses the remember-formulate-predict framework, as follows:

o Remember: Use previous data.


o Formulate: Build a model, or a rule, for this data.
o Predict: Use the model to make predictions about future data.


2
Types of machine learning

This chapter covers:

• The three main types of machine learning.


• The difference between labelled and unlabelled data.
• What supervised learning is and what it’s useful for.
• The difference between regression and classification, and what they are useful for.
• What unsupervised learning is and what it’s useful for.
• What reinforcement learning is and what it’s useful for.

As we learned in Chapter 1, machine learning is common sense, but for a computer. It mimics
the process in which humans make decisions based on experience, by making decisions based
on previous data. Of course, this is challenging for computers, as all they do is store numbers
and do operations on them, so programming them to mimic human level of thought is difficult.
Machine learning is divided into several branches, and they all mimic different ways in
which humans make decisions. In this chapter, we overview some of the most important of
these branches.
ML has applications in many, many fields. Can you think of some fields in which you can
apply machine learning? Here is a list of some of my favorites:

• Predicting housing prices based on their size, number of rooms, location, etc.
• Predicting the stock market based on other factors of the market, and yesterday’s
price.
• Detecting spam or non-spam e-mails based on the words of the e-mail, the sender, etc.
• Recognizing images as faces, animals, etc., based on the pixels in the image.
• Processing long text documents and outputting a summary.
• Recommending videos or movies to a user (for example YouTube, Netflix, etc.).
• Chatbots that interact with humans and answer questions.


• Self driving cars that are able to navigate a city.


• Diagnosing patients as sick or healthy.
• Segmenting the market into similar groups based on location, purchasing power,
interests, etc.
• Playing games like chess or Go.

Try to imagine how we could use machine learning in each of these fields. Some applications
look similar. For example, we can imagine that predicting housing prices and predicting stock
prices must use similar techniques. Likewise, predicting whether an email is spam and predicting
whether a credit card transaction is legitimate or fraudulent may also use similar techniques. What
about grouping the users of an app by similarity? That sounds very different from predicting
housing prices, but could it be done in a similar way to how we group newspaper articles
by topic? And what about playing chess? That sounds very different from predicting whether an
email is spam, but it does sound similar to playing Go.
Machine learning models are grouped into different types, according to the way they
operate. The main three families of machine learning models are

• supervised learning,
• unsupervised learning, and
• reinforcement learning.

In this chapter, we overview them all. However, in this book we only cover supervised
learning, as it is the most natural one to start learning with, and arguably the most commonly
used. We encourage you to look up the other types in the literature and learn about them too,
as they are all very interesting and useful!
(Sidebar) Recommended resources:

1. Grokking Deep Reinforcement Learning, by Miguel Morales (Manning)


2. UCL course on reinforcement learning, by David Silver
(http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html)
3. Deep Reinforcement Learning Nanodegree Program, by Udacity.
(https://www.udacity.com/course/deep-reinforcement-learning-nanodegree--nd893)

2.1 What is the difference between labeled and unlabeled data?


Actually, what is data?

Let’s first establish a clear definition of what we mean by data. Data is simply information. Any
time we have a table with information, we have data. Normally, each row is a data point. Let’s
say, for example, that we have a dataset of pets. In this case, each row represents a different
pet. Each pet is then described by certain features.

Okay. And what are features?


Features are simply the columns of the table. In our pet example, the features may be size,
name, type, weight, etc. This is what describes our data. Some features are special, though,
and we call them labels.

Labels?

This one is a bit less obvious, and it depends on the context of the problem we are trying to
solve. Normally, if we are trying to predict one feature based on the others, that feature is the
label. If we are trying to predict the type of pet we have (for example, cat or dog) based on
information about that pet, then the type is the label. If we are trying to predict whether the pet
is sick or healthy based on symptoms and other information, then that is the label. If we are
trying to predict the age of the pet, then the age is the label.
So now we can define two very important concepts: labeled and unlabeled data.
Labeled data: Data that comes with a label.
Unlabeled data: Data that comes without a label.

Figure 2.1. Labeled data is data that comes with a tag, like a name, a type, or a number. Unlabeled data is data
that comes with no tag.
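The distinction is easy to see in code. Here is a minimal sketch using a tiny, made-up pet dataset, where each data point is a dictionary and `label` is a feature we single out as the target (the feature names and values are hypothetical):

```python
# A labeled dataset: each data point carries a 'label' feature (the tag).
labeled_pets = [
    {"weight": 4.5, "ear_shape": "pointy", "label": "cat"},
    {"weight": 20.0, "ear_shape": "floppy", "label": "dog"},
]

# The same data points without the 'label' feature: unlabeled data.
unlabeled_pets = [
    {"weight": 4.5, "ear_shape": "pointy"},
    {"weight": 20.0, "ear_shape": "floppy"},
]

def is_labeled(dataset):
    """A dataset is labeled if every data point carries a label."""
    return all("label" in point for point in dataset)

print(is_labeled(labeled_pets))    # True
print(is_labeled(unlabeled_pets))  # False
```

Notice that the two datasets contain exactly the same features otherwise; the only difference is the presence of the tag.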

So what, then, are supervised and unsupervised learning?

Clearly, it is better to have labeled data than unlabeled data. With a labeled dataset we can
do much more, but there are still many things that we can do with an unlabeled dataset.
The set of algorithms in which we use a labeled dataset is called supervised learning. The
set of algorithms in which we use an unlabeled dataset is called unsupervised learning. This is
what we learn next.


2.2 What is supervised learning?


Supervised learning is the type of machine learning you find in the most common applications
nowadays, including image recognition, various forms of text processing, recommendation
systems, and many more. As we stated in the previous section, it is a type of predictive
machine learning in which the data comes with labels, where the label is the target we are
interested in predicting.
In the example in Figure 2.1, where the dataset is formed by images of dogs and cats,
and the labels in the image are ‘dog’ and ‘cat’, the machine learning model would simply use
previous data in order to predict the label of new data points. This means that if we bring in a new
image without a label, the model would guess whether the image is of a dog or a cat, thus predicting
the label of the data point.

Figure 2.2. A supervised learning model predicts the label of a new data point.

If you recall Chapter 1, the framework we learned for making a decision was Remember-
Formulate-Predict. This is precisely how supervised learning works. The model first
remembers the dataset of dogs and cats, then formulates a model, or a rule, for what is a
dog and what is a cat, and when a new image comes in, the model makes a prediction about
what the label of the image is, namely, whether it is a dog or a cat.


Figure 2.3. Supervised learning follows the Remember-Formulate-Predict framework from Chapter 1.

Now, notice that in Figure 2.1, we have two types of datasets, one in which the labels are
numbers (the weight of the animal), and one in which the labels are states, or classes (the
type of animal, namely cat or dog). This gives rise to two types of supervised learning models.
Regression models: These are the types of models that predict a number, such as the
weight of the animal.
Classification models: These are the types of models that predict a state, such as the
type of animal (cat or dog).
We call the output of a regression model continuous, since the prediction can be any real
value, picked from a continuous interval. We call the output of a classification model discrete,
since the prediction can be a value from a finite list. An interesting fact is that the output can
have more than two states. If we had more states, say, a model that predicts whether a picture is of a
dog, a cat, or a bird, we could still use a classification model. Models like these are called multiclass
classification models. There can be classifiers with many states, but the number of states must always be finite.
Let’s look at two examples of supervised learning models, one regression and one
classification:


Example 1 (regression), housing prices model: In this model, each data point is a house.
The label of each house is its price. Our goal is that when a new house (data point) comes on the
market, we can predict its label, namely, its price.
Example 2 (classification), email spam detection model: In this model, each data point is
an email. The label of each email is either spam or ham. Our goal is that when a new email (data
point) comes into our inbox, we can predict its label, namely, whether it is spam or ham.
You can see the difference between models 1 and 2.

• Example 1, the housing prices model, is a model that can return many numbers, such
as $100, $250,000, or $3,125,672. Thus it is a regression model.
• Example 2, the spam detection model, on the other hand, can only return two things:
spam or ham. Thus it is a classification model.
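The contrast between the two output types can be sketched with two toy "models". These are hand-written hypothetical rules, not trained models; they only illustrate that a regression model returns a number, while a classification model returns one of a finite set of states:

```python
def predict_house_price(size_sqft):
    """Regression: returns a number (continuous output).
    The pricing rule here is made up for illustration."""
    return 50000 + 200 * size_sqft

def predict_spam(num_spelling_mistakes):
    """Classification: returns one of a finite set of states."""
    return "spam" if num_spelling_mistakes > 3 else "ham"

print(predict_house_price(1500))  # 350000 -- any real number is possible
print(predict_spam(5))            # 'spam' -- only 'spam' or 'ham' can come out
```

No matter what input we feed the second model, its output is always drawn from the same two-element list; that is what makes it discrete.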

Let’s elaborate some more on regression and classification.

2.2.1 Regression models predict numbers


As we mentioned previously, regression models are those that predict a number. This number
is predicted from the features. In the housing example, the features can be the size of the
house, the number of rooms, the distance to the closest school, the crime rate in the
neighborhood, etc.
Other places where one can use regression models are the following:

• Stock market: Predicting the price of a certain stock based on other stock prices, and
other market signals.
• Medicine: Predicting the expected lifespan of a patient, or the expected recovery time,
based on symptoms and the medical history of the patient.
• Sales: Predicting the expected amount of money a customer will spend, based on the
client’s demographics and past purchase behavior.
• Video recommendations: Predicting the expected amount of time a user will watch a
video, based on the user’s demographics and past interaction with the site.

The most common method used for regression is linear regression, which is when we use
linear functions (basically lines) to make our predictions based on the features. We study
linear regression in Chapter 3.
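As a preview of Chapter 3, here is a minimal sketch of linear regression on a single feature (house size), fit with the closed-form least-squares formula. The data points are made up for illustration:

```python
sizes = [1000, 1500, 2000, 2500]           # feature: size in square feet
prices = [200000, 290000, 410000, 500000]  # label: price in dollars

n = len(sizes)
mean_x = sum(sizes) / n
mean_y = sum(prices) / n

# Slope and intercept of the best-fit line: price = slope * size + intercept
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, prices))
         / sum((x - mean_x) ** 2 for x in sizes))
intercept = mean_y - slope * mean_x

def predict_price(size):
    """Predict the label (price) of a new data point from its feature (size)."""
    return slope * size + intercept

print(round(predict_price(1800)))  # 360200
```

The "model" here is just the pair (slope, intercept): a line that summarizes the previous data, which we then use to predict the label of a new house.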

2.2.2 Classification models predict a state


Classification models are those that predict a state, from a finite set of states. The most
common ones predict a ‘yes’ or a ‘no’, but there are many models that use a larger set of
states. The example we saw in Figure 2.3 is one of classification, as it predicts the type of the pet,
namely, ‘cat’ or ‘dog’.
In the email spam recognition example, the state of the email (namely, whether it is spam or not)
is predicted from the features. In this case, the features of the email are the words in it, the
number of spelling mistakes, the sender, and many others.
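Turning a raw email into such features is the first step of any spam classifier. Here is a minimal sketch; the feature choices follow the text, while the misspelled-word list and the known-sender rule are made-up stand-ins for a real spell checker and a real contact list:

```python
# Hypothetical stand-in for a real spell checker's dictionary of mistakes.
MISSPELLINGS = {"congratulatons", "recieve", "winnner"}

def email_features(sender, body):
    """Extract a few simple features from an email."""
    words = body.lower().split()
    return {
        "num_words": len(words),
        "num_spelling_mistakes": sum(w in MISSPELLINGS for w in words),
        "sender_is_known": sender.endswith("@example.com"),  # hypothetical rule
    }

features = email_features(
    "prize@lottery.biz",
    "Congratulatons you winnner claim your prize",
)
print(features)
# {'num_words': 6, 'num_spelling_mistakes': 2, 'sender_is_known': False}
```

A classification model would then take this feature dictionary as input and output one of the two states, spam or ham.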


Another very common example of classification is image recognition. The most popular
image recognition models take as input the pixels in an image and output a prediction of
what the image most likely depicts. Two of the most famous datasets for image recognition
are MNIST and CIFAR-10. MNIST is formed by around 70,000 images of handwritten digits,
which are classified as the digits 0-9. The images come from a combination of sources,
including the United States Census Bureau and handwriting samples from American high
school students. MNIST can be found at the following link: http://yann.lecun.com/exdb/mnist/.
CIFAR-10 is made of 60,000 32-by-32 color images of different things, classified into
10 classes (thus the 10 in the name), namely airplanes, cars, birds, cats, deer,
dogs, frogs, horses, ships, and trucks. This dataset is maintained by the Canadian Institute
For Advanced Research (CIFAR) and can be found at the following link:
https://www.cs.toronto.edu/~kriz/cifar.html.
Other places where one can use classification models are the following:

• Sentiment analysis: Predicting if a movie review is positive or negative, based on the


words in the review.
• Website traffic: Predicting if a user will click on a link or not, based on the user’s
demographics and past interaction with the site.
• Social media: Predicting if a user will befriend or interact with another user or not,
based on their demographics, history, and friends in common.

The bulk of this book talks about classification models. In chapters 3-x, we talk about
classification models in the context of logistic regression, decision trees, naive Bayes, support
vector machines, and the most popular classification models nowadays: neural networks.

2.3 What is unsupervised learning?


Unsupervised learning is also a very common type of machine learning. It differs from
supervised learning in that the data has no labels. What is a dataset with no labels, you ask?
Well, it is a dataset with only features and no target to predict. For example, if our housing
dataset had no prices, it would be an unlabeled dataset. If our email dataset had no
labels, it would simply be a dataset of emails, where ‘spam’ and ‘ham’ are not
specified.
So what could you do with such a dataset? Well, a little less than with a labeled dataset,
unfortunately, since the main thing we are aiming to predict is not there. However, we can still
extract a lot of information from an unlabeled dataset. Here is an example: let us go back to
the cats and dogs example in Figure 2.1. If our dataset has no labels, then we simply have a
bunch of pictures of dogs and cats, and we do not know what type of pet each one represents.
Our model can still tell us whether two pictures of dogs are similar to each other and different
from a picture of a cat. Maybe it can group them in some way by similarity, even without knowing
what each group represents.
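The grouping-by-similarity idea can be sketched with a tiny 1-D version of k-means clustering (one common unsupervised learning algorithm, shown here as an illustration, not as the book's method). The pet weights below are made up, and no labels are used anywhere: the algorithm only groups similar weights together.

```python
weights = [4.1, 4.8, 5.0, 19.5, 21.0, 22.3]  # unlabeled data points

# Start with two guessed group centers, then alternate two steps:
# assign each point to its nearest center, then recompute the centers.
centers = [weights[0], weights[-1]]
for _ in range(10):
    groups = {0: [], 1: []}
    for w in weights:
        nearest = min((abs(w - c), i) for i, c in enumerate(centers))[1]
        groups[nearest].append(w)
    centers = [sum(g) / len(g) for g in groups.values()]

print(groups[0])  # [4.1, 4.8, 5.0]    -- one cluster (the cats, perhaps?)
print(groups[1])  # [19.5, 21.0, 22.3] -- the other (the dogs, perhaps?)
```

The model never learns the words ‘cat’ or ‘dog’; it only discovers that the data splits naturally into two groups. Attaching meaning to the groups is up to us.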
