1
What is machine learning?
Welcome to this book! I’m super happy to be joining you in this journey through
understanding machine learning. At a high level, machine learning is a process in which the
computer solves problems and makes decisions in a similar way that humans do.
In this book, I want to bring you one message: machine learning is easy! You do
not need a heavy math or programming background to
understand it. What you need is common sense, a good visual intuition, and a desire to learn
and to apply these methods to anything that you are passionate about and where you want to
make an improvement in the world. I’ve had an absolute blast writing this book, as I love
understanding these topics more and more, and I hope you have a blast reading it and diving
deep into machine learning!
Machine learning is everywhere. This statement seems more true every day. I have a
hard time imagining a single aspect of life that cannot be improved in some way by
machine learning. Anywhere there is a job that requires repetition, or that requires looking at
data and drawing conclusions, machine learning can help, especially in the last few years,
in which computing power has grown so fast and data is gathered and processed pretty
much everywhere. Just to name a few applications of machine learning: recommendation
systems, image recognition, text processing, self-driving cars, and spam detection.
Maybe you have a goal or an area in which you are making, or want to make, an impact.
Very likely, machine learning can be applied to that field, and hopefully that brought you to
this book. So, let's find out together!
Most of the time, when I read a machine learning book or attend a machine learning lecture,
I see either a sea of complicated formulas, or a sea of lines of code. For a long time, I thought
this was machine learning, and it was only reserved for those who had a very solid knowledge
of both.
I like to compare machine learning with other subjects, such as music. Music theory and
practice are complicated subjects. But when we think of music, we do not think of scores and
scales; we think of songs and melodies. And then I wondered: is machine learning the same?
Is it really just a bunch of formulas and code, or is there a melody behind all that?
With this in mind, I embarked on a journey to understand the melody of machine
learning. I stared at formulas and code for months, drew many diagrams, scribbled drawings
on napkins with my family, friends, and colleagues, trained models on small and large
datasets, and experimented, until finally some very pretty mental pictures started appearing. But
it doesn't have to be that hard for you. You can learn more easily without having to deal with
the math from the start, especially since the increasing sophistication of machine learning tools removes
much of the math burden. My goal with this book is to make machine learning fully
understandable to every human. This book is a step on that journey, and I'm very happy
you're taking it with me!
Figure 1.1. Music is not only about scales and notes. There is a melody behind all the technicalities.
In the same way, machine learning is not about formulas and code.
Machine learning requires imagination, creativity, and a visual mind. This is all. It helps a lot if
we know mathematics, but the formulas are not required. It helps if we know how to code, but
nowadays, there are many packages and tools that help us use machine learning with minimal
coding. Each day, machine learning is more available to everyone in the world. All you need is
an idea of how to apply it to something, and some knowledge about how to handle data. The
goal of this book is to give you this knowledge.
Machine learning is about teaching the computer how to think like a human. Here is how I define
machine learning in the most concise way:
Machine learning is common sense, except done by a computer.
Figure 1.2. Machine learning is about computers making decisions based on experience.
In the same way that humans make decisions based on previous experiences, computers can make decisions
based on previous data. The rules computers use to make decisions are called models.
In most machine learning books, each algorithm is explained in a very formulaic way, normally
with an error function, another formula for the derivative of the error function, and a process
that will help us minimize this error function in order to get to the solution. These are
descriptions of methods that work well in practice, but explaining them with formulas
is the equivalent of teaching someone how to drive by opening the hood and frantically
pointing at different parts of the car, while reading their descriptions out of a manual. This
doesn't show what really happens, which is that the car moves forward when we press the gas
pedal and stops when we hit the brakes. In this book, we study the algorithms in a different
way. We do not use error functions and derivatives. Instead, we look at what is really
happening with our data, and how we are modeling it.
Don't get me wrong, I think formulas are wonderful, and when needed, we won't shy away
from them. But I don't think they form the big picture of machine learning, and thus, we go
over the algorithms in a very conceptual way that shows us what is really happening in
machine learning.
1.3.1 What is the difference between artificial intelligence and machine learning?
First things first, machine learning is a part of artificial intelligence. So anytime we are doing
machine learning, we are also doing artificial intelligence.
Humans make decisions in two main ways: (1) using logic and reasoning, and (2) using our
experience. Both of these are mirrored by computers, and together they have a name: artificial
intelligence. Artificial intelligence is the name given to the process in which the computer makes
decisions, mimicking a human. So, in short, points 1 and 2 together form artificial intelligence.
Machine learning, as we stated before, is when we focus only on point 2, namely, when
the computer makes decisions based on experience. And experience has a fancy term in
computer lingo: data. Thus, machine learning is when the computer makes decisions based
on previous data. In this book, we focus on point 2 and study many ways in which machines
can learn from data.
A small example would be how Google Maps finds a path between point A and point B.
There are several approaches, for example, the following:
1. Looking into all the possible roads, measuring the distances, adding them up in all
possible ways, and finding which combination of roads gives us the shortest path
between points A and B.
2. Watching many cars go through the road for days and days, recording which cars get
there in less time, and finding patterns in what their routes were.
As you can see, approach 1 uses logic and reasoning, whereas approach 2 uses previous data.
Therefore, approach 2 is machine learning. Approaches 1 and 2 are both artificial intelligence.
For example, if the question is: "Will it rain today?", the process to make a guess follows the
Remember-Formulate-Predict framework that we use throughout this chapter:
1. Remember: we recall what the weather was like on previous, similar days.
2. Formulate: we form a rule, for example, "if the morning is cloudy, it tends to rain."
3. Predict: we apply the rule to today to guess whether it will rain.
We may be right or wrong, but at least we are trying to make an accurate prediction.
Here is an example. We have a friend called Bob, who likes to send us a lot of email. In
particular, a lot of his emails are spam, in the form of chain letters, and we are starting to get
a bit annoyed at him. It is Saturday, and we just got a notification of an email from him. Can
we guess if it is spam or not without looking at the email?
SPAM AND HAM Spam is the common term used for junk or unwanted email, such as chain letters,
promotions, and so on. The term comes from a 1970 Monty Python sketch in which every item on the menu of
a restaurant contained Spam as an ingredient. Among software developers, the term 'ham' is used to refer to
non-spam emails. I use this terminology in this book.
Let us look more carefully at the emails that Bob sent us in the previous month. Let’s look at
what day he sent them. Here are the emails with dates, and information about being spam or
ham:
• Monday: Ham
• Tuesday: Ham
• Saturday: Spam
• Sunday: Spam
• Sunday: Spam
• Wednesday: Ham
• Friday: Ham
• Saturday: Spam
• Tuesday: Ham
• Thursday: Ham
Now things are different. Can you see a pattern? It seems that every email Bob sent during
the week is ham, and every email he sent during the weekend is spam. This makes sense;
maybe during the week he sends us work email, whereas during the weekend he has time to
send spam and decides to roam free. So, we can formulate a more educated rule:
Rule 2: Every email that Bob sends during the week is ham, and every email he sends during
the weekend is spam.
And now, let's look at what day it is today. If it is Saturday, and we just got an email from
him, then we can predict with great confidence that the email he sent is spam. So we make
this prediction and, without looking, we send the email to the trash can.
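Rule 2 is already a tiny but complete classifier. As a minimal sketch (assuming we represent each email simply by the day of the week it was sent; the function name is illustrative, not from the book), it could look like this in Python:

```python
# A minimal sketch of Rule 2: emails sent on weekdays are ham,
# and emails sent on the weekend are spam.

WEEKEND = {"Saturday", "Sunday"}

def predict_rule_2(day):
    """Classify an email as 'spam' or 'ham' from the day it was sent."""
    return "spam" if day in WEEKEND else "ham"

print(predict_rule_2("Saturday"))  # spam
print(predict_rule_2("Tuesday"))   # ham
```

This is exactly the kind of rule a human can formulate by hand; the rest of the chapter shows why a computer becomes useful when the rules get more complex.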
Let's give things names. In this case, our prediction was based on a feature. The feature
was the day of the week, or more specifically, whether it was a weekday or a weekend day.
You can imagine that there are many more features that could indicate whether an email is spam or
ham. Can you think of some more? In the next paragraphs, we'll see a few more features.
Figure 1.4. A slightly more complex machine learning model, done by a human.
Now, let's say we continue with this rule, and one day we see Bob in the street, and he says,
"Why didn't you come to my birthday party?" We have no idea what he is talking about. It
turns out that last Sunday he sent us an invitation to his birthday party, and we missed it! Why did
we miss it? Because he sent it on the weekend. It seems that we need a better model. So let's
go back and look at Bob's emails; this is our remember step. Now let's
see if you can help me find a pattern. This time, we look at the size of each email:
• 1KB: Ham
• 12KB: Ham
• 16KB: Spam
• 20KB: Spam
• 18KB: Spam
• 3KB: Ham
• 5KB: Ham
• 25KB: Spam
• 1KB: Ham
• 3KB: Ham
What do we see? It seems that the large emails tend to be spam, while the smaller ones tend
not to be. This makes sense, since maybe the spam emails have a large attachment.
So, we can formulate the following rule:
Rule 3: Any email of size 10KB or more is spam, and any email of size less than
10KB is ham.
So now that we have our rule, we can make a prediction. We look at the email we
received today, and the size is 19KB. So we conclude that it is spam.
Figure 1.5. Another slightly more complex machine learning model, done by a human.
Example 4: More?
Our two classifiers were good, since they rule out large emails and emails sent on the
weekends. Each one of them uses exactly one of these two features. But what if we wanted a
rule that worked with both features? Rules like the following may work:
Rule 4: If an email is larger than 10KB or it is sent on the weekend, then it is classified as
spam. Otherwise, it is classified as ham.
Rule 5: If the email is sent during the week, then it must be larger than 15KB to be
classified as spam. If it is sent during the weekend, then it must be larger than 5KB to be
classified as spam. Otherwise, it is classified as ham.
Or we can even get much more complicated.
Rule 6: Consider the number of the day, where Monday is 0, Tuesday is 1, Wednesday is
2, Thursday is 3, Friday is 4, Saturday is 5, and Sunday is 6. If we add the number of the day
and the size of the email (in KB), and the result is 12 or more, then the email is classified as
spam. Otherwise, it is classified as ham.
Figure 1.6. An even more complex machine learning model, done by a human.
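Rules like Rule 6 are easy to express in code. Here is a minimal sketch in Python (the function and variable names are illustrative, not from the book): the day number and the email size are added, and the total decides the class.

```python
# Rule 6: add the day number (Monday = 0, ..., Sunday = 6) to the
# size of the email in KB; a total of 12 or more means spam.

DAY_NUMBER = {"Monday": 0, "Tuesday": 1, "Wednesday": 2, "Thursday": 3,
              "Friday": 4, "Saturday": 5, "Sunday": 6}

def predict_rule_6(day, size_kb):
    """Classify an email from the day it was sent and its size in KB."""
    return "spam" if DAY_NUMBER[day] + size_kb >= 12 else "ham"

print(predict_rule_6("Sunday", 19))  # spam (6 + 19 = 25, which is >= 12)
print(predict_rule_6("Monday", 3))   # ham  (0 + 3 = 3, which is < 12)
```

Notice that this rule combines both features into a single number, which is one simple way of using more than one feature at a time.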
All of these are valid rules, and we can keep adding layers and layers of complexity. Now the
question is, which is the best rule? This is where we may start needing the help of a computer.
A computer can quickly try many rules, with different numbers, different boundaries, and so on,
until it finds one that works best for the data. It can also do this when we have many more
columns. For example, we can make a spam classifier with features such as the sender, the
date and time of day, the number of words, the number of spelling mistakes, and the
appearances of certain words such as "buy." A rule
could easily look as follows:
Rule 7:
• If the email has two or more spelling mistakes, then it is classified as spam.
Figure 1.7. A much more complex machine learning model, done by a computer.
Now the question is, which is the best rule? The quick answer is: The one that fits the data
best. Although the real answer is: The one that generalizes best to new data. At the end of the
day, we may end up with a very complicated rule, but the computer can formulate it and use
it to make predictions very quickly. And now the question is: how do we build the best model?
That is exactly what this book is about. After reading it, you will be able to do the following:
• Describe the most important algorithms in predictive machine learning and how they
work, including linear and logistic regression, decision trees, naive Bayes, support
vector machines, and neural networks.
• Identify their strengths and weaknesses, and the parameters they use.
• Identify how these algorithms are used in the real world, and formulate potential ways
to apply machine learning to any particular problem you would like to solve.
• Optimize these algorithms, compare them, and improve them, in order to build the
best machine learning models we can.
If you have a particular dataset or problem in mind, we invite you to think about how to apply
each of the algorithms to your particular dataset or problem, and to use this book as a starting
point to implement and experiment with your own models.
I am super excited to start this journey with you, and I hope you are as excited!
1.6 Summary
• Machine learning is easy! Anyone can do it, regardless of their background; all that is
needed is a desire to learn, and great ideas to implement!
• Machine learning is tremendously useful, and it is used in most disciplines. From
science to technology to social problems and medicine, machine learning is making an
impact, and will continue making it.
• Machine learning is common sense, done by a computer. It mimics the process in which
humans make decisions based on experience, by making decisions based on previous data.
2
Types of machine learning
As we learned in Chapter 1, machine learning is common sense, but for a computer. It mimics
the process in which humans make decisions based on experience, by making decisions based
on previous data. Of course, this is challenging for computers, as all they do is store numbers
and do operations on them, so programming them to mimic human level of thought is difficult.
Machine learning is divided into several branches, each mimicking a different way in
which humans make decisions. In this chapter, we overview some of the most important of
these branches.
Machine learning has applications in many, many fields. Can you think of some fields in which you could
apply machine learning? Here is a list of some of my favorites:
• Predicting housing prices based on their size, number of rooms, location, etc.
• Predicting the stock market based on other factors of the market, and yesterday’s
price.
• Detecting spam or non-spam e-mails based on the words of the e-mail, the sender, etc.
• Recognizing images as faces, animals, etc., based on the pixels in the image.
• Processing long text documents and outputting a summary.
• Recommending videos or movies to a user (for example YouTube, Netflix, etc.).
• Chatbots that interact with humans and answer questions.
Try to imagine how we could use machine learning in each of these fields. Some applications
look similar. For example, we can imagine that predicting housing prices and predicting stock
prices must use similar techniques. Likewise, predicting if email is spam and predicting if
credit card transactions are legitimate or fraudulent may also use similar techniques. What
about grouping users of an app based on similarity? That sounds very different than predicting
housing prices, but could it be that it is done in a similar way as we group newspaper articles
by topic? And what about playing chess? That sounds very different than predicting if an email
is spam. But it sounds similar to playing Go.
Machine learning models are grouped into different types, according to the way they
operate. The main three families of machine learning models are
• supervised learning,
• unsupervised learning, and
• reinforcement learning.
In this chapter, we overview them all. However, in this book, we only cover supervised
learning, as it is the most natural one to start learning, and arguably the most commonly
used. We encourage you to look up the other types in the literature and learn about them too,
as they are all very interesting and useful!
Let’s first establish a clear definition of what we mean by data. Data is simply information. Any
time we have a table with information, we have data. Normally, each row is a data point. Let’s
say, for example, that we have a dataset of pets. In this case, each row represents a different
pet. Each pet is then described by certain features.
Features are simply the columns of the table. In our pet example, the features may be size,
name, type, weight, etc. This is what describes our data. Some features are special, though,
and we call them labels.
Labels?
This one is a bit less obvious, and it depends on the context of the problem we are trying to
solve. Normally, if we are trying to predict one feature based on the others, that feature is the
label. If we are trying to predict the type of pet we have (for example, cat or dog) based on
other information about that pet, then the type is the label. If we are trying to predict whether the pet is sick or
healthy based on symptoms and other information, then that is the label. If we are trying to
predict the age of the pet, then the age is the label.
So now we can define two very important things, labeled and unlabeled data.
Labeled data: Data that comes with a label.
Unlabeled data: Data that comes without a label.
Figure 2.1. Labeled data is data that comes with a tag, like a name, a type, or a number. Unlabeled data is data
that comes with no tag.
Clearly, it is better to have labeled data than unlabeled data. With a labeled dataset, we can
do much more. But there are still many things that we can do with an unlabeled dataset.
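To make this vocabulary concrete, here is a minimal sketch of how a labeled and an unlabeled pet dataset might be represented in Python, one dictionary per row (the feature names are illustrative):

```python
# Each row is a data point, and each key is a feature.
# In the labeled dataset, the "type" column plays the role of the label.
labeled_pets = [
    {"name": "Rex",      "weight_kg": 30, "type": "dog"},
    {"name": "Whiskers", "weight_kg": 4,  "type": "cat"},
]

# The unlabeled dataset has the same features, but no "type" column.
unlabeled_pets = [
    {"name": "Buddy",   "weight_kg": 25},
    {"name": "Mittens", "weight_kg": 5},
]

# Separating the features from the labels is the first step of supervised learning.
features = [{k: v for k, v in row.items() if k != "type"} for row in labeled_pets]
labels = [row["type"] for row in labeled_pets]
print(labels)  # ['dog', 'cat']
```

The same rows without the "type" column are exactly what we mean by unlabeled data.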
The set of algorithms in which we use a labeled dataset is called supervised learning. The
set of algorithms in which we use an unlabeled dataset, is called unsupervised learning. This is
what we learn next.
Figure 2.2. A supervised learning model predicts the label of a new data point.
If you recall Chapter 1, the framework we learned for making a decision was Remember-
Formulate-Predict. This is precisely how supervised learning works. The model first
remembers the dataset of dogs and cats, then formulates a model, or a rule for what is a
dog and what is a cat, and when a new image comes in, the model makes a prediction about
what the label of the image is, namely, is it a dog or a cat.
Figure 2.3. Supervised learning follows the Remember-Formulate-Predict framework from Chapter 1.
Now, notice that in Figure 2.1, we have two types of datasets, one in which the labels are
numbers (the weight of the animal), and one in which the labels are states, or classes (the
type of animal, namely cat or dog). This gives rise to two types of supervised learning models.
Regression models: These are the types of models that predict a number, such as the
weight of the animal.
Classification models: These are the types of models that predict a state, such as the
type of animal (cat or dog).
We call the output of a regression model continuous, since the prediction can be any real
value, picked from a continuous interval. We call the output of a classification model discrete,
since the prediction can be a value from a finite list. An interesting fact is that the output of a
classifier can have more than two states. If we had more states, say, a model that predicts
whether a picture is of a dog, a cat, or a bird, the output is still discrete. Models like this,
with more than two classes, are commonly called multiclass classification models. A classifier
can have many states, but the number of states must always be finite.
Let’s look at two examples of supervised learning models, one regression and one
classification:
Example 1 (regression), housing prices model: In this model, each data point is a house.
The label of each house is its price. Our goal is, when a new house (data point) comes in the
market, we would like to predict its label, namely, its price.
Example 2 (classification), email spam detection model: In this model, each data point is
an email. The label of each email is either spam or ham. Our goal is, when a new email (data
point) comes into our inbox, we would like to predict its label, namely, if it is spam or ham.
You can see the difference between models 1 and 2.
• Example 1, the housing prices model, is a model that can return many numbers, such
as $100, $250,000, or $3,125,672. Thus it is a regression model.
• Example 2, the spam detection model, on the other hand, can only return two things:
spam or ham. Thus it is a classification model.
Regression models are useful in many fields. Here are some examples:
• Stock market: Predicting the price of a certain stock based on other stock prices and
other market signals.
• Medicine: Predicting the expected lifespan of a patient, or the expected recovery time,
based on symptoms and the medical history of the patient.
• Sales: Predicting the expected amount of money a customer will spend, based on the
client’s demographics and past purchase behavior.
• Video recommendations: Predicting the expected amount of time a user will watch a
video, based on the user’s demographics and past interaction with the site.
The most common method used for regression is linear regression, which is when we use
linear functions (basically lines) to make our predictions based on the features. We study
linear regression in Chapter 3.
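As a small preview of Chapter 3, here is a minimal sketch of linear regression with one feature (house size), using the closed-form least-squares formulas on made-up numbers rather than the step-by-step methods the book develops later:

```python
# Fit a line price = slope * size + intercept to toy housing data
# using the closed-form least-squares formulas. The numbers are made up.

sizes  = [50, 80, 100, 120, 150]    # house sizes in square meters
prices = [150, 240, 300, 360, 450]  # prices in thousands of dollars

n = len(sizes)
mean_x = sum(sizes) / n
mean_y = sum(prices) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, prices))
         / sum((x - mean_x) ** 2 for x in sizes))
intercept = mean_y - slope * mean_x

def predict_price(size):
    """Predict the price (in thousands) of a house from its size."""
    return slope * size + intercept

print(predict_price(90))  # 270.0 on this toy data
```

The model here is just the pair (slope, intercept): once it is formulated from the data, predicting the price of a new house is a single multiplication and addition.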
Another very common example of classification is image recognition. The most popular
image recognition models take as an input the pixels in the image, and output a prediction of
what the image most likely depicts. Two of the most famous datasets for image recognition
are MNIST and CIFAR-10. MNIST is formed by around 70,000 images of handwritten digits,
labeled with the digits 0-9. These images come from a combination of sources,
including the US Census Bureau and digits handwritten by American high
school students. It can be found at the following link: http://yann.lecun.com/exdb/mnist/.
CIFAR-10 is made of 60,000 32-by-32 color images of different things, classified
into 10 different classes (thus the 10 in the name), namely airplanes, cars, birds, cats, deer,
dogs, frogs, horses, ships, and trucks. This dataset is maintained by the Canadian Institute
For Advanced Research (CIFAR) and can be found at the following link:
https://www.cs.toronto.edu/~kriz/cifar.html.
Classification models are used in many other fields as well.
The bulk of this book talks about classification models. In chapters 3-x, we talk about
classification models in the context of logistic regression, decision trees, naive Bayes, support
vector machines, and the most popular classification models nowadays: neural networks.