An Introduction to

Deep Learning
Understanding the Basics of
How (and Why) it Works

A WHITE PAPER BY DATAIKU

A GUIDEBOOK BY DATAIKU
www.dataiku.com
©2020 Dataiku, Inc. | www.dataiku.com | [email protected] | @dataiku 1

Background & About the Guidebook


Deep learning has been around for a while, so why has it just become a buzz topic in the last five years? Well, it
returned to the headlines in 2016 when Google’s AlphaGo program crushed Lee Sedol, one of the highest-ranking
Go players in the world.

While previously there wasn’t a good way to train deep learning neural networks, now with advancements in machine
learning (ML) algorithms and deep learning chipsets, deep learning (DL) is being more actively implemented. It is
being applied across industries, from healthcare to finance to retail and everything in between, and the global
deep learning market is expected to reach $10.2 billion by 2025 1 .

U.S. deep learning market, by solution, 2014 - 2025 (USD Million)

Picture Source 2

But what is it?


Put very simply, deep learning is a subset of machine learning that involves multiple layers of representations that
allow a computer to learn and deduce outputs from data. While this sounds basic enough, what’s going on behind
the scenes is anything but.

In fact, some of the controversy surrounding deep learning (and more particularly surrounding artificial intelligence,
or AI) is the fear of the “black box.” That is, how can anyone base a service or product on deep learning and trust
the decisions being made if no one knows how they’re being made?

This guidebook will unpack some of the nuances and intricacies to help uncover what makes DL such an effective
solution to some of today’s most complex problems. But on top of that, the goal is to take a deeper dive into how
certain aspects of DL work to build more trust and confidence around the technology with business leaders as well
as data teams. If you know how it works, it becomes less intimidating (and its use cases become more clear).



This guide will walk through:

• Deep learning at a high level.

• High-level use cases by industry.

• A more advanced definition of deep learning, including neurons and neural networks.

• A look at what gradient descent is and types of neural networks.

• An in-depth look at the most popular type, Convolutional Neural Networks - CNN (used for image detection).

• How deep learning can be sped up with less data.

• Specific, real-life examples of deep learning in action.

• The future of deep learning & further reading.

If you haven’t had any previous exposure to ML, we recommend reading the illustrated Machine
Learning Basics guidebook 3 . In addition, less technically inclined readers might consider going from
the first few sections (high-level definitions and use cases) to the last few sections (real-life applications),
since the sections between dive into the mechanics behind neural networks and are for people more
interested in how deep learning is implemented.



Deep Learning: High-Level

You might already know that deep learning works because it imitates how the human brain works and how
people learn. But to take that one step further, think about a child learning to associate words with objects.
Her first word might be “cat.” She then might point to any animal at all and say “cat.” If what she’s pointing to
is, in fact, a cat, her father might confirm this by saying “yes, this is a cat.”

However, if she points to a dog, her father will then say something like “no, that’s not a cat - it’s a dog.” And
gradually, subconsciously, the child learns about what exactly makes a cat a cat, what makes it different than
a dog, and adds more and more complex layers (like for example, what makes a house cat different than a
lion, which is also - technically - a cat).

Deep learning works similarly in that a computer takes inputs (data - often unstructured, like text, videos,
images, or even sound) and extracts useful information. It does this through a hierarchy of increasing
complexity and abstraction, continuously using knowledge and learning from previous layers, until it reaches
an accurate output. For those ready for more, not to worry - this guidebook will go into even more depth on
this definition later (see “Going Deep on Deep Learning” if you’re impatient).



DL vs. ML vs. AI: What’s What?

Talking about deep learning is increasingly complex because the term is often used alongside (or even
interchangeably with) the terms machine learning and artificial intelligence (AI).

First, here is what you need to know: DL is a subset of ML, which is itself a subset of AI (this graph helps explain
this nuance) 4.

Before the start of machine learning in the 80s, business decision rules were mostly hand-coded sets of
instructions based on the knowledge of business experts. With machine learning, those rules are inferred
from previously collected data - the business expertise still plays a role (and is in fact required) in the feature
engineering part.

Basically, the business expert needs to determine which factors may impact the result you want to predict,
and the algorithm automatically selects the optimal way to combine these factors. You “train” a model. The
key question is: based on my data, what is the best rule I can create to solve my business problem?



So what is specific to DL?

A DL algorithm is able to learn hidden patterns from the data by itself, combine them together, and build much
more efficient decision rules. That’s why it can deal with problems that a human brain could not understand
- all the value of deep learning is this automatic pattern identification capability. This means handling more
complex problems, such as understanding concepts in images, videos, texts, sounds, time series, and all other
unstructured data you can think of.

But don’t think of deep learning as a model learning by itself. You still need properly labeled data, an evaluation of
the model results, and of course an evaluation of the business value it will bring! Actually, the lack of precisely
labeled data is one of the main reasons DL can have disappointing results in some business cases.

Of course, handling more complex data means more complex algorithms. And to extract general enough complex
patterns from complex data, you will need lots - read LOTS - of examples (many more than a classical ML model
needs) - typically millions of labeled images for a classification task.

Since the feature engineering is automatically done by the machine, the interpretation is not obvious for a
human and DL “black-box” decision rules can be rejected by business analysts. In fact, DL model interpretability
is one of today’s biggest DL research challenges.

That means don’t throw all your classical ML models out the window! In most classical data-related
problems, ML approaches still do a better job than DL since they don’t involve unstructured data.
For example, here’s a simple illustration: I want to go from Manhattan to Brooklyn, Boston, or Paris.
Should I walk, drive, take a train, or fly? It comes down to a simple compromise between cost and time
spent (in modeling terms, cost corresponds to model design time and time spent corresponds to accuracy).
Flying from Manhattan to Brooklyn is the equivalent of using a DL model for a “simple” problem (that is
to say, it’s not efficient at all).

What about AI then? Well… we don’t really know. Is it a fancier word for DL? Some claim they did AI in
the 50s. But most definitions have something to do with automation and developing computer systems that
think or learn more like humans than like traditional machines. And if this is the case, then at the very
least, deep learning is certainly a large step toward that goal.



And don’t forget about reinforcement learning (RL) as well, which plays a role here too. Reinforcement learning
consists of implementing strategies for a machine to be able to learn by itself. Very few real-world applications
have been implemented for now in production (well, AlphaGo is one, but Google DeepMind doesn’t expect a
huge revenue from Go players in the coming years…). However, it does have some theoretical applications in
robotics/manufacturing, health care, advertising, and finance.

Two of the main issues in real-world RL are that:

1. It requires lots of data (again), and

2. It is based on the learning-by-failing strategy, so you need a not-too-critical use case (A/B testing for
instance) or a realistic simulation tool to train your model.

So don’t worry, we are far from Terminator - when broken down, AI is really an extension of
technologies that we’re already using today.
Deep Learning Applications
Advancements in deep learning algorithms as well as hardware have resulted in an explosion of applications
both in the consumer sector and within the enterprise that were not possible just five years ago.

Many of today’s use cases leverage computer vision and image detection. And because (as previously
emphasized) deep learning works best as the amount of data scales - that is, it needs massive amounts - its
most practical applications today are in the following industries:

• Manufacturing: So-called Industry 4.0 5 is bringing sweeping changes to manufacturing by
introducing automation, including machine learning (and increasingly deep learning) everywhere.
Japanese company Fanuc is using deep reinforcement learning 6 to help some of its industrial robots
train themselves. Other applications specifically for deep learning include automating quality testing and
advanced predictive maintenance (specifically using imagery to detect microcracks in machinery).

• Automotive: On the enterprise side, many of the gains in deep learning in the automotive sector are
in manufacturing (see above). But of course, deep learning technology - more specifically image recognition
and computer vision - is also the cornerstone of self-driving cars. It is responsible for detecting lanes,
traffic lights, even people (and it often does this better - that is faster - than a human could, especially in
situations like at night or if something comes in front of the vehicle quickly).

• Hospitality: In an industry where exceptional customer service can make a customer for life, deep
learning in the hospitality industry is centered around creating better and better customer service bots.
Creating a bot that truly responds like a human, particularly reacting to emotional states 7 , takes deep
learning technology.
• Health Care: From drug discovery to image detection for early (or more accurate) disease detection
to insurance fraud prevention, health care is perhaps one of the industries poised to be changed the most
by advances in deep learning.

• Banking, Insurance, & Finance: As fraudsters get more advanced, techniques for
fraud detection need to advance along with them. Deep learning is ideal for this industry because it’s
often difficult to identify good features as fraud becomes increasingly difficult to detect.

• Agriculture: Computer vision and deep learning hold great promise to revolutionize farm
machinery. Robots that can “see” weeds, for example, can eliminate them with a targeted approach.

• Entertainment: From advanced recommendation engines to fake news detection, deep learning is
already present across the sector. Upcoming trends include the so-called “Immersive Experience Industry,”
which will largely be based on DL technology.

• IT/Security: Malware detection is an increasingly important cyber security problem, and similar to
the challenges faced in the banking, insurance, and finance industry, detection methods must grow more
sophisticated along with their attackers. Deep learning is well suited because the models are robust enough
to handle natural variations in malware.

• Retail, Supply Chain & Logistics: Deep learning is changing the way retailers buy,
stock and sell products. Just one of the many examples of its applications is the use of computer vision in
warehouses or on retail shelves to determine low stock.



Going Deep on Deep Learning
On a less conceptual and more tactical level, deep learning (DL) is a subset of machine learning (ML) which
focuses on learning data representations. The focus on relationships - rather than on tasks, as in classical ML
algorithms - creates transferable solutions. Again, it’s the difference between identifying a cat as a
whole and understanding the different concepts defining a cat (like the paws, tail, and ears) and the
way they are nested.

This is one of the key reasons deep learning is more powerful than classical machine learning - it creates
transferable solutions. That is, the concepts of paw, tail, and ears can be easily reused to understand what a
dog is as well.

Deep learning algorithms are able to create transferable solutions through neural networks: that is, layers of
neurons/units.

For some, understanding that neurons make up neural networks, and those in turn allow machines (via deep
learning) to “learn” like humans, might be enough (if so - you might consider skipping ahead to learn about the
types of neural networks).

But to have a more robust understanding, it’s also important to understand how those underlying neurons
actually work.



How Do Neurons Work?
A neuron takes input and outputs a number that assigns the input to a class (group).

The output is determined the way you would make a decision: imagine you’re deciding where to eat and
consider taste, location, and price. Each input has a different level of importance.

Well, a neuron similarly takes multiple inputs, each with a corresponding weight (importance). The weighted
inputs are combined and passed through an activation function, which gives the final output (class of the input).
For example, if there’s a high probability that you’ll eat at Shake Shack based on your taste, the location of the
nearest Shake Shack, and the price point, then the activation function will output Shake Shack as the final output.
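To make the restaurant analogy concrete, here is a minimal sketch of a single neuron in Python. The input scores, weights, bias, and the choice of a sigmoid activation are all illustrative assumptions, not values from this guidebook.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias=0.0):
    """Weight each input, add them up with a bias, then apply the activation."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(weighted_sum)


# Hypothetical scores between 0 and 1 for taste, location, and price.
shake_shack = [0.9, 0.8, 0.6]
# Hypothetical importance of each factor; taste matters most here.
weights = [2.0, 1.0, 1.5]
bias = -2.5

probability = neuron(shake_shack, weights, bias)
decision = "eat at Shake Shack" if probability > 0.5 else "eat somewhere else"
print(round(probability, 3), decision)
```

An output above 0.5 is read as "yes" and an output below 0.5 as "no"; changing the weights changes which inputs dominate the decision.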

Many deep learning problems boil down to classification - whether binary (e.g., is this image a cat, or not a cat?)
or multiclass (e.g., is this image a cat, a dog, a bird, etc.?). So finding the optimal features (variables) and
parameters (weights) is key. DL is used for complex problems like medical diagnosis, but the underlying goal
of finding boundaries (for positive and negative) can be thought of conceptually, like classifying purple and
green points in a plane:

In this case, the “drawing” of a diagnosis boundary (our classification model) depends on gene 1 and gene 2
(our features). Points farther from the boundary are more likely to be in their respective class.
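The idea that distance from the boundary reflects confidence can be sketched in a few lines of Python. The straight-line boundary below (gene 1 + gene 2 = 1) and the sample points are hypothetical and only serve to illustrate the geometry.

```python
import math


def signed_distance(point, weights, bias):
    """Signed distance from a point to the line weights . point + bias = 0.

    The sign gives the predicted class; the magnitude grows as the point
    moves away from the boundary.
    """
    score = sum(w * x for w, x in zip(weights, point)) + bias
    norm = math.sqrt(sum(w * w for w in weights))
    return score / norm


# Hypothetical boundary: gene1 + gene2 - 1 = 0.
weights = (1.0, 1.0)
bias = -1.0

near = signed_distance((0.6, 0.5), weights, bias)       # just on the positive side
far = signed_distance((2.0, 2.0), weights, bias)        # deep on the positive side
opposite = signed_distance((0.0, 0.0), weights, bias)   # on the negative side

print(near, far, opposite)
```

Both `near` and `far` are classified as positive, but `far` lies well inside its class region, so the model would be more confident about it.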



How To (Neural) Network
The model above can be built with a single layer of neurons, but in reality, the model would be much more
complex (realistically, the diagnosis would depend on more than two inputs, but let’s pretend for simplicity’s
sake, that it only depends on gene 1 and gene 2).

Adding layers lets the computer create more and more specific features that lead to a more complex final
output. For our example, adding more layers would let us create a more complex final boundary (straight line
-> simple curve -> complex curve).
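As a rough illustration of why extra layers help, the sketch below hand-picks weights for a tiny two-layer network that separates points a single straight line cannot (the classic "exclusive or" pattern, used here only as an assumed example). In practice the weights would be learned rather than set by hand.

```python
def step(z):
    """Fire (1) when the weighted sum is positive, otherwise stay silent (0)."""
    return 1 if z > 0 else 0


def neuron(inputs, weights, bias):
    return step(sum(x * w for x, w in zip(inputs, weights)) + bias)


def two_layer_network(x1, x2):
    # First layer: two neurons, each drawing one straight line.
    at_least_one = neuron((x1, x2), (1, 1), -0.5)   # fires if x1 or x2 is on
    both = neuron((x1, x2), (1, 1), -1.5)           # fires only if both are on
    # Second layer: combines the two lines into a non-linear boundary.
    return neuron((at_least_one, both), (1, -1), -0.5)


for point in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(point, two_layer_network(*point))
```

No single neuron of this kind can output 1 for (0, 1) and (1, 0) while outputting 0 for (0, 0) and (1, 1), but stacking one layer on top of another can.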

If this were an image classification problem, more layers would allow us to identify more complex images
(blobs, edges -> noses, eyes, cheeks -> face).

Understanding gradient descent is helpful for understanding deep learning because it’s one of the most popular
- if not the most popular - strategy for optimizing a model during training (that is, making sure it’s “learning”
correctly).

Remember that in deep learning, it’s the algorithm that finds the features for the most accurate classification
(instead of the human, as is the case in machine learning), so the computer needs a way to determine the
optimal features and weights (ones that lead to the most accurate final classification).
The Nitty-Gritty Details

This happens through choosing the features and weights that minimize some error/cost function. The error/cost
function is the sum of loss functions (predicted value of a point - actual value of a point) + a regularization term.
The regularization term penalizes models with many features to prevent overfitting (being accurate for a specific
dataset but failing to generalize).
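A cost function of this shape might look like the following Python sketch. The squared-error loss and the absolute-value penalty are assumptions chosen for illustration (the guidebook does not specify either), and `lam` is a hypothetical name for the strength of the regularization term.

```python
def cost(predictions, targets, weights, lam=0.1):
    """Total cost = sum of per-point losses + regularization term."""
    # Per-point loss: squared difference between predicted and actual value.
    data_loss = sum((p - t) ** 2 for p, t in zip(predictions, targets))
    # Penalty: grows with the size of the weights, which discourages
    # models that lean on many features.
    penalty = lam * sum(abs(w) for w in weights)
    return data_loss + penalty


# Two points: the first is predicted perfectly, the second is off by 1.
example_cost = cost(predictions=[1.0, 0.0], targets=[1.0, 1.0], weights=[0.5, -0.5])
print(example_cost)
```

With `lam=0`, the penalty disappears and only the prediction errors count; a larger `lam` pushes training toward simpler models.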

To minimize our error function, we use gradient descent: the computer chooses initial parameters (features
and weights) and repeatedly steps in the direction of the negative gradient (the gradient points in the direction
of greatest increase, so the negative gradient points in the direction of greatest decrease) of the error function
until it finds the parameters that lead to a gradient of 0 (corresponding to a minimum of the error function). It
works like getting to the lowest point on a mountain as quickly as possible: you walk in the direction of steepest
decrease until you hit a minimum. For example, here we keep adjusting the line until we have minimized the
classification error (larger dots correspond to larger errors).
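The loop described above can be sketched in plain Python. This example fits a line to five made-up points by minimizing the mean squared error; the data, learning rate, and stopping tolerance are illustrative assumptions, and a regression error is used instead of a classification error to keep the arithmetic short.

```python
def gradient_descent(xs, ys, learning_rate=0.05, tolerance=1e-9, max_steps=10_000):
    """Fit y = w * x + b by repeatedly stepping against the gradient."""
    w, b = 0.0, 0.0  # arbitrary starting parameters
    n = len(xs)
    for _ in range(max_steps):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradient of the mean squared error with respect to w and b.
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Stop once the gradient is (numerically) zero: we are at a minimum.
        if abs(grad_w) < tolerance and abs(grad_b) < tolerance:
            break
        # Step in the direction of steepest decrease.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Made-up points that lie exactly on the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = gradient_descent(xs, ys)
print(round(w, 4), round(b, 4))
```

The learning rate controls the size of each step: too large and the walk overshoots the valley, too small and it takes many more steps to reach the bottom.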

Visual Representation

Gradient descent is a little tricky to describe, but it’s easier to understand visually how it works to minimize
errors:

Gradient descent is an optimization algorithm used to find the best solutions to problems. Here we used it to
find the best features and weights, but gradient descent is also used for other optimization problems like finding
the best filters (we’ll talk about this later).



Types of Neural Networks
and How To Apply Them
There are countless types of neural networks 8. Here is an overview of some of the most relevant types:

Feed Forward - Used in computer vision and speech recognition when classifying the target classes is
complicated. Responsive to noisy data and easy to maintain.

Radial Basis - Considers the distance of a point with respect to the center. Used for power restoration
systems which are notoriously complicated.

Kohonen - Recognizes patterns in data. Used in medical analysis to cluster data into different categories (a
Kohonen network was able to classify patients with a diseased glomerulus vs. a healthy one).

Recurrent - Feeds the output of a layer back as input. Good for predicting the next word in a body of text but
harder to maintain.

Modular - A collection of different networks that work independently and contribute towards the final output.
Increases computation speed (through the breakdown of a complicated computational process into simpler
computations), but processing time is subject to the number of neurons.

In this guidebook we’ll focus on convolutional neural networks (CNN), which are similar to feed-forward
neural networks but dominate computer vision because of their much higher accuracy.



In-Depth: Convolutional Neural Networks - CNN

For complicated problems like image identification, it’s difficult and time-consuming to try to identify the most
important variables before training (feature engineering). This is why deep learning instead applies feature
learning, where the machine learns the optimal features and weights on its own. Again, each layer corresponds
to more and more specific features (blobs, edges -> noses, eyes, cheeks -> face).



It’s important to remember that a computer only works with numbers, so the image and each filter are
converted to matrices.

A convolutional neural network (CNN) is made of two main types of layers:

• The convolutional and pooling layer(s) extract the optimal features. Each feature is a filter that slides over
the target image to break the image into simpler images.
• The fully connected layer identifies the class of the image by comparing it to different images and
finding the best match.

This is an overview of how the CNN identifies an image of an X.



Let’s dive into the details of each step of the
process:



The convolutional and pooling layers decompose our image into smaller images (which allows for more
precise matching with the filters).

But remember, the computer only understands numbers, so our images would actually be
represented by numbers:



Let’s talk about how the computer reduced this starting matrix of 1s and -1s to the smaller matrix of 4s and
-4s.

The mapping of 1s and -1s to 4s and -4s relies on the type of filter the computer chooses (the best filter, like
the best features for classification, is found through gradient descent!). In this problem, our computer chose
a filter which adds (+) the number in the top left, subtracts (-) the top right, subtracts (-) the bottom left, and
adds (+) the bottom right. So in our case, the filter starts in the top left quadrant (which corresponds to a \)
and performs the operation + (1) - (-1) - (-1) + (1) to get the final value of 4. Then it moves to the top right
quadrant (which corresponds to a /) and performs the operation + (-1) - (1) - (1) + (-1) to get the final value of
-4. The filter reduces each 2x2 area to a 4 or -4. That’s why this filter works well - it outputs different values
for different filter images.



Imagine we instead had a filter of all +s. We would
get an output of 0 for both \ and /, so this would
not be a good filter for our problem.
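The filter arithmetic above can be checked with a short Python sketch. The 4x4 grid used for the “X” and the choice to move the filter two cells at a time (so that 2x2 areas do not overlap) are assumptions based on the description, since the original figures are not reproduced here.

```python
def apply_filter(patch, kernel):
    """Multiply a 2x2 patch by a 2x2 filter cell by cell and add the results."""
    return sum(
        p * k
        for patch_row, kernel_row in zip(patch, kernel)
        for p, k in zip(patch_row, kernel_row)
    )


def convolve_2x2(image, kernel):
    """Slide the filter over each non-overlapping 2x2 area of a square image."""
    size = len(image)
    return [
        [
            apply_filter([row[c:c + 2] for row in image[r:r + 2]], kernel)
            for c in range(0, size, 2)
        ]
        for r in range(0, size, 2)
    ]


# Assumed 4x4 "X": 1 marks a drawn cell, -1 marks an empty cell.
x_image = [
    [1, -1, -1, 1],
    [-1, 1, 1, -1],
    [-1, 1, 1, -1],
    [1, -1, -1, 1],
]

diagonal_filter = [[1, -1], [-1, 1]]   # + top left, - top right, - bottom left, + bottom right
all_plus_filter = [[1, 1], [1, 1]]     # adds every cell

good_output = convolve_2x2(x_image, diagonal_filter)
bad_output = convolve_2x2(x_image, all_plus_filter)
print(good_output)
print(bad_output)
```

The first filter separates \ areas (4) from / areas (-4), while the all-plus filter maps both to 0 and so carries no information about which stroke is present.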

After our image is decomposed, the fully connected layer determines the class of the image through
sequential steps:

Each filter slides over each section (top left -> top right -> bottom left -> bottom right), outputting a 1 (firing)
for a match or -1 for absence of the filter. The top row holds the outputs from sliding the \ filter over
each section, and the second row holds the outputs for the / filter.

Figure panels: First Layer Representation, Second Layer Representation, Third Layer Representation

This matrix is then compared to different filter matrices found through - you guessed it - gradient
descent. In this case, our computer chose “X”, “O”, “\”, and “/” filters for comparison.
The computer applies each of these filters (green in the image above) to our “X” matrix (blue in the image
above). For example, when it applies the green “X” filter (which is a perfect match), it adds (+) or subtracts (-) the
number in the parallel slot in the blue matrix; it performs the operation + (1) - (-1) - (-1) + (1) - (-1) + (1) + (1) - (-1)
to get the final score of 8. It applies each of the green filters to our blue “X” matrix, outputting a score. The filter
with the highest score (indicating the closest match) is selected!
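The scoring step can be sketched in Python as follows. The “X” matrix follows from the operation spelled out above; the “O”, “\”, and “/” filter matrices are hypothetical reconstructions, since the original figure is not reproduced here.

```python
def match_score(features, template):
    """Multiply matching slots together and add everything up."""
    return sum(
        f * t
        for feature_row, template_row in zip(features, template)
        for f, t in zip(feature_row, template_row)
    )


# Top row: where the \ filter fired (top left, top right, bottom left, bottom right).
# Bottom row: where the / filter fired.
x_features = [
    [1, -1, -1, 1],
    [-1, 1, 1, -1],
]

# Only the "X" template is implied by the text; the others are assumptions.
templates = {
    "X": [[1, -1, -1, 1], [-1, 1, 1, -1]],
    "O": [[-1, 1, 1, -1], [1, -1, -1, 1]],
    "\\": [[1, -1, -1, 1], [-1, -1, -1, -1]],
    "/": [[-1, -1, -1, -1], [-1, 1, 1, -1]],
}

scores = {name: match_score(x_features, template) for name, template in templates.items()}
best_match = max(scores, key=scores.get)
print(scores, best_match)
```

A perfect match scores 8 because all eight slots agree; every disagreement lowers the score by 2.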

Just to get it to all sink in, here’s an overview of the entire process for identifying an “X”:

To reiterate, the convolution and pooling layer(s) slide 2x2 filters over each area of the image, reducing each
2x2 area to a number. The fully connected layer compares the image to different image filters. The filter with
the highest score is the best match!



The same process happens for more complex image
recognition like facial recognition.
Picture source 9

Here, the only difference is that the filters are more complicated and that the network has many more layers to
handle the increased complexity. Also, if the image is in color, an image is initially represented as three stacked
matrices (1 for red, 1 for blue, 1 for green) instead of a single matrix.
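As a small illustration of the “three stacked matrices” idea, here is a tiny 2x2 color image written as nested Python lists. The pixel values are made up, and the 0 to 255 intensity scale is an assumption (it is a common convention, not something stated in this guidebook).

```python
# One matrix per color channel; each entry is that channel's intensity at a pixel.
red = [
    [255, 0],
    [0, 255],
]
green = [
    [0, 255],
    [0, 255],
]
blue = [
    [0, 0],
    [255, 255],
]

# The color image is the three matrices stacked together.
image = [red, green, blue]

shape = (len(image), len(image[0]), len(image[0][0]))  # (channels, height, width)
top_left_pixel = tuple(channel[0][0] for channel in image)
print(shape, top_left_pixel)
```

A grayscale image of the same size would be a single 2x2 matrix, so a color image carries three times as many numbers into the first layer.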



Speeding Up Deep Learning
A popular way to build faster deep learning models with less data is through transfer learning. Transfer learning
takes an existing DL model and modifies it (similar to how we use our knowledge of cars to help us understand
a truck).

Fine-tuning is a method of doing transfer learning. We take a pre-trained model, retrain the weights of the top
layer(s), and “freeze” the other layers so that their weights don’t change during learning. The number of layers we
change depends on how similar our images are. If we had a pre-trained model of a cat and were trying
to identify a specific cat, we wouldn’t change very many layers. If we were instead trying to identify a lion, we
would need to change more layers, because cats and lions don’t have as much overlap.
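The freezing idea can be sketched without any deep learning library. The `Layer` class, layer names, weights, and gradients below are all hypothetical; a real project would rely on the equivalent mechanism of its chosen framework instead.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    weights: list
    trainable: bool = True


def freeze_all_but_top(layers, n_top):
    """Mark only the last n_top layers as trainable; freeze the rest."""
    for index, layer in enumerate(layers):
        layer.trainable = index >= len(layers) - n_top


def apply_update(layers, gradients, learning_rate=0.1):
    """One gradient descent step that skips frozen layers."""
    for layer, gradient in zip(layers, gradients):
        if layer.trainable:
            layer.weights = [w - learning_rate * g for w, g in zip(layer.weights, gradient)]


# Hypothetical pre-trained model with three layers.
model = [
    Layer("edges", [0.2, 0.8]),
    Layer("shapes", [0.4, -0.3]),
    Layer("classifier", [0.5, -0.5]),
]

freeze_all_but_top(model, n_top=1)  # similar images: retrain only the top layer
apply_update(model, gradients=[[1.0, 1.0], [1.0, 1.0], [1.0, -1.0]])
print([(layer.name, layer.trainable, layer.weights) for layer in model])
```

Raising `n_top` unfreezes more layers, which corresponds to the lion case above where the new images overlap less with what the model has already learned.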



Feature: Real-Life Applications
& The Future of Deep Learning

Brought to You By GoDataDriven


GoDataDriven is a team of passionate data science and software engineering practitioners. By combining these disciplines
with large-scale, open-source information platforms, they create cutting-edge data science solutions with real business
impact.

The future of deep learning is bright because of its open source community and accessible platforms. Increasingly, leading
corporations such as Apple, Facebook, and Google, are making their technology accessible to the public.

“The main reason organizations make the switch to open-source is that it becomes easier to find deep learning talent. A
company could have developed the most amazing and efficient deep learning system, but if they don’t publish their research
and share their knowledge, talented data scientists and deep learning practitioners won’t be able to learn about their
system and apply it to their organization.”
Rodrigo Agundez, Lead Data Scientist @ GoDataDriven

Because of the shift towards open-source models, deep learning teams like Google Brain, Google DeepMind, and companies
like Facebook and Baidu are finding it easier to hire talented deep learning practitioners and become more cutting-edge.

In the near future, deep learning will significantly improve voice command systems (think Siri and Alexa), as well as health
care and image identification:



Detecting Anomalies in 4D Heart Scans
Google, Siemens, and GoDataDriven wanted to help doctors more accurately and quickly determine treatments.
Thus, they developed an app that uses MRI data in a 4D format (volume + time) to detect anomalies in the heart.
Using deep learning models, the team calculated changes in heart ventricle volume over time, which they used
as a consideration for prognosis and heart failure.

The model, based on the U-Net deep learning architecture, takes the MRI scan as input and outputs the
corresponding volumes. Traditionally, this process is done manually by a doctor using hand-drawn diagrams, but
this model greatly accelerates the process and improves its accuracy.

Sorting Images for the Largest
Global Flower Auction
Royal FloraHolland is the world’s biggest horticulture marketplace and knowledge center. Their daily global
auctions use a digital platform, so it is essential to correctly display photographs of the plants. Suppliers upload
these images themselves, so it’s important to have some kind of quality check before approving the files.

Sorting through all these photographs manually is very time-consuming, so GoDataDriven designed a deep
learning system to automate photo quality checking. The system removed the need for tedious manual review, as
it can accurately identify and sort pictures, even ones from different angles and devices.
Endnotes
1 Deep Learning Market Size Worth $10.2 Billion By 2025

2 Deep Learning Market Size, Share & Trends Analysis Report By Solution, By Hardware (CPU, GPU, FPGA, ASIC), By Service,
By Application, By End-Use, By Region, And Segment Forecasts, 2018 - 2025

3 White Paper: Machine Learning Basics

4 Deep Learning - Nvidia

5 What Everyone Must Know About Industry 4.0

6 Machine Learning in Manufacturing – Present and Future Use-Cases

7 How Chatbots Are Learning Emotions Using Deep Learning

8 The mostly complete chart of Neural Networks, explained

9 Deep Learning in a Nutshell: Core Concepts



Your Path to Enterprise AI

Dataiku is one of the world’s leading AI and machine learning platforms, supporting agility in organizations’
data efforts via collaborative, elastic, and responsible AI, all at enterprise scale. Hundreds of companies use
Dataiku to underpin their essential business operations and ensure they stay relevant in a changing world.

300+ CUSTOMERS
30,000+ ACTIVE USERS* (*data scientists, analysts, engineers, & more)
