A to Z of AI/ML:
A Quick Introduction to Artificial Intelligence and Machine Learning Capabilities and Tools
EngCon 2017

Mark Crowley
Assistant Professor
Electrical and Computer Engineering
University of Waterloo

Sep 23, 2017

Mark Crowley A to Z of AI/ML Sep 23, 2017 1 / 112


Introduction

Outline

Introduction

What is AI?

Neural Networks

Convolutional Neural Networks

Do you need AI/ML?


My Background

Waterloo: Assistant Professor, ECE Department since 2015


PhD at UBC in Computer Science with Prof. David Poole
Postdoc at Oregon State University
UW ECE ML Lab:
https://uwaterloo.ca/scholar/mcrowley/lab
Waterloo Institute for Complexity and Innovation (WICI)
Research Fellow at ElementAI
Pattern Analysis and Machine Intelligence (PAMI)
http://waterloo.ai
List of faculty
Research projects (co-op/internships)
List of spinoff companies from UWaterloo (good place for project ideas)



What is AI?

What do you think of when you hear “Artificial Intelligence” or “Machine Learning”?



What is AI? Landscape of Big Data/AI/ML

Data, Big Data, Machine Learning, AI, etc, etc,

[Figure: Landscape of Big Data/AI/ML. The figure spans Big Data Tools (data analysis: reports, statistics, charts, trends and summaries that support human decision making), Machine Learning (classification, patterns, predictions, probabilities) and Artificial Intelligence (automated decision making: policies, decision rules). The data are drawn as a table whose rows are data points x1 ... xn and whose columns are features f1 ... fk (e.g. .5, 104.2, China). Surrounding areas: Vision, Natural Language Processing, Robotics, Game Theory, Multi-agent Systems, Deep Learning (CNN, RNN, LSTM), Reinforcement Learning (DQN, A3C), Probabilistic Programming, Inductive Logic Programming (ILP), Constraint Programming, SAT, SMP.]


Major Types/Areas of AI

Artificial Intelligence: algorithms that enable computers to perform actions we define as requiring intelligence. This is a moving target.
Search-based heuristic optimization (A*)
Evolutionary computation (genetic algorithms)
Logic programming (inductive logic programming, fuzzy logic)
Probabilistic reasoning under uncertainty (Bayesian networks)
Computer Vision
Natural Language Processing
Robotics
Machine Learning


Types of Machine Learning

Machine Learning: “Detect patterns in data, use the uncovered patterns to predict future data or other outcomes of interest” (Kevin Murphy, Google Research).


Deep Learning

Deep Learning: methods that perform machine learning using multilayer neural networks of some kind. Deep Learning can be applied in any of the three main types of ML:
Supervised Learning: very common, enormous improvement in recent years
Unsupervised Learning: just beginning, lots of potential
Reinforcement Learning: this has exploded recently (over the past 3 years, as of 2017), especially for video games


Increasing Complexity of Supervised ML Methods

1. mean, mode, max, min - basic statistics and patterns
2. prediction/regression - least squares, ridge regression
3. linear and distance-based classification - use distances and separation of data points (logistic regression, SVM, KNN)
4. kernel-based classification - define a mapping from the original data to a new space, allowing nonlinear divisions to be found
5. decision trees - learn rules that divide data arbitrarily (C4.5, Random Forests, AdaBoost)
6. neural networks - learn a function using ‘neurons’
7. deep neural networks - same, but deep :)
8. recurrent neural networks - add links to past timesteps, learning with memory of the past
9. convolutional neural networks - add convolutional filters, good for images
10. Inception ResNets, Long Short-Term Memory (LSTM) networks, Voxception networks, .... oh, it keeps going...
What is AI? Classification

One Example of ML: Classification


Clustering vs. Classification

Clustering
  Unsupervised
  Uses unlabeled data
  Organizes patterns w.r.t. an optimization criterion
  Requires a definition of similarity
  Hard to evaluate
  Examples: K-means, Fuzzy C-means, Hierarchical Clustering, DBSCAN

Classification
  Supervised
  Uses labeled data
  Requires a training phase
  Domain sensitive
  Easy to evaluate (you know the correct answer)
  Examples: Naive Bayes, KNN, SVM, Decision Trees, Random Forests

Classification Performance Depends on the Algorithm

A good example of this choice is Support Vector Machines (SVMs):
popular until the rise of deep learning in the past few years
core idea: find a dividing hyperplane
many variations: the kernel can be linear, polynomial, Gaussian, etc., which lets the dividing hyperplane live in a (possibly very) high-dimensional feature space

So what is the “right” approach? Experimentation!


So choose carefully... See http://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html


Neural Networks Building Upon Classic Machine Learning

Outline

Introduction

What is AI?

Neural Networks
Building Upon Classic Machine Learning
History Of Neural Networks
Improving Performance

Convolutional Neural Networks

Do you need AI/ML?


Linear Regression vs. Logistic Regression

Both are simple types of Generalized Linear Model.
Linear regression learns a function that predicts a continuous output variable from continuous or discrete input variables:
$Y = b_0 + \sum_i b_i X_i + \epsilon$
Logistic regression predicts the probability of an outcome: the appropriate class for an input vector, or the odds of one outcome being more likely than another.
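As a sketch of the linear regression model above, assuming NumPy: ordinary least squares on noise-free toy data, so ε = 0 and the fitted coefficients should come back exactly. The data and the helper name `predict` are hypothetical.

```python
import numpy as np

# Toy data generated from Y = 1 + 2*X1 - 3*X2 with no noise, so the fit should recover b = [1, 2, -3].
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]

# Prepend a column of ones so that the intercept b0 is learned like any other coefficient.
X_design = np.column_stack([np.ones(len(X)), X])

# Ordinary least squares: minimise ||X_design @ b - y||^2.
b, residuals, rank, _ = np.linalg.lstsq(X_design, y, rcond=None)

def predict(X_new):
    """Y = b0 + sum_i b_i X_i for each row of X_new."""
    return b[0] + X_new @ b[1:]

print(b)                               # close to 1, 2 and -3
print(predict(np.array([[3.0, 2.0]])))
```

With noisy data the same call still works; the residuals then play the role of ε.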


Logistic Regression as a Graphical Model

$o(x) = \sigma(w^T x) = \sigma\left(w_0 + \sum_i w_i x_i\right) = \frac{1}{1 + \exp\left(-(w_0 + \sum_i w_i x_i)\right)}$


Logistic Regression Used as a Classifier

Logistic Regression can be used as a simple linear classifier.
Compare the probabilities of each class, P(Y = 0|X) and P(Y = 1|X).
Treat the halfway point on the sigmoid as the decision boundary: if P(Y = 1|X) > 0.5, classify X in class 1.
Since σ(0) = 0.5, the decision boundary is the hyperplane
$w_0 + \sum_i w_i x_i = 0$
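A small sketch of the classification rule above, assuming NumPy. The weights are hand-picked for illustration (not learned), giving the decision boundary x1 + x2 = 1; `predict_proba` and `predict` are hypothetical helper names.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(X, w0, w):
    """P(Y = 1 | X) = sigma(w0 + sum_i w_i x_i) for each row of X."""
    return sigmoid(w0 + X @ w)

def predict(X, w0, w):
    """Class 1 when P(Y = 1 | X) > 0.5, i.e. when w0 + sum_i w_i x_i > 0."""
    return (predict_proba(X, w0, w) > 0.5).astype(int)

# Hand-picked weights: the decision boundary is the line x1 + x2 = 1.
w0, w = -1.0, np.array([1.0, 1.0])
X = np.array([[0.0, 0.0], [0.2, 0.3], [1.0, 1.0], [2.0, 0.5]])
print(predict_proba(X, w0, w))
print(predict(X, w0, w))   # [0 0 1 1]
```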


Training Logistic Regression Model via Gradient Descent

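There is no closed-form Maximum Likelihood Estimate, so we optimize numerically.
The negative log-likelihood of the logistic model is NLL and its gradient is g. Here the bias is absorbed into w, so $w^T x_i = w_0 + \sum_j w_j x_{ij}$; NLL uses labels $\tilde{y}_i \in \{-1, +1\}$ and g uses the equivalent $y_i = (\tilde{y}_i + 1)/2 \in \{0, 1\}$:

$\mathrm{NLL}(w) = \sum_{i=1}^{N} \log\left(1 + \exp\left(-\tilde{y}_i\, w^T x_i\right)\right)$

$g = \frac{\partial\, \mathrm{NLL}}{\partial w} = \sum_{i=1}^{N} \left(\sigma(w^T x_i) - y_i\right) x_i$

Then we can update the parameters θ = w iteratively:

$\theta_{k+1} = \theta_k - \eta_k g_k$

where $\eta_k$ is the learning rate or step size.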
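The update rule above can be sketched with NumPy as batch gradient descent with a constant step size η (an assumption; the slide allows η_k to change over time). The NLL is written in its {-1, +1} form and the gradient in its {0, 1} form, as above; the toy data and function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def add_bias(X):
    # Prepend a constant feature of 1 so that w[0] plays the role of w0.
    return np.column_stack([np.ones(len(X)), X])

def nll(w, Xb, y):
    """NLL(w) = sum_i log(1 + exp(-y~_i w^T x_i)), with y~ = 2y - 1 in {-1, +1}."""
    y_tilde = 2 * y - 1
    return np.sum(np.logaddexp(0.0, -y_tilde * (Xb @ w)))   # numerically stable log(1 + exp(.))

def fit_logistic_gd(X, y, eta=0.1, n_steps=1000):
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(n_steps):
        g = Xb.T @ (sigmoid(Xb @ w) - y)   # g = sum_i (sigma(w^T x_i) - y_i) x_i
        w = w - eta * g                    # theta_{k+1} = theta_k - eta * g_k (constant step here)
    return w

# One centred feature; class 1 for positive values.
X = np.array([[-2.5], [-1.5], [-0.5], [0.5], [1.5], [2.5]])
y = np.array([0, 0, 0, 1, 1, 1])
w = fit_logistic_gd(X, y)
Xb = add_bias(X)
print(nll(np.zeros(2), Xb, y), "->", nll(w, Xb, y))   # the loss goes down
print((sigmoid(Xb @ w) > 0.5).astype(int))            # [0 0 0 1 1 1]
```

On perfectly separable data like this, the weights keep growing as the loss approaches zero, which is one reason a regularisation term is usually added in practice.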


Neural Networks History Of Neural Networks

Neural Networks to learn f : X → Y


f can be a non-linear function
X (vector of) continuous and/or discrete variables
Y (vector of) continuous and/or discrete variables
Feedforward Neural networks - Represent f by network of non-linear
(logistic/sigmoid/ReLU) units:

[Figure: feedforward network with an input layer X, a hidden layer H and an output layer Y; each nonlinear unit applies a sigmoid, ReLU or ELU activation.]

Basic Three Layer Neural Network

Input Layer
vector data, each input collects one feature/dimension of the data
and passes it on to the (first) hidden layer.
Hidden Layer
Each hidden unit computes a weighted sum of all the units from the
input layer (or any previous layer) and passes it through a nonlinear
activation function.
Output Layer
Each output unit computes a weighted sum of all the hidden units
and passes it through a (possibly nonlinear) threshold function.
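A forward pass through such a three-layer network might look like the following NumPy sketch. The layer sizes, the random weights, the choice of ReLU for the hidden layer and a sigmoid for the output layer are all assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

def forward(x, W_hidden, b_hidden, W_out, b_out):
    """Forward pass: input layer -> hidden layer -> output layer."""
    # Hidden layer: weighted sum of all input units, then a nonlinear activation (ReLU here).
    h = relu(W_hidden @ x + b_hidden)
    # Output layer: weighted sum of all hidden units, then a sigmoid "threshold" giving values in (0, 1).
    return sigmoid(W_out @ h + b_out)

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 4, 5, 2              # hypothetical layer sizes
W_hidden = rng.normal(size=(n_hidden, n_in))
b_hidden = np.zeros(n_hidden)
W_out = rng.normal(size=(n_out, n_hidden))
b_out = np.zeros(n_out)

x = np.array([0.5, -1.0, 2.0, 0.0])          # one input vector: one feature per input unit
y = forward(x, W_hidden, b_hidden, W_out, b_out)
print(y.shape)   # (2,)
```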


Properties of Neural Networks

Universality: given a large enough layer of hidden units (or multiple layers), a NN can approximate any continuous function (on a bounded input domain) arbitrarily well.
Representation Learning: classic statistical machine learning is about learning functions that map input data to outputs. Neural Networks, and especially Deep Learning, are more about learning a representation of the data in order to perform classification or some other task.


Hidden Layer: Adding Nonlinearity

Each hidden unit emits an output that is a nonlinear activation function of its net activation:

$y_j = f(\mathrm{net}_j)$

This nonlinearity is essential to a neural network's power: if f is linear, a composition of layers is itself linear, so the whole network reduces to a single linear model (i.e. linear regression).
The output is thus thresholded through this nonlinear activation function.
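A quick numerical check of the claim above, assuming NumPy: two stacked layers with a linear activation compute exactly the same function as a single linear layer with W = W2 W1 and b = W2 b1 + b2. The weights are random and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=3)   # hidden layer, but with a linear "activation"
W2, b2 = rng.normal(size=(2, 3)), rng.normal(size=2)   # output layer

def two_linear_layers(x):
    h = W1 @ x + b1        # f(net) = net: no nonlinearity
    return W2 @ h + b2

# The same function written as ONE linear layer.
W = W2 @ W1
b = W2 @ b1 + b2

x = rng.normal(size=4)
print(np.allclose(two_linear_layers(x), W @ x + b))   # True: the extra layer added nothing
```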


Activation Functions

tanh was another common activation function.
sigmoid is now discouraged except in the final layer, to obtain probabilities: it saturates easily, and its gradient then vanishes.
ReLU is the new standard activation function to use.

Rectified Linear Activation

$g(z) = \max\{0, z\}$

[Figure (Goodfellow 2016, Fig. 6.3): the rectified linear activation function. This activation function is the default recommended for use with most feedforward neural networks. Applying it to the output of a linear transformation yields a nonlinear transformation, but the function remains very close to linear: it is piecewise linear with two linear pieces. Because rectified linear units are nearly linear, they preserve many of the properties that make linear models easy to optimize with gradient-based methods, and many of the properties that make linear models generalize well. A common principle throughout computer science is that we can build complicated systems from minimal components: much as a Turing machine's memory needs only to be able to store 0 or 1 states, we can build a universal function approximator from rectified linear functions.]

Rectified Linear Units (ReLU) have become standard: $y_j = \max(0, \mathrm{net}_j)$
strong signals are always easy to distinguish
many values are exactly zero, and the derivative is 0 for those units (and 1 for active ones)
they do not saturate as easily as sigmoid
new: Exponential Linear Units (ELU), with evidence that they perform better than ReLU in some situations
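To make the saturation argument concrete, here is a NumPy sketch of the sigmoid, ReLU and ELU activations and their derivatives. Defining the ReLU derivative as 0 at z = 0 and using α = 1 for ELU are conventional choices assumed here, not taken from the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)            # at most 0.25, and -> 0 for large |z| (saturation)

def relu(z):
    return np.maximum(0.0, z)       # g(z) = max{0, z}

def relu_grad(z):
    return (z > 0).astype(float)    # 1 for positive inputs, 0 otherwise (0 chosen at z = 0)

def elu(z, alpha=1.0):
    return np.where(z > 0, z, alpha * (np.exp(z) - 1.0))

def elu_grad(z, alpha=1.0):
    return np.where(z > 0, 1.0, alpha * np.exp(z))

z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(sigmoid_grad(z))   # tiny at z = +/-10: the sigmoid has saturated
print(relu_grad(z))      # [0. 0. 0. 1. 1.]: strong positive signals keep a gradient of 1
print(elu(z))            # negative inputs approach -alpha smoothly instead of being clipped to 0
```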

Gradient Descent

Error function E: Mean Squared Error, cross-entropy loss, etc.
Gradient: $\nabla E[w] = \left[\frac{\partial E}{\partial w_0}, \frac{\partial E}{\partial w_1}, \ldots, \frac{\partial E}{\partial w_d}\right]$
Training update rule: $\Delta w_i = -\eta \frac{\partial E}{\partial w_i}$, where η is the learning rate.
Note: for linear regression and similar models, the error surface E[w] is convex. For neural networks it is no longer convex, so we must solve iteratively and may only reach a local minimum.
(Slides from Tom Mitchell ML Course, CMU, 2010)

Incremental Gradient Descent

Let the error function for a single training example l be: $E_l[w] = \frac{1}{2}(y^l - o^l)^2$

Do until satisfied:
For each training example l in D:
1. Compute the gradient $\nabla E_l[w]$
2. Update the weights: $w = w - \eta \nabla E_l[w]$
Note: we can also use batch gradient descent on many points at once.
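The loop above can be sketched for the simplest case, a single linear unit o = w · x, where the gradient of E_l is -(y^l - o^l) x^l (the classic delta or LMS rule). This assumes NumPy; a fixed number of epochs stands in for "until satisfied", and the data are a hypothetical noise-free line.

```python
import numpy as np

def incremental_gradient_descent(X, y, eta=0.05, n_epochs=500):
    """Delta rule for a linear unit o = w . x, updating after every training example."""
    w = np.zeros(X.shape[1])
    for _ in range(n_epochs):                 # "do until satisfied" (fixed epoch budget here)
        for x_l, y_l in zip(X, y):            # for each training example l in D
            o_l = w @ x_l
            grad = -(y_l - o_l) * x_l         # gradient of E_l[w] = 1/2 (y_l - o_l)^2
            w = w - eta * grad                # w <- w - eta * grad E_l[w]
    return w

# Inputs include a constant 1 so that w[0] acts as the bias; targets follow y = 1 + 2x exactly.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
w = incremental_gradient_descent(X, y)
print(np.round(w, 3))   # close to [1. 2.]
```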


Backpropagation Algorithm
We need an efficient algorithm for computing the gradient. For sigmoid units and squared error, for each training example l:
1. Forward propagation: input the training example to the network and compute the outputs.
2. Compute the output unit errors:
$\delta_k^l = o_k^l (1 - o_k^l)(y_k^l - o_k^l)$
3. Compute the hidden unit errors:
$\delta_h^l = o_h^l (1 - o_h^l) \sum_k w_{h,k} \delta_k^l$
4. Update the network weights:
$w_{i,j} = w_{i,j} + \Delta w_{i,j}^l = w_{i,j} + \eta\, \delta_j^l o_i^l$
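A NumPy sketch of the four steps above for one hidden layer, under the same assumptions (sigmoid units, squared error). The bias terms (treated as weights from a constant input of 1), the learning rate and the XOR training data are illustrative additions; no particular final error is claimed, since that depends on the random initialisation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, y, W_ih, b_h, W_ho, b_o, eta=0.5):
    """One backprop update on a single example (sigmoid units, squared error).
    W_ih[i, h] is w_{i,h} (input i -> hidden h); W_ho[h, k] is w_{h,k} (hidden h -> output k)."""
    # 1. Forward propagation.
    o_h = sigmoid(x @ W_ih + b_h)
    o_k = sigmoid(o_h @ W_ho + b_o)
    # 2. Output unit errors: delta_k = o_k (1 - o_k) (y_k - o_k).
    delta_k = o_k * (1 - o_k) * (y - o_k)
    # 3. Hidden unit errors: delta_h = o_h (1 - o_h) sum_k w_{h,k} delta_k (uses the old W_ho).
    delta_h = o_h * (1 - o_h) * (W_ho @ delta_k)
    # 4. Weight updates: w_{i,j} <- w_{i,j} + eta * delta_j * o_i (a bias sees o_i = 1).
    W_ho = W_ho + eta * np.outer(o_h, delta_k)
    b_o = b_o + eta * delta_k
    W_ih = W_ih + eta * np.outer(x, delta_h)
    b_h = b_h + eta * delta_h
    return W_ih, b_h, W_ho, b_o

def squared_error(X, Y, W_ih, b_h, W_ho, b_o):
    """E = 1/2 sum over examples and outputs of (y - o)^2."""
    O = sigmoid(sigmoid(X @ W_ih + b_h) @ W_ho + b_o)
    return 0.5 * np.sum((Y - O) ** 2)

# Usage: stochastic training on XOR with 3 hidden units.
rng = np.random.default_rng(0)
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
Y = np.array([[0.0], [1.0], [1.0], [0.0]])
W_ih, b_h = rng.normal(size=(2, 3)), np.zeros(3)
W_ho, b_o = rng.normal(size=(3, 1)), np.zeros(1)
print("error before:", squared_error(X, Y, W_ih, b_h, W_ho, b_o))
for epoch in range(2000):
    for x, y in zip(X, Y):
        W_ih, b_h, W_ho, b_o = backprop_step(x, y, W_ih, b_h, W_ho, b_o)
print("error after:", squared_error(X, Y, W_ih, b_h, W_ho, b_o))
```

Note that each update moves the weights by -η times the gradient of E for that example, which is why the errors δ carry the (y - o) sign and the update uses a plus.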


A Short History

1940s Early work on NNs goes back to the 1940s, with a simple model of the neuron by McCulloch and Pitts as a summing and thresholding device.
1958 Rosenblatt introduced the Perceptron, a two-layer network (one input layer and one output node, with a bias in addition to the input features).
1969 Marvin Minsky: Perceptrons are ‘just’ linear; AI turns to logic, beginning of the “AI Winter”.
1980s Neural network resurgence: backpropagation (updating weights by gradient descent).
1990s SVMs! Kernels can do anything! (no, they can’t)

1993 LeNet 1 for digit recognition.
2003 Deep Learning (convolutional nets, dropout/RBMs, Deep Belief Networks).
1986, 2006 Restricted Boltzmann Machines.
2006 Neural networks outperform RBF-kernel SVMs on the MNIST handwriting dataset (Hinton et al.).
2012 AlexNet for the ImageNet challenge: this algorithm beat the competition with a top-5 error rate of 16% vs 26% for the next best.
ImageNet contains 15 million annotated images in over 22,000 categories.
The ZFNet paper (2013) extends this and has a good description of the network structure.
2012-present Google's YouTube cat detector, speech recognition, self-driving cars, a computer defeats a regional Go champion, ...
2014 GoogLeNet added many layers and introduced inception modules (which allow parallel rather than serial computation).

2014 Generative Adversarial Networks (GANs) introduced.
2015 A Microsoft algorithm beats human performance on the ImageNet challenge.
2016 AlphaGo defeats Lee Sedol, one of the best Go players in the world, using Deep Reinforcement Learning.
2016 DeepMind introduces the A3C Deep RL algorithm, which can learn to play Atari games from images, by playing, with no instructions.


Problems with ANNs

Overfitting
Very inefficient for images, time series, large numbers of inputs/outputs
Slow to train
Hard to interpret the resulting model



Neural Networks Improving Performance

Heuristics for Improving Backpropagation

There are a number of heuristics for training Neural Networks that are useful in practice (maybe we’ll learn more today):
Fewer hidden nodes: just enough complexity to work, not so much that the network overfits.
Train multiple networks with different sizes and search for the best design.
Validation set: train on the training set until the error on the validation set starts to rise (early stopping), then evaluate on a separate test set.
Try different activation functions: tanh, ReLU, ELU, ...?
Dropout (Hinton 2014): randomly ignore certain units during training and don’t update them via gradient descent; this leads to hidden units that specialize.
Modify the learning rate over time (cooling schedule).
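The validation-set heuristic (early stopping) can be sketched in plain Python. Here the validation errors are a precomputed, hypothetical list; in real training each value would be computed after an epoch. The `patience` parameter (how many non-improving epochs to tolerate) is a common variant assumed here; `patience=1` stops as soon as the error rises.

```python
def early_stopping_epoch(val_errors, patience=3):
    """Return the epoch whose weights we would keep: the one with the lowest validation error,
    stopping once the error has failed to improve for `patience` consecutive epochs."""
    best_epoch, best_error = 0, float("inf")
    for epoch, err in enumerate(val_errors):
        if err < best_error:
            best_epoch, best_error = epoch, err
        elif epoch - best_epoch >= patience:
            break  # validation error has stopped improving: stop training
    return best_epoch

# Hypothetical validation curve: it falls, then rises as the network starts to overfit.
val_errors = [0.90, 0.60, 0.45, 0.40, 0.42, 0.47, 0.55, 0.38]
print(early_stopping_epoch(val_errors))   # 3
```

The late dip at the last epoch is never seen, which is the usual trade-off of stopping early.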


Dropout
Dropout (Hinton 2014): randomly ignore certain units during training and don’t update them via gradient descent; this leads to hidden units that specialize.
With probability p, drop a unit (set its output to zero) for the current training example, so the weights attached to it receive no gradient update.
Reduces overfitting by encouraging robustness of the weights in the network.
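A NumPy sketch of dropout applied to one layer's activations. It uses the "inverted dropout" variant (an assumption: survivors are scaled by 1/(1 - p) during training, instead of scaling the weights at test time), and the function name is hypothetical. Because a dropped unit outputs zero, the weights attached to it receive no gradient for that example.

```python
import numpy as np

def dropout(h, p, rng, training=True):
    """Inverted dropout on activations h: each unit is zeroed with probability p and the
    survivors are scaled by 1/(1-p), so the expected activation is unchanged and
    nothing needs rescaling at test time."""
    if not training or p == 0.0:
        return h
    keep_mask = rng.random(h.shape) >= p      # True with probability 1 - p
    return h * keep_mask / (1.0 - p)

rng = np.random.default_rng(0)
h = np.ones(10)
print(dropout(h, p=0.5, rng=rng))                  # about half the units are zeroed, the rest become 2.0
print(dropout(h, p=0.5, rng=rng, training=False))  # unchanged at test time
```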


Large, Shallow Models Overfit More

[Figure (Goodfellow 2016, Fig. 6.7): test accuracy (percent, roughly 91 to 97) vs. number of parameters (×10^8, from 0 to 1) for three curves: depth 3 with the convolutional layers grown, depth 3 with the fully connected layers grown, and depth 11 with the convolutional layers grown. Deeper models tend to perform better, and this is not merely because the model is larger: in this experiment from Goodfellow et al. (2014d), increasing the number of parameters in layers of convolutional networks without increasing their depth is not nearly as effective at increasing test set performance. Shallow models in this context overfit at around 20 million parameters, while deep ones can benefit from having over 60 million. This suggests that using a deep model expresses a useful preference over the space of functions the model can learn.]
Convolutional Neural Networks

Outline

Introduction

What is AI?

Neural Networks

Convolutional Neural Networks


Motivation
Other Types of Deep Neural Networks

Do you need AI/ML?



Convolutional Neural Networks Motivation

Convolutional Network Structure

input data: image (e.g. 256x256 pixels x 3 RGB channels)
output: categorical label


Example Applications of CNNs

(Karpathy Blog, Oct 25, 2015 - http://karpathy.github.io/2015/10/25/selfie/)


Parameter sharing
[Figure (Goodfellow 2016, Fig. 9.5): parameter sharing, drawn as connections from inputs x1 ... x5 to outputs s1 ... s5. Top: convolution shares the same parameters across all spatial locations; black arrows mark every use of one element of a 3-element kernel in a convolutional model. Bottom: traditional matrix multiplication does not share any parameters.]
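Parameter sharing can be checked numerically with a NumPy sketch: a 1-D convolution with a 3-element kernel is the same as multiplying by a banded matrix whose rows all reuse those 3 weights, whereas a general 5-to-3 dense layer would have 15 independent weights. The kernel values are hypothetical, and only "valid" positions are computed, so there are 3 outputs rather than the 5 drawn in the figure.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # inputs x1..x5
kernel = np.array([0.5, 1.0, -1.0])        # hypothetical 3-element kernel

# Convolution ("valid" positions, no kernel flip): the SAME 3 weights slide over every position.
s_conv = np.array([kernel @ x[i:i + 3] for i in range(len(x) - 2)])

# The same operation as a matrix multiplication: a banded 3x5 matrix repeating the kernel in each row.
W_shared = np.zeros((len(x) - 2, len(x)))
for i in range(len(x) - 2):
    W_shared[i, i:i + 3] = kernel
s_matrix = W_shared @ x

print(np.allclose(s_conv, s_matrix))       # True
print(W_shared.size, "matrix entries, but only", kernel.size, "distinct parameters")
```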

2D Convolution

Input:
a b c d
e f g h
i j k l

Kernel:
w x
y z

Output:
aw + bx + ey + fz    bw + cx + fy + gz    cw + dx + gy + hz
ew + fx + iy + jz    fw + gx + jy + kz    gw + hx + ky + lz

Figure 9.1 (Goodfellow 2016): An example of 2-D convolution without kernel-flipping. In this case we restrict the output to only positions where the kernel lies entirely within the image, called “valid” convolution in some contexts.
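The figure's computation can be reproduced with a small NumPy sketch of "valid" 2-D convolution without kernel flipping (strictly a cross-correlation, which many machine learning libraries implement and call convolution). The letters are replaced by hypothetical numbers so the result can be checked by hand.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """2-D convolution without kernel flipping, at 'valid' positions only (kernel fully inside the image)."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            # Multiply the kernel element-wise with the image patch under it, then sum.
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

# Numeric stand-ins for the figure: a..l = 1..12 and w, x, y, z = 1, 2, 3, 4.
image = np.arange(1, 13, dtype=float).reshape(3, 4)
kernel = np.array([[1.0, 2.0], [3.0, 4.0]])
print(conv2d_valid(image, kernel))
# Top-left entry is aw + bx + ey + fz = 1*1 + 2*2 + 5*3 + 6*4 = 44
```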

A simple example

Edge Detection by Convolution

Edge detection by convolution with a kernel that, for every pixel, subtracts the value of the neighbouring pixel on the left.

[Figure (Goodfellow 2016, Fig. 9.6): input image, two-element kernel and output image. Efficiency of edge detection. The image on the right was formed by taking each pixel in the original image and subtracting the value of its neighboring pixel on the left. This shows the strength of all of the vertically oriented edges in the input image, which can be a useful operation for object detection. Both images are 280 pixels tall. The input image is 320 pixels wide while the output image is 319 pixels wide. This transformation can be described by a convolution kernel containing two elements, and requires 319 × 280 × 3 = 267,960 floating point operations (two multiplications and one addition per output pixel) to compute using convolution.]
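The edge detector above can be sketched in NumPy as a "valid" convolution along each row with the two-element kernel [-1, 1]; these kernel values are an assumption consistent with "subtract the left neighbour". The tiny synthetic image is hypothetical; the last lines re-derive the operation count quoted in the caption.

```python
import numpy as np

def conv_rows_valid(image, kernel):
    """'Valid' convolution (no kernel flip) of a 1-D kernel along every row of the image."""
    k = len(kernel)
    out_w = image.shape[1] - k + 1             # output is k - 1 columns narrower
    return np.stack([image[:, c:c + k] @ kernel for c in range(out_w)], axis=1)

kernel = np.array([-1.0, 1.0])                 # assumed: (pixel) - (its left neighbour)

# Hypothetical 3x5 image: dark left part, bright right part, so one vertical edge.
image = np.array([[0, 0, 0, 9, 9],
                  [0, 0, 0, 9, 9],
                  [0, 0, 0, 9, 9]], dtype=float)
edges = conv_rows_valid(image, kernel)
print(edges)   # non-zero only in the column where the brightness jumps

# Operation count from the caption: 280 rows x 319 output columns x (2 multiplications + 1 addition).
height, width = 280, 320
flops = height * (width - 1) * 3
print(flops)   # 267960
```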

Other CNN Modifications

Pooling: nearby pixels tend to represent the same thing/class/object, so pool the responses from nearby nodes (e.g. mean, median, min, max).
Strides: the number of pixels the filter moves between adjacent positions; a larger stride means less overlap between adjacent filters and a smaller output.
Zero padding: adding a border of zeros around the input so the filter can also be applied at the edge pixels; this controls the size of the output and deals with edge effects.
Connectivity: alternative local connectivity options, partial connectivity.
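A NumPy sketch of 2x2 max pooling with stride 2, plus the usual output-size formula floor((n + 2*padding - kernel) / stride) + 1 for a strided, zero-padded filter. The function names, the example input and the sizes passed in are illustrative.

```python
import numpy as np

def max_pool2d(x, size=2, stride=2):
    """Max pooling over size x size windows, moving `stride` pixels between windows (no padding)."""
    H, W = x.shape
    out_h = (H - size) // stride + 1
    out_w = (W - size) // stride + 1
    out = np.empty((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            window = x[r * stride:r * stride + size, c * stride:c * stride + size]
            out[r, c] = window.max()   # use window.mean(), np.median(window) or window.min() for other pooling
    return out

def conv_output_size(n, kernel, stride=1, padding=0):
    """Output length along one dimension for a filter of width `kernel`."""
    return (n + 2 * padding - kernel) // stride + 1

x = np.array([[1, 3, 2, 0],
              [4, 2, 1, 1],
              [0, 1, 5, 6],
              [2, 2, 7, 8]], dtype=float)
print(max_pool2d(x))                                         # 4, 2, 2 and 8: the maximum of each 2x2 block
print(conv_output_size(256, kernel=3, stride=1, padding=1))  # 256: zero padding keeps the size
print(conv_output_size(256, kernel=3, stride=2, padding=1))  # 128: stride 2 roughly halves it
```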



Convolutional Neural Networks Other Types of Deep Neural Networks

Other Types of Deep Neural Networks

RBM: Restricted Boltzmann Machines - an older, undirected model (stacked to build deep models such as Deep Belief Networks).
RNN: Recurrent Neural Networks - allow links from outputs back to inputs over time; good for time-series learning.
LSTM: Long Short-Term Memory networks - a more complex form of RNN that
integrates strategically remembered information from the past
formalizes a process for forgetting information over time
is useful if you need to learn patterns over time and your data features ...
DeepRL: Deep Reinforcement Learning
CNNs + a fully connected deep network for learning a representation of a policy
Reinforcement Learning for updating the policy through experience to make improved decisions
requires a value/reward function
GAN: Generative Adversarial Networks - train two networks at once.
Generative Adversarial Networks

One network produces/hallucinates new answers (generative).
A second network distinguishes between the real and the generated answers (adversary/critic).
Blog with code: “GANs in 50 lines of code (PyTorch)”, an easy way to get started.
Do you need AI/ML?

Outline

Introduction

What is AI?

Neural Networks

Convolutional Neural Networks

Do you need AI/ML?


Defining Your Questions
Designing Your AI/ML System
Languages and Libraries
Deep Learning Frameworks
Compute Resources
Do you need AI/ML? Defining Your Questions

Defining Your Questions


Is it a decision to be made?
Is there a pattern to detect?
Do you have data?
What kinds of questions do you have about the data?
Yes/No questions - Did X happen? Are A and B correlated?
Timing - When did X happen?
Anomaly detection - Is X strange/abnormal/unexpected?
Classification - What kind of Y is X?
Prediction - We’ve seen lots of (X, Y); now we want to know (X’, ?)
Do you have labels?
Can you give the right answer for some portion of the data?
Collecting labels: Automatic? Manual? Crowd-sourced? (e.g. Amazon Mechanical Turk)
Yes → Supervised Learning - Lots of options
No → Unsupervised Learning - Some options (getting better all the
time)

Answers and Constraints

What kind of answer do you need? (increasing difficulty)


Find patterns which are present in the data and view them
Most likely explanation for a pattern
Probability of (fact about X,A,B...) being true
A policy for actions to take in the future to maximize benefit
Guarantees that X will (or will not) happen (very hard)
How big is your data?
Is it static?
MB, GB, TB?
Is it streaming?
KB/sec, MB/sec
How many data points/rows/events will there be?



Do you need AI/ML? Designing Your AI/ML System

How to Design your AI/ML Question

Define your task:


Prediction, Clustering, Classification,
Anomaly Detection?
Define objectives, error metrics,
performance standards
Collect Data:
Set up data stream (storage, input
flow, parallelization, Hadoop)
Preprocessing:
Noise/Outlier Filtering
Completing missing data (histograms,
interpolation)
Normalization (scaling data)
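Two of the preprocessing steps above, sketched with NumPy: filling missing values by linear interpolation (`np.interp`) and normalising each feature to zero mean and unit variance (standardisation, one common choice of scaling). The data and helper names are hypothetical; in practice the mean and standard deviation should be computed on the training set only and reused on validation and test data.

```python
import numpy as np

def fill_missing_by_interpolation(col):
    """Fill NaNs in a 1-D series by linear interpolation between the known neighbours."""
    col = col.astype(float)                   # astype returns a copy, so the input is not modified
    missing = np.isnan(col)
    idx = np.arange(len(col))
    col[missing] = np.interp(idx[missing], idx[~missing], col[~missing])
    return col

def standardize(X):
    """Scale each feature (column) to zero mean and unit variance."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0                       # leave constant features unscaled instead of dividing by zero
    return (X - mean) / std, mean, std

readings = np.array([1.0, np.nan, 3.0, 4.0, np.nan, 6.0])
print(fill_missing_by_interpolation(readings))   # [1. 2. 3. 4. 5. 6.]

X = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
X_scaled, mean, std = standardize(X)
print(X_scaled.mean(axis=0), X_scaled.std(axis=0))   # approximately 0 and 1 for each feature
```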


How to Design your AI/ML Question


Dimensionality Reduction / Feature
Selection:
Choose features to use/extract
from data
PCA/LDA/LLE/GDA
Choose Algorithm:
Consider goals, questions
Tractability
Experimental Design:
train/validate/test data sets
cross-validation
Run it! :
Deployment
Maintenance
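Cross-validation can be sketched in NumPy as below: shuffle the indices once, split them into k folds, and let each fold take a turn as the validation set. A separate test set is assumed to have been held out beforehand. `fit` and `score` are hypothetical user-supplied callables, and the trivial always-predict-the-mean model exists only to make the example runnable.

```python
import numpy as np

def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle indices once, split them into k folds, and yield (train_idx, val_idx) pairs."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

def cross_validate(X, y, fit, score, k=5):
    """Mean validation score over k folds; fit(X, y) returns a model, score(model, X, y) a number."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(X), k):
        model = fit(X[train_idx], y[train_idx])
        scores.append(score(model, X[val_idx], y[val_idx]))
    return float(np.mean(scores))

# Trivial "model" for illustration: always predict the training-set mean of y.
def fit_mean(X_train, y_train):
    return y_train.mean()

def mse(model, X_val, y_val):
    return float(np.mean((y_val - model) ** 2))

X = np.arange(20, dtype=float).reshape(-1, 1)
y = 3.0 * X[:, 0] + 1.0
cv_error = cross_validate(X, y, fit_mean, mse, k=5)
print(cv_error)   # average validation MSE of the baseline
```

Comparing a real model's cross-validated score against such a baseline is a cheap sanity check before touching the test set.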
Do you need AI/ML? Languages and Libraries

Language Choices
Any language can be used for implementing/using AI/ML algorithms, but some make it much easier:
C++: you can do it, but you may need to implement many things yourself
Java: many ML libraries (Weka is a good open source one, Deeplearning4j)
Scala: leaner, functional language that compiles to JVM bytecode; good for prototyping, can reuse Java libraries (Deeplearning4j)
R: focused on statistical methods, with more and more machine learning libraries implemented for it
Matlab: good for all the calculations; if you have the right libraries it’s great (not cheap or very portable beyond school)
Python: most commonly used right now for deep learning, we’re gonna need another slide ...

Python

numpy - numerical library: n-dimensional arrays, matrix and linear algebra data structures and routines
pandas - table data structure, statistical analysis tools (implements many useful features from R)
scipy - scientific library built on numpy; together with the rest of the SciPy stack (numpy, pandas, matplotlib for graphing, ...) it basically turns Python into R+Matlab
scikit-learn - many standard machine learning algorithms implemented as easy-to-use Python APIs
jupyter notebooks - powerful web-based interfaces to Python for data analysis and machine learning



Do you need AI/ML? Deep Learning Frameworks

Deep Learning Frameworks

Caffe - older, easy to set up mockups, harder to install?
Theano - developed at the University of Montreal, great theoretical setup, very flexible, Python only
Tensorflow - made by Google, scales to many GPUs and servers, lots of optimization, requires planning the whole system beforehand, most languages
PyTorch - easier to mock things up and try different designs, not as optimized for large-scale performance as Tensorflow
MXNet - Apache project (backed by Amazon), supports most languages and OSs
Deeplearning4j - Java-focused framework
Keras - open interface to create models in multiple frameworks (Tensorflow, Theano, MXNet)



Do you need AI/ML? Compute Resources

Cloud Services

There are several powerful services with free tiers that you can access through a student account, which you can request directly.
AWS: Amazon Web Services - very large, has accessible APIs to connect to, many hardware options to run on (but the best ones cost extra)
Azure: Microsoft - lots of visual tools for composing AI/ML components
Google Cloud ML Engine: uses all the latest tools and Tensorflow models
None of these provide GPU servers for free; those cost extra. (Everything will still work, just more slowly for deep learning.)


Summary

Introduction

What is AI?
Landscape of Big Data/AI/ML
Classification

Neural Networks
Building Upon Classic Machine Learning
History Of Neural Networks
Improving Performance

Convolutional Neural Networks


Motivation
Other Types of Deep Neural Networks

Do you need AI/ML?


Defining Your Questions
Designing Your AI/ML System
Languages and Libraries
Deep Learning Frameworks
Compute Resources


Useful Books

A book for each of three eras of Machine Learning:

[Goodfellow, 2016]
I. Goodfellow, Y. Bengio and A. Courville, “Deep Learning”, MIT Press, 2016. http://www.deeplearningbook.org/ (the website has a free copy of the book).
[Murphy, 2012]
K. Murphy, “Machine Learning: A Probabilistic Perspective”, MIT Press, 2012.
[Duda, Pattern Classification, 2001]
R. O. Duda, P. E. Hart and D. G. Stork, “Pattern Classification” (2nd ed.), John Wiley and Sons, 2001.


Useful Papers and Blogs


[lecun2015]
Y. LeCun, Y. Bengio and G. Hinton, “Deep learning”, Nature, vol. 521, no. 7553, pp. 436-444, 2015. Great references at the back, with comments on seminal papers.
[bengio2009]
Y. Bengio, “Learning Deep Architectures for AI”, Foundations and Trends in Machine Learning, vol. 2, no. 1, 2009. An earlier general reference on the fundamentals of Deep Learning.
[krizhevsky2012]
A. Krizhevsky, I. Sutskever and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks”, Adv. Neural Inf. Process. Syst., pp. 1-9, 2012. The beginning of the current craze.
[Karpathy, 2015]
Andrej Karpathy’s Blog - http://karpathy.github.io Easy-to-follow explanations with code.