CSM 6405: Symbolic ML II
Lecture 1: Introduction
Prof. Dr. Md. Rakib Hassan
Dept. of Computer Science and Mathematics,
Bangladesh Agricultural University.
Email: rakib@bau.[Link]
Books
❖ Machine Learning
❑ Tom M Mitchell
❖ Artificial Intelligence: A Modern Approach
❑ Stuart Russell and Peter Norvig
❖ The Hundred-Page Machine Learning Book
❑ Andriy Burkov
❖ Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems
❑ Aurélien Géron
❖ Pattern Recognition and Machine Learning (Information Science and Statistics)
❑ Christopher M. Bishop
❖ Deep Learning
❑ Ian Goodfellow, Yoshua Bengio, and Aaron Courville
PROFESSOR DR. MD. RAKIB HASSAN 2
Books (Cont.)
❖ Data Science from Scratch
❑ Joel Grus
❖ Deep Learning with Python
❑ François Chollet
❖ Machine Learning: An Algorithmic Perspective
❑ Stephen Marsland
❖ Python Machine Learning
❑ Sebastian Raschka and Vahid Mirjalili
❖ Learning from Data
❑ Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin
Resources
❖ Machine Learning by Andrew Ng (Coursera)
❑ [Link]
❖ Geoffrey Hinton’s Neural Network and Deep Learning
❑ [Link]
❖ Scikit-Learn’s user guide
❑ [Link]
❖ Dataquest interactive tutorials
❑ [Link]
❖ Deep learning
❑ [Link]
❖ Competitions
❑ [Link]
Introduction
❖ Machine learning (ML) teaches computers to do what
comes naturally to humans and animals:
❑ learn from experience/data
❖ Machine learning algorithms use computational
methods to “learn” information directly from data
without relying on a predetermined equation as a
model.
❖ The algorithms adaptively improve their performance
as the number of samples available for learning
increases.
Applications of ML
❖ Image processing and computer vision, for face
recognition, motion detection, and object detection
❖ Computational biology, for tumor detection, drug
discovery, and DNA sequencing
❖ Detecting transactions that are likely to be fraudulent
❖ Energy production, for price and load forecasting
❖ Understanding human learning (brain, real AI)
Applications of ML (Cont.)
❖ Ranking web search results
❖ Recognizing faces
❖ Speech recognition on smartphones
❖ Song or movie recommendations
❖ Self-driving cars
❖ Autonomous weapons
❖ Beating humans in games (e.g., Go)
Game of Go
❖ Originated in China 3,000 years ago.
❖ The rules of the game are simple, but it is a game of
profound complexity:
❑ 10^170 possible board configurations, more than the number of
atoms in the known universe, making Go a googol (10^100)
times more complex than Chess.
❖ AlphaGo
❑ [Link]
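The size of this number can be sanity-checked with a little arithmetic. The Python sketch below computes a crude upper bound by assuming each of the 361 points on a 19×19 board is empty, black, or white. This also counts illegal positions, so the number of legal positions (about 2 × 10^170) is somewhat smaller than the bound.

```python
import math

# Each of the 19 x 19 = 361 intersections can be empty, black, or white,
# so 3**361 bounds the number of board configurations from above
# (it also counts illegal positions, such as stones with no liberties).
POINTS = 19 * 19
upper_bound = 3 ** POINTS

digits = len(str(upper_bound))        # exact, using Python's big integers
log10_bound = POINTS * math.log10(3)  # about 172.24

print(f"3^{POINTS} has {digits} digits (about 10^{log10_bound:.2f})")
```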
DeepMind
❖ DeepMind Technologies is a British artificial
intelligence company founded in September 2010,
acquired by Google in 2014 and currently owned by
Alphabet Inc.
❖ A more general program, AlphaZero, beat the most
powerful programs playing Go, chess, and shogi
(Japanese chess) after a few days of play against itself
using reinforcement learning.
Quotes about ML
❖ Professor Stephen Hawking:
❑ “The development of full artificial intelligence could spell the
end of the human race.”
❖ Billionaire Elon Musk has said that he thinks AI is the
“biggest existential threat” to the human race.
Machine Learning (ML) Definition
❖ Machine Learning: Field of study that gives
computers the ability to learn without being explicitly
programmed.
❑ Arthur Samuel (1959)
❖ Well-posed Learning Problem: A computer program is
said to learn from experience E with respect to some
task T and some performance measure P, if its
performance on T, as measured by P, improves with
experience E.
❑ Tom Mitchell (1997)
Example of ML
❖ Definition of ML: “A computer program is said to
learn from experience E with respect to some task T
and some performance measure P, if its performance
on T, as measured by P, improves with experience E.”
❖ Example: Suppose your email program watches which
emails you do or do not mark as spam, and based on
that learns how to better filter spam. What is the task
T in this setting?
Example of ML (Cont.)
❖ Example: Suppose your email program watches which
emails you do or do not mark as spam, and based on
that learns how to better filter spam. What is the task
T in this setting?
❑ Classifying emails as spam or not spam (ham).
❑ Watching you label emails as spam or not spam.
❑ The number (or fraction) of emails correctly classified as
spam/not spam.
❑ None of the above—this is not a machine learning problem.
Example of ML (Cont.)
❖ Example: Suppose your email program watches which
emails you do or do not mark as spam, and based on
that learns how to better filter spam. What is the task
T in this setting?
❑ Classifying emails as spam or not spam. (T)
❑ Watching you label emails as spam or not spam. (E)
❑ The number (or fraction) of emails correctly classified as
spam/not spam. (P)
❑ None of the above—this is not a machine learning problem.
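The performance measure P can be made concrete. The following minimal Python sketch assumes P is plain accuracy and uses hypothetical labels: it scores the filter's predictions (its output on task T) against the labels the user supplied (experience E).

```python
def accuracy(true_labels, predicted_labels):
    """Performance measure P: the fraction of emails classified correctly."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Hypothetical data: what the user marked (E) vs. what the filter output (T).
user_labels = ["spam", "ham", "ham", "spam", "ham"]
filter_predictions = ["spam", "ham", "spam", "spam", "ham"]

print(accuracy(user_labels, filter_predictions))  # 0.8 (4 of 5 correct)
```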
Spam Classification
❖ Traditional approach
❖ Machine learning approach
Traditional Approach
❖ In this approach, you write the rules by hand (e.g., flag
emails that contain certain words or phrases), so the program
will likely become a long list of complex rules that is hard to maintain.
Traditional Approach (Cont.)
❖ If spammers notice that all their emails containing
“4U” are blocked, they might start writing “For U”
instead.
❖ A spam filter using traditional programming
techniques would need to be updated to flag “For U”
emails.
❖ If spammers keep working around your spam filter,
you will need to keep writing new rules forever.
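A minimal Python sketch of the traditional approach, assuming a hypothetical blocklist `BLOCKED_PHRASES`, shows how a single spelling change slips past a hand-written rule until someone adds yet another rule.

```python
# Hand-written rules: a hypothetical blocklist of spam phrases.
BLOCKED_PHRASES = ["4u", "credit card", "free"]


def rule_based_is_spam(email: str) -> bool:
    """Flag an email if it contains any blocked phrase (case-insensitive)."""
    text = email.lower()
    return any(phrase in text for phrase in BLOCKED_PHRASES)


print(rule_based_is_spam("Amazing deal 4U today"))     # True
print(rule_based_is_spam("Amazing deal For U today"))  # False: the rule is evaded

# The only fix is another hand-written rule, and so on forever.
BLOCKED_PHRASES.append("for u")
print(rule_based_is_spam("Amazing deal For U today"))  # True
```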
Machine Learning Approach
❖ A spam filter based on Machine Learning automatically learns which words and phrases are good predictors
of spam by detecting unusually frequent patterns of words in the
spam examples compared to the ham examples.
❖ The program is much shorter, easier to maintain, and most likely
more accurate.
Machine Learning Approach (Cont.)
❖ A spam filter based on Machine Learning techniques
automatically notices that “For U” has become
unusually frequent in spam flagged by users, and it
starts flagging them without your intervention.
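One simple way such a filter could learn is to count how often each word appears in spam versus ham and score new emails by those relative frequencies. The Python sketch below is a naive-Bayes-style scorer chosen for illustration, not the method behind any particular real filter, and the training emails are hypothetical. Because users flagged several “For U” emails as spam, the words “for” and “u” push a new email toward spam without anyone writing a rule.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def train(spam_emails, ham_emails):
    """Count how often each word occurs in spam and in ham examples."""
    spam_counts = Counter(w for e in spam_emails for w in tokenize(e))
    ham_counts = Counter(w for e in ham_emails for w in tokenize(e))
    return spam_counts, ham_counts


def spam_score(email, spam_counts, ham_counts):
    """Sum of per-word log ratios P(word | spam) / P(word | ham), with
    add-one smoothing. Positive means the words are relatively more
    frequent in spam than in ham."""
    vocab = set(spam_counts) | set(ham_counts)
    spam_total = sum(spam_counts.values()) + len(vocab)
    ham_total = sum(ham_counts.values()) + len(vocab)
    score = 0.0
    for w in tokenize(email):
        p_spam = (spam_counts[w] + 1) / spam_total
        p_ham = (ham_counts[w] + 1) / ham_total
        score += math.log(p_spam / p_ham)
    return score


# Hypothetical emails; users have flagged the "For U" ones as spam.
spam = ["cheap pills For U now", "win cash For U today", "For U special offer"]
ham = ["meeting notes for the project", "lunch today with the team",
       "project deadline moved"]

spam_counts, ham_counts = train(spam, ham)
print(spam_score("Special deal For U", spam_counts, ham_counts) > 0)     # True
print(spam_score("project meeting today", spam_counts, ham_counts) > 0)  # False
```

Retraining on newly flagged emails is all it takes for the counts, and therefore the filter's behavior, to follow the spammers.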
Why Machine Learning?
❖ Machine Learning is great for:
❑ Problems for which existing solutions require a lot of hand-
tuning or long lists of rules: one Machine Learning algorithm
can often simplify code and perform better.
❑ Complex problems for which there is no good solution at all
using a traditional approach: the best Machine Learning
techniques can find a solution.
❑ Fluctuating environments: a Machine Learning system can
adapt to new data.
❑ Getting insights about complex problems and large amounts
of data.
Why Machine Learning? (Cont.)
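❖ Machine Learning can also help humans learn: a trained model
can be inspected to see what it has learned (e.g., which words
best predict spam), which may reveal unsuspected correlations
or new trends.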
Types of Machine Learning Systems
❖ Whether or not they are trained with human supervision
❑ supervised, unsupervised, semi-supervised, and Reinforcement Learning
❖ Whether or not they can learn incrementally on the fly
❑ online versus batch learning
❖ Whether they work by simply comparing new data points to
known data points, or instead detect patterns in the training
data and build a predictive model, much like scientists do
❑ instance-based versus model-based learning
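❖ These criteria are not mutually exclusive; they can be
combined in different ways.
❑ For example, a spam filter that learns on the fly using a neural
network model trained on examples of spam and ham is an online,
model-based, supervised learning system.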
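Instance-based learning can be illustrated with k-Nearest Neighbors: the system stores the training instances and classifies a new point by comparing it with them, whereas a model-based method would first fit parameters (such as the coefficients of a line) and then predict from those. The Python sketch below assumes two hypothetical numeric features per email and Euclidean distance.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Instance-based learning: classify `query` by a majority vote among the
    k stored training instances closest to it (Euclidean distance)."""
    by_distance = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical training instances: two numeric features per email.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels = ["ham", "ham", "ham", "spam", "spam", "spam"]

print(knn_predict(points, labels, (2, 2)))      # ham
print(knn_predict(points, labels, (8.5, 8.5)))  # spam
```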
Supervised/Unsupervised Learning
❖ Machine Learning systems can be classified according
to the amount and type of supervision they get
during training.
❖ There are four major categories:
❑ Supervised learning
❑ Unsupervised learning
❑ Semisupervised learning
❑ Reinforcement Learning
Supervised Learning
❖ In supervised learning, the training data you feed to
the algorithm includes the desired solutions, called
labels.
Figure: A labeled training set for supervised learning (e.g., spam classification)
Supervised Learning Tasks
❖ Classification
❑ Example: Spam filter
o It is trained with many example emails along with their class (spam or ham), and
it must learn how to classify new emails.
❖ Regression
❑ Predicting a target numeric value, such as the price of a car,
given a set of features (mileage, age, brand, etc.) called
predictors.
❑ To train the system, many examples of cars, including both
their predictors and their labels (i.e., their prices) are
provided.
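For a single predictor, regression can be sketched with ordinary least squares. The Python example below fits price ≈ intercept + slope × mileage on a small hypothetical training set (the numbers were chosen to lie exactly on a line) and then predicts the price of a new car.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: y ≈ intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical training set: predictor = mileage (km), label = price.
mileage = [10_000, 30_000, 50_000, 70_000, 90_000]
price = [19_000, 17_000, 15_000, 13_000, 11_000]

intercept, slope = fit_line(mileage, price)
print(intercept, slope)            # about 20000 and -0.1
print(intercept + slope * 40_000)  # predicted price of a car with 40,000 km
```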
Regression
Figure: Regression
❖ Some regression algorithms can be used for classification as
well, and vice versa.
❑ For example, Logistic Regression is commonly used for classification, as it
can output a value that corresponds to the probability of belonging to a
given class (e.g., 20% chance of being spam).
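A minimal sketch of how Logistic Regression turns a linear score into a probability: the score z is passed through the sigmoid function 1 / (1 + e^(−z)). The two features and the weights below are hypothetical and hand-picked rather than learned; with them, an email containing “free” once and no links gets roughly a 20% spam probability, and a 50% threshold turns that probability into a class.

```python
import math


def sigmoid(z):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def spam_probability(features, weights, bias):
    """Logistic Regression: a weighted sum of features passed through the sigmoid."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Hypothetical, hand-picked parameters (not learned here) for two features:
# [times the word "free" appears, number of links in the email]
weights = [0.8, 0.5]
bias = -2.2

p = spam_probability([1, 0], weights, bias)
print(f"{p:.0%} chance of being spam")  # about 20%
print("spam" if p >= 0.5 else "ham")    # a 50% threshold gives a class
```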
Supervised Learning Algorithms
❖ k-Nearest Neighbors
❖ Linear Regression
❖ Logistic Regression
❖ Support Vector Machines (SVMs)
❖ Decision Trees and Random Forests
❖ Neural networks
❑ Some neural network architectures can be unsupervised, such
as autoencoders and restricted Boltzmann machines.
❑ They can also be semisupervised, such as in deep belief
networks and unsupervised pretraining.
Unsupervised Learning
❖ In unsupervised learning, the training data is
unlabeled.
❑ The system tries to learn without a teacher.
Figure: An unlabeled training set for unsupervised learning
Unsupervised Learning Algorithms
❖ Clustering
❑ k-Means
❑ Hierarchical Cluster Analysis (HCA)
❑ Expectation Maximization
❖ Visualization and dimensionality reduction
❑ Principal Component Analysis (PCA)
❑ Kernel PCA
❑ Locally-Linear Embedding (LLE)
❑ t-distributed Stochastic Neighbor Embedding (t-SNE)
❖ Association rule learning
❑ Apriori
❑ Eclat
Unsupervised - Clustering
❖ Tries to detect groups of similar instances
❑ Example: A clustering algorithm can detect groups of similar
visitors to a blog without any help. For example, it might
notice that 40% of the visitors are males who love comic
books and generally read the blog in the evening, while 20%
are young sci-fi lovers who visit during the weekends, and so
on.
❖ A hierarchical clustering algorithm can also subdivide
each group into smaller groups.
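k-Means is one way to find such groups. The Python sketch below runs the basic algorithm (assign each point to its nearest centroid, then move each centroid to the mean of its points) on hypothetical visitor data with two features, age and usual hour of visit. Initializing with the first k points is a simplification; practical implementations use smarter seeding such as k-means++.

```python
import math


def k_means(points, k, iterations=10):
    """Basic k-means: alternately assign points to the nearest centroid and
    move each centroid to the mean of its assigned points. Initialized with
    the first k points for simplicity."""
    centroids = list(points[:k])
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(coords) / len(members) for coords in zip(*members))
            if members else centroids[i]  # keep an empty cluster's centroid
            for i, members in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical visitors: (age, usual hour of visit on a 24-hour clock).
visitors = [(20, 21), (60, 10), (22, 22), (58, 11), (21, 20), (62, 9)]
centroids, clusters = k_means(visitors, k=2)
print(centroids)  # [(21.0, 21.0), (60.0, 10.0)]
```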
Unsupervised - Clustering (Cont.)
Figure: Clustering
Unsupervised - Visualization Algorithm
❖ These algorithms are fed a lot of complex, unlabeled data
and output a 2D or 3D representation of that data
that can easily be plotted.
Figure: Example of a t-SNE visualization highlighting semantic clusters
(animals are well separated from vehicles, horses are close to deer but far from birds, and so on.)
Unsupervised - Visualization Algorithm (Cont.)
❖ These algorithms try to preserve as much structure as
they can (e.g., trying to keep separate clusters in the
input space from overlapping in the visualization), so
you can understand how the data is organized and
perhaps identify unsuspected patterns.
Unsupervised - Dimensionality Reduction
❖ The goal is to simplify the data without losing too
much information.
❑ One way to do this is to merge several correlated features into
one.
o For example, a car’s mileage may be very correlated with its age, so the
dimensionality reduction algorithm will merge them into one feature that
represents the car’s wear and tear.
o This is called feature extraction.
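Merging correlated features can be sketched with Principal Component Analysis (PCA) restricted to one component. The Python example below standardizes hypothetical mileage and age values (so that kilometers and years are comparable), finds the direction of greatest variance from the 2×2 covariance matrix, and projects each car onto it to obtain a single “wear and tear” feature. It also reports how much of the total variance that one feature keeps.

```python
import math
import statistics


def standardize(values):
    """Rescale to mean 0 and (sample) standard deviation 1."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mu) / sd for v in values]


def pca_2d_to_1d(xs, ys):
    """Project 2D data onto its first principal component. Returns the 1D
    values and the fraction of total variance that component keeps."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) ** 2 for x in xs) / (n - 1)                    # var(x)
    c = sum((y - my) ** 2 for y in ys) / (n - 1)                    # var(y)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)  # cov(x, y)
    # Largest eigenvalue of the symmetric covariance matrix [[a, b], [b, c]].
    lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    # Matching eigenvector is (b, lam - a); fall back to an axis when b == 0.
    if b != 0:
        vx, vy = b, lam - a
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    vx, vy = vx / norm, vy / norm
    projected = [(x - mx) * vx + (y - my) * vy for x, y in zip(xs, ys)]
    return projected, lam / (a + c)


# Hypothetical cars: mileage (km) and age (years) are strongly correlated.
mileage = [15_000, 40_000, 62_000, 80_000, 110_000]
age = [1, 3, 5, 6, 9]

wear_and_tear, kept = pca_2d_to_1d(standardize(mileage), standardize(age))
print(f"variance kept by the single merged feature: {kept:.1%}")
```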
Anomaly Detection
❖ The system is trained with normal instances, and
when it sees a new instance it can tell whether it
looks like a normal one or whether it is likely an
anomaly.
❖ Examples:
❑ Detecting unusual credit card transactions to prevent fraud
❑ Catching manufacturing defects
❑ Automatically removing outliers from a dataset before feeding
it to another learning algorithm.
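A very simple anomaly detector, assuming a single numeric feature, learns the mean and standard deviation of normal instances and flags anything more than three standard deviations away. The transaction amounts below are hypothetical; real fraud detection uses far richer features and models.

```python
import statistics


class ZScoreDetector:
    """Learns the mean and spread of normal instances, then flags new
    instances that lie more than `threshold` standard deviations away."""

    def __init__(self, threshold=3.0):
        self.threshold = threshold

    def fit(self, normal_values):
        self.mean = statistics.mean(normal_values)
        self.stdev = statistics.stdev(normal_values)
        return self

    def is_anomaly(self, value):
        return abs(value - self.mean) / self.stdev > self.threshold


# Hypothetical card transaction amounts seen during normal use.
normal_amounts = [12.5, 40.0, 23.9, 35.2, 18.0, 27.4, 31.1, 22.8]
detector = ZScoreDetector().fit(normal_amounts)
print(detector.is_anomaly(30.0))   # False: looks normal
print(detector.is_anomaly(950.0))  # True: likely an anomaly
```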
Anomaly Detection (Cont.)
Figure: Anomaly detection
Association Rule Learning
❖ The goal is to dig into large amounts of data and
discover interesting relations between attributes.
❖ Example:
❑ Suppose you own a supermarket. Running association rule
learning on your sales logs may reveal that people who purchase
barbecue sauce and potato chips also tend to buy steak. Thus,
you may want to place these items close to each other.
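Association rules are usually scored by support (how often the items appear together) and confidence (how often the rule's right-hand side appears when its left-hand side does). The Python sketch below, on a hypothetical five-checkout sales log, scores only the single rule {barbecue sauce, potato chips} → {steak}; algorithms such as Apriori and Eclat exist to search efficiently for all rules above chosen thresholds.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical sales log: one set of items per checkout.
sales = [
    {"barbecue sauce", "potato chips", "steak"},
    {"barbecue sauce", "potato chips", "steak", "beer"},
    {"barbecue sauce", "potato chips"},
    {"milk", "bread"},
    {"potato chips", "beer"},
]
rule_lhs = {"barbecue sauce", "potato chips"}
rule_rhs = {"steak"}
print(support(sales, rule_lhs | rule_rhs))    # 0.4 (2 of 5 checkouts)
print(confidence(sales, rule_lhs, rule_rhs))  # about 0.67 (2 of 3)
```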
Semisupervised Learning
❖ Some algorithms can deal with partially labeled training
data, usually a lot of unlabeled data and a little bit of
labeled data.
❑ This is called semisupervised learning
Figure: Semisupervised learning
Semisupervised Learning (Cont.)
❖ Example:
❑ Google Photos is an example: it automatically recognizes that the same
person A shows up in photos 1, 5, and 11, while another
person B shows up in photos 2, 5, and 7. This is the
unsupervised part of the algorithm (clustering).
❑ Now all the system needs is for you to tell it who these people
are. Just one label per person, and it is able to name everyone
in every photo, which is useful for searching photos.
❑ Sometimes it is necessary to provide a few labels per person
and manually clean up some clusters.
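The division of labor can be sketched in a few lines of Python. The dictionary `faces_by_photo` stands in for the output of the unsupervised clustering step, and `names` holds the single label per person supplied by the user; both, including the names, are hypothetical.

```python
# Hypothetical output of the unsupervised (clustering) step:
# photo id -> face clusters detected in that photo.
faces_by_photo = {
    1: ["cluster_A"], 2: ["cluster_B"], 5: ["cluster_A", "cluster_B"],
    7: ["cluster_B"], 11: ["cluster_A"],
}

# The supervised part: the user supplies just one name per cluster.
names = {"cluster_A": "Alice", "cluster_B": "Bob"}


def photos_of(person):
    """Every photo containing a face cluster the user labeled `person`."""
    return sorted(
        photo for photo, clusters in faces_by_photo.items()
        if any(names.get(c) == person for c in clusters)
    )


print(photos_of("Alice"))  # [1, 5, 11]
print(photos_of("Bob"))    # [2, 5, 7]
```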
Semisupervised Learning (Cont.)
❖ Most semisupervised learning algorithms are
combinations of unsupervised and supervised
algorithms.
❖ Example:
❑ Deep belief networks (DBNs) are based on unsupervised
components called restricted Boltzmann machines (RBMs)
stacked on top of one another.
❑ RBMs are trained sequentially in an unsupervised manner,
and then the whole system is fine-tuned using supervised
learning techniques.
Reinforcement Learning
❖ The learning system, called an agent, can observe the
environment, select and perform actions, and get
rewards in return (or penalties in the form of negative
rewards).
❖ It must then learn by itself the best strategy,
called a policy, to get the most reward over time.
❖ A policy defines what action the agent should choose
when it is in a given situation.
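A policy can be learned, for example, with tabular Q-learning. The Python sketch below uses a hypothetical five-cell corridor in which the agent earns a reward of 1 for reaching the rightmost cell; after training with ε-greedy exploration, the learned policy maps every non-terminal state to “move right”. The environment and the hyperparameters (alpha, gamma, epsilon) are assumptions chosen for illustration.

```python
import random

N_STATES = 5        # corridor cells 0..4; cell 4 is the goal (terminal)
ACTIONS = (-1, +1)  # move left or move right


def step(state, action):
    """Hypothetical environment: reward 1 on reaching the goal, else 0."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def q_learning(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

    def greedy(state):
        # Best-known action, breaking ties at random.
        best = max(q[(state, a)] for a in ACTIONS)
        return rng.choice([a for a in ACTIONS if q[(state, a)] == best])

    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: usually exploit, occasionally explore.
            action = rng.choice(ACTIONS) if rng.random() < epsilon else greedy(state)
            next_state, reward, done = step(state, action)
            target = reward if done else reward + gamma * max(
                q[(next_state, a)] for a in ACTIONS
            )
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state

    # The learned policy: which action to take in each non-terminal state.
    return {s: greedy(s) for s in range(N_STATES - 1)}


policy = q_learning()
print(policy)  # {0: 1, 1: 1, 2: 1, 3: 1}: move right everywhere
```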
Reinforcement Learning (Cont.)
Figure: Reinforcement learning
Reinforcement Learning (Cont.)
❖ Examples:
❑ Many robots implement Reinforcement Learning algorithms
to learn how to walk.
❑ DeepMind’s AlphaGo program is also a good example of
Reinforcement Learning:
o It made the headlines in March 2016 when it beat the world champion Lee Sedol
at the game of Go.
o It learned its winning policy by analyzing millions of games, and then playing
many games against itself.
o Note that learning was turned off during the games against the champion:
AlphaGo was just applying the policy it had learned.
Applications of Unsupervised Learning
❖ Organize computing clusters
❖ Social network analysis
❖ Market segmentation
❖ Market research
❖ Astronomical data analysis
❖ Gene sequence analysis
❖ Object recognition
Problem
❖ Of the following examples, which would you address
using an unsupervised learning algorithm? (Check all
that apply.)
❑ Given emails labeled as spam/not spam, learn a spam filter.
❑ Given a set of news articles found on the web, group them
into sets of articles about the same story.
❑ Given a database of customer data, automatically discover
market segments and group customers into different market
segments.
❑ Given a dataset of patients diagnosed as either having
diabetes or not, learn to classify new patients as having
diabetes or not.
Problem (Cont.)
❖ Of the following examples, which would you address
using an unsupervised learning algorithm? (Check all
that apply.)
❑ Given emails labeled as spam/not spam, learn a spam filter. (supervised)
❑ Given a set of news articles found on the web, group them
into sets of articles about the same story. (unsupervised)
❑ Given a database of customer data, automatically discover
market segments and group customers into different market
segments. (unsupervised)
❑ Given a dataset of patients diagnosed as either having
diabetes or not, learn to classify new patients as having
diabetes or not. (supervised)
Definitions
❖ Training set
❑ Examples that the system uses to learn.
❖ Training instance (or sample)
❑ Each training example is called a training instance (or sample).
❖ Data mining
❑ Applying ML techniques to dig into large amounts of data can
help discover patterns that were not immediately apparent.
Definitions (Cont.)
❖ Attribute and Feature
❑ In Machine Learning, an attribute is a data type (e.g.,
“Mileage”), while a feature has several meanings depending
on the context, but generally means an attribute plus its value
(e.g., “Mileage = 15,000”).
❑ Many people use the words attribute and feature
interchangeably, though.