CSE 523 : Machine Learning (ML)
Lecture : 2 and 3
Introduction to Machine Learning Course
Dhaval Patel, Ph.D
Assistant Professor,
School of Engineering and Applied Science,
Ahmedabad University, Gujarat, India
January 9-10, 2020
Outline
• About CSE 523 – A Course on Machine Learning
- Project Guidelines
• A Brief on Project Areas
- Environment and Climate Change
- Intelligent Transportation System (ITS)
- Natural Language Processing (NLP)
- 5G/6G Wireless Networks
- Biology/Bioinformatics/Healthcare
About CSE 523 – A Course on Machine Learning
- Project Guidelines
What is the project all about?
The project component carries 35% of the course weightage, and students are
required to complete a project. Five project areas have been identified. They
are listed below:
1. Environment and Climate change
2. Intelligent Transportation System (ITS)
3. Natural Language Processing (NLP)
4. 5G/6G Wireless Networks
5. Healthcare / Biology / Bioinformatics
ML Project Areas
- Environment and Climate change
Abstract
Climate change is one of the greatest challenges facing humanity, and
we, as machine learning experts, may wonder how we can help. Here
we describe how machine learning can be a powerful tool in reducing
greenhouse gas emissions and helping society adapt to a changing
climate. From smart grids to disaster management, we identify high
impact problems where existing gaps can be filled by machine learning,
in collaboration with other fields. Our recommendations encompass
exciting research questions as well as promising business
opportunities. We call on the machine learning community to join
the global effort against climate change.
[Link]
Task:
Using Machine Learning To Predict Local Weather
Data Set:
Charlotte, NC Climate Data from 2013 to 2018 (downloaded from the
NOAA NCEI site - [Link])
How Machine Learning and AI Can Help in the Fight Against Climate Change?
Source: [Link]
Adaptations: Climate prediction
Workflow: Data description and preparation → Data analysis → Predictive modeling
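The weather-prediction task can be sketched end to end on synthetic data. The snippet below is only an illustration under stated assumptions, not the analysis behind the slides: it fabricates a seasonal daily-temperature series in place of the NOAA file, uses the previous day's temperature as the single feature, and fits a one-variable least-squares line. The function names (make_series, fit_line, mean_abs_error) are hypothetical.

```python
import math
import random

def make_series(n_days, seed=0):
    """Synthetic daily mean temperature (deg C): seasonal cycle plus noise.
    Stand-in for real station data; these are not NOAA values."""
    rng = random.Random(seed)
    return [15 + 10 * math.sin(2 * math.pi * d / 365) + rng.gauss(0, 2)
            for d in range(n_days)]

def fit_line(x, y):
    """Ordinary least squares for y = a * x + b with a single feature."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, my - a * mx

def mean_abs_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# 1. Data preparation: feature = yesterday's temperature, target = today's.
temps = make_series(3 * 365)
x, y = temps[:-1], temps[1:]

# 2. Chronological split: first two years for training, the rest for testing.
split = 2 * 365
x_train, y_train = x[:split], y[:split]
x_test, y_test = x[split:], y[split:]

# 3. Predictive modeling, evaluated against a constant (training mean) baseline.
a, b = fit_line(x_train, y_train)
pred = [a * xi + b for xi in x_test]
baseline = [sum(y_train) / len(y_train)] * len(y_test)
mae_model = mean_abs_error(y_test, pred)
mae_baseline = mean_abs_error(y_test, baseline)
print(f"slope={a:.2f} intercept={b:.2f}")
print(f"MAE model={mae_model:.2f} baseline={mae_baseline:.2f}")
```

The split is chronological rather than random so that the model is never tested on days that precede its training data. A real project would replace make_series with the downloaded station file and add further features.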
ML Project Areas
- Intelligent Transportation Systems
V2I Architecture
➢ Many sensors per car; each sensor generates data
➢ V2V and V2I communications: sensor data exchange
➢ Applications: safety, transportation operations, cargo, and infotainment
Source: P. Kumari, N. Gonzalez-Prelcic and R. W. Heath, "Investigating the IEEE 802.11ad Standard for Millimeter Wave
Automotive Radar," in IEEE VTC Fall, 2015
• Learning based Channel Estimation
• Traffic prediction
• Vehicle Trajectory Prediction
Sources:
[Link]
[Link]
H. Ye, G. Y. Li and B. F. Juang, "Deep Reinforcement Learning Based Resource Allocation for V2V Communications," in IEEE
Transactions on Vehicular Technology, vol. 68, no. 4, pp. 3163-3173, April 2019.
N. Deo and M. M. Trivedi, "Multi-Modal Trajectory Prediction of Surrounding Vehicles with Maneuver based LSTMs," 2018 IEEE
Intelligent Vehicles Symposium (IV), Changshu, 2018, pp. 1179-1184.
Case Study: Machine Learning for Beam Selection in V2I
Methodology for Data Generation
[Figure: ten-step data-generation flow built around SUMO (template, route file, and [Link] files) and GEMV²]
Source: A. Klautau, P. Batista, N. González-Prelcic, Y. Wang and R. W. Heath, "5G MIMO Data for Machine Learning:
Application to Beam-Selection Using Deep Learning," 2018 Information Theory and Applications Workshop (ITA), San Diego.
The goal is to choose the best pair of beams for analog beamforming when both the
transmitter and the receiver have antenna arrays with only one radio-frequency chain.
Machine Learning Input Features
Ray Tracing Study Area    V2I Study Area    Grid Resolution
337 x 202 m²              23 x 250 m²       1 x 1 m²
Matrix Q_s (grid resolution 1 x 1 m²):
Negative element : Location is occupied
Positive element : Location of receiver
Zero : Position is not occupied
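A position matrix with this sign convention could be built as follows. This is a minimal sketch: the 23 x 250 grid at 1 m resolution comes from the slide, while the function name build_position_matrix, the rectangle format and all vehicle positions are made up for illustration.

```python
def build_position_matrix(rows, cols, vehicles, receiver):
    """Return a rows x cols grid: -1 where a vehicle occupies the cell,
    +1 over the receiver, 0 where the position is not occupied.

    vehicles and receiver are (row, col, height, width) rectangles in cells.
    """
    grid = [[0] * cols for _ in range(rows)]

    def paint(rect, value):
        r0, c0, h, w = rect
        # Clip the rectangle to the grid so partial overlaps are allowed.
        for r in range(max(r0, 0), min(r0 + h, rows)):
            for c in range(max(c0, 0), min(c0 + w, cols)):
                grid[r][c] = value

    for rect in vehicles:
        paint(rect, -1)
    paint(receiver, 1)   # painted last so the receiver is never overwritten
    return grid

# 23 m x 250 m V2I study area at 1 m x 1 m resolution (sizes from the slide).
ROWS, COLS = 23, 250
trucks = [(2, 10, 3, 12), (8, 100, 3, 12)]   # made-up positions
receiver = (14, 60, 2, 5)                    # made-up position
q_s = build_position_matrix(ROWS, COLS, trucks, receiver)

flat = [v for row in q_s for v in row]       # flattened feature vector
print(len(flat), flat.count(-1), flat.count(1), flat.count(0))
```

Flattening the grid gives one fixed-length feature vector per scene, which is the form most classifiers expect.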
Steps to Perform Classification
Step 1: Generate and validate the data set
Step 2: Divide it into training and testing sets
Step 3: Import the classifiers in the Python scripts
Step 4: Provide the input features file
Step 5: Check that the training and test features take the value -1 for blockers
(trucks) and 1 for non-blockers
Step 6: Convert the output to a single number (the class label) and eliminate
beam pairs that do not appear
Step 7: Iterate over the classifiers
Accuracy (%)
Classifier             All Data    Only NLOS
Linear SVM             31          11
Decision tree          54          28
Deep neural network    65          37
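Step 6 can be illustrated with a short sketch. It assumes the classifier output is the best (transmit beam, receive beam) index pair; the helper names pairs_to_labels and label_to_pair and the codebook sizes are hypothetical and are not taken from the cited work.

```python
def pairs_to_labels(best_pairs, n_rx_beams):
    """Map (tx_beam, rx_beam) pairs to contiguous class labels.

    Each pair is first flattened to tx * n_rx_beams + rx; pairs that never
    occur in the data are eliminated by re-indexing the observed values.
    """
    flat = [tx * n_rx_beams + rx for tx, rx in best_pairs]
    observed = sorted(set(flat))
    to_class = {value: label for label, value in enumerate(observed)}
    labels = [to_class[value] for value in flat]
    return labels, observed

def label_to_pair(label, observed, n_rx_beams):
    """Invert the mapping: class label -> (tx_beam, rx_beam)."""
    return divmod(observed[label], n_rx_beams)

# Made-up example: 4 Tx beams x 8 Rx beams gives 32 possible pairs,
# but only three distinct pairs occur in this toy data set.
best_pairs = [(0, 3), (2, 5), (0, 3), (3, 7), (2, 5)]
labels, observed = pairs_to_labels(best_pairs, n_rx_beams=8)
print(labels)                          # [0, 1, 0, 2, 1]
print(label_to_pair(2, observed, 8))   # (3, 7)
```

Dropping unused pairs keeps the number of output classes equal to the number of beam pairs that actually occur, so the classifier does not spend capacity on labels it can never see.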
Project Areas
- Natural Language Processing
Source: [Link]
NLP is a branch of artificial intelligence focused on enabling computers to
understand and interpret human language.
Source: [Link]
Source: [Link]
Source: [Link]
Word2Vec representations of words projected onto
a two-dimensional space.
Source: [Link]
Source: Google Duplex: A.I. Assistant Calls Local Businesses To Make Appointments
Word Count in a Speech
Source: [Link]
Case study: Text Classification
Dataset: Amazon review data set with 10,000 rows of text data, each classified
as “Label 1” or “Label 2”. The data set has two columns: “Text” and “Label”.
Text, Label (Review-1) : Good
Stunning even for the non-gamer: This sound track was beautiful! It paints the scenery in
your mind so well I would recommend it even to people who hate video game music! I have
played the game Chrono Cross but out of all of the games I have ever played it has the best
music! It backs away from crude keyboarding and takes a fresher step with grate guitars and
soulful orchestras. It would impress anyone who cares to listen! ^_^,__label__2
Text, Label (Review-2): Bad
" The Worst!: A complete waste of time. Typographical errors, poor grammar, and a totally
pathetic plot add up to absolutely nothing. I'm embarrassed for this author and very
disappointed I actually paid for this book.",__label__1
Step 1: Add the libraries
Step 2: Set the Random seed
Step 3: Read the dataset
Step 4: Data Pre-processing
1. Remove Blank rows in Data, if any
2. Change all the text to lower case
3. Word Tokenization
4. Remove Stop words
5. Remove Non-alpha text
6. Word Lemmatization
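The six pre-processing steps can be sketched with the Python standard library alone. The slides do not say which library was used, so the tiny stop-word list and the crude_lemma suffix rule below are hypothetical stand-ins for a full stop-word corpus and a real lemmatizer; a project would normally take both from an NLP library.

```python
# Tiny illustrative stop-word list; a real project would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "is", "it", "this", "was", "of",
              "to", "in", "for", "i"}

def crude_lemma(word):
    """Very rough stand-in for lemmatization: undoes simple plurals only."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word

def preprocess(rows):
    """rows: list of (text, label). Returns a list of (tokens, label)."""
    cleaned = []
    for text, label in rows:
        if text is None or not text.strip():                  # 1. blank rows
            continue
        text = text.lower()                                   # 2. lower case
        tokens = [t.strip(".,!?:;\"'()") for t in text.split()]   # 3. tokenize
        tokens = [t for t in tokens if t not in STOP_WORDS]   # 4. stop words
        tokens = [t for t in tokens if t.isalpha()]           # 5. non-alpha
        tokens = [crude_lemma(t) for t in tokens]             # 6. lemmatize
        cleaned.append((tokens, label))
    return cleaned

rows = [("This sound track was beautiful!", "__label__2"),
        ("   ", "__label__1"),
        ("A complete waste of time. 2 stars", "__label__1")]
cleaned = preprocess(rows)
for tokens, label in cleaned:
    print(label, tokens)
```

The blank second row is dropped, the digit "2" is removed by the non-alpha filter, and "stars" is reduced to "star".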
Step 5: Prepare Train and Test dataset
Step 6: Encoding
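Steps 2, 5 and 6 (random seed, train/test split, label encoding) can be sketched as below. The 30% test fraction, the seed value and the helper names are assumptions made for illustration; the slides do not state them.

```python
import random

def train_test_split(rows, test_fraction=0.3, seed=500):
    """Shuffle a copy of the rows with a fixed seed, then split it."""
    rng = random.Random(seed)          # fixed seed makes the split repeatable
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def fit_label_encoder(labels):
    """Map each distinct label string to an integer, in sorted order."""
    return {label: i for i, label in enumerate(sorted(set(labels)))}

# Toy rows in the (Text, Label) layout of the review data set.
rows = [(f"review {i}", "__label__1" if i % 2 else "__label__2")
        for i in range(10)]

train, test = train_test_split(rows)
encoder = fit_label_encoder(label for _, label in train)
y_train = [encoder[label] for _, label in train]
y_test = [encoder[label] for _, label in test]
print(len(train), len(test), encoder)
```

The encoder is fitted on the training labels only and then reused for the test labels, so both sets share one integer coding.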
Step 7: Word Vectorization
Word vectorization is the process of turning a collection of text documents into numerical feature vectors.
One such method is term frequency-inverse document frequency (TF-IDF).
Term Frequency: summarizes how often a given word appears within a document.
Inverse Document Frequency: downscales words that appear frequently across documents.
[Figure: vectorized words]
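A worked TF-IDF computation makes the two definitions concrete. This sketch uses one common textbook form, tf = count / document length and idf = log(N / df); library implementations often use smoothed variants, so their numbers will differ. The function name tf_idf and the toy documents are made up.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists. Returns (vocabulary, matrix) where
    matrix[d][j] = tf(term j in doc d) * idf(term j).

    tf  = count of the term in the document / document length
    idf = log(number of documents / number of documents containing the term)
    """
    vocab = sorted({t for doc in docs for t in doc})
    n_docs = len(docs)
    df = Counter(t for doc in docs for t in set(doc))   # document frequency
    idf = {t: math.log(n_docs / df[t]) for t in vocab}
    matrix = []
    for doc in docs:
        counts = Counter(doc)
        matrix.append([(counts[t] / len(doc)) * idf[t] if doc else 0.0
                       for t in vocab])
    return vocab, matrix

docs = [["great", "music", "great", "game"],
        ["waste", "time", "poor", "plot"],
        ["great", "plot"]]
vocab, matrix = tf_idf(docs)
print(vocab)
for row in matrix:
    print([round(v, 3) for v in row])
```

In the first document "great" appears twice but also occurs in another document, so its weight (0.5 x log 1.5) ends up lower than that of "music" (0.25 x log 3), which appears once but only in this document.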
Step 8: Use ML algorithm to predict the outcome on test dataset
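The recovered slide text does not name the algorithm used in Step 8. As one possible choice, the sketch below implements a multinomial Naive Bayes classifier with add-one smoothing in plain Python. It works on raw token counts rather than TF-IDF weights, and the class name, toy documents and integer labels (1 for a good review, 0 for a bad one) are all assumptions.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial Naive Bayes on token lists with add-one smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {t for doc in docs for t in doc}
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc)
        class_counts = Counter(labels)
        self.log_prior = {c: math.log(class_counts[c] / len(labels))
                          for c in self.classes}
        self.total = {c: sum(self.word_counts[c].values())
                      for c in self.classes}
        return self

    def predict_one(self, doc):
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for t in doc:
                if t in self.vocab:      # ignore words unseen in training
                    score += math.log((self.word_counts[c][t] + 1)
                                      / (self.total[c] + v))
            scores[c] = score
        return max(self.classes, key=lambda c: scores[c])

    def predict(self, docs):
        return [self.predict_one(d) for d in docs]

train_docs = [["great", "music"], ["beautiful", "sound", "track"],
              ["waste", "time"], ["poor", "plot", "waste"]]
train_labels = [1, 1, 0, 0]
test_docs = [["great", "sound"], ["poor", "waste", "plot"]]
test_labels = [1, 0]

model = NaiveBayes().fit(train_docs, train_labels)
pred = model.predict(test_docs)
accuracy = sum(p == t for p, t in zip(pred, test_labels)) / len(test_labels)
print(pred, accuracy)
```

Any other classifier can be swapped in at this step as long as it is trained on the training set only and scored on the held-out test set.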
Project Areas
- 5G/6G Wireless Network
Why Machine Learning for Wireless Networks? (1/2)
• Increasing the number of antennas in massive MIMO has changed channel properties
• Mathematical complexity of scenarios such as underwater communication
• Molecular communication
Source: Derya Malak, Ozgur B. Akan, “Molecular communication nanonetworks inside human body,” Elsevier Nano
Communication Networks, Volume 3, Issue 1, 2012, Pages 19-35.
Case study: ANN based Spectrum Sensing for Cognitive Radio Network
Fixed spectrum access                              v/s   Dynamic spectrum access
Spectrum bands are allocated/assigned statically         Spectrum bands are assigned dynamically
Unlicensed Band                                    v/s   Licensed Band
Over-crowded                                             Under-utilized
Source: M. López-Benítez et al., “Spectral occupation measurements and blind standard recognition sensor for cognitive
radio networks,” Proc. 4th Int’l. Conf. Cognitive Radio Oriented Wireless Networks and Comms. (CrownCom 2009), Hannover,
Germany, June 22-24, 2009.
Background
Analogy:
Road → Unlicensed band
BRTS route → Licensed band
Vehicles → Channel users
Source: Google Images (Shivranjini cross roads, Ahmedabad)
Solution: Opportunistic Spectrum Access (OSA)
Opportunity: vacancy of the Primary User (PU), i.e., hunt for white space for
the needy Secondary User (SU) through a technique called spectrum sensing.
Spectrum Sensing
- Parametric
- Non-parametric
Why Machine Learning for CRN?
The main task of any machine learning algorithm is: [figure]
PU present or PU absent
From the CRN perspective, the task is to identify whether the PU is present or absent.
Thus, machine learning can be used to address this binary classification problem.
ANN Hybrid Sensing Scheme
❑ The scheme combines classical energy detection, the likelihood ratio
test statistic (LRS-G²) and an artificial neural network (ANN).
❑ Features:
1. Energy: the simplest and most efficient non-parametric sensing
feature.
2. LRS-G² Zhang statistic: a non-parametric sensing feature with the highest
statistical power for the normality test in comparison with other
goodness-of-fit based tests.
Here N is the sample size and F₀(y) is the known cumulative distribution
function (CDF) of the noise.
❑ Four features (input to the ANN): (1) the sample's energy, (2) the sample's Zhang
statistic, (3) the previous sample's energy, (4) the previous sample's Zhang statistic
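[Figure: block diagram of the scheme. The desired signal at a given SNR plus AWGN is split into chunks of N samples; feature extraction yields E, E_P, Z and Z_P, and the output gives Pd. For the testing data set, AWGN-only chunks pass through feature extraction (E, E_P, Z, Z_P) and the NN, and the output gives Pf.]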
Project Areas
- 5G/6G Wireless Network
Numerical Result: Pd vs SNR
Radio technology: DCS-1800 DL, false alarm probability: 0.035, N = 100
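How a Pd versus SNR curve is obtained at a fixed false alarm probability can be sketched with a Monte Carlo simulation of classical energy detection alone. This is not the ANN hybrid scheme and does not reproduce the slide's results: N = 100 and the false alarm probability 0.035 are taken from the slide, while the sinusoidal test signal, the unit noise variance, the trial count and all function names are assumptions.

```python
import math
import random

def energy(samples):
    """Energy test statistic: mean squared sample value."""
    return sum(s * s for s in samples) / len(samples)

def noise_chunk(rng, n):
    return [rng.gauss(0.0, 1.0) for _ in range(n)]     # unit-variance AWGN

def signal_chunk(rng, n, snr_db):
    """Sinusoid with power of about 10^(snr_db/10), plus unit-variance AWGN."""
    amp = math.sqrt(2 * 10 ** (snr_db / 10))
    phase = rng.uniform(0, 2 * math.pi)
    return [amp * math.sin(0.3 * k + phase) + rng.gauss(0.0, 1.0)
            for k in range(n)]

def threshold_for_pf(rng, n, pf, trials):
    """Empirical threshold: the (1 - pf) quantile of noise-only energies."""
    energies = sorted(energy(noise_chunk(rng, n)) for _ in range(trials))
    return energies[int((1 - pf) * trials)]

def detection_probability(rng, n, snr_db, thr, trials):
    """Fraction of signal-plus-noise chunks whose energy exceeds thr."""
    hits = sum(energy(signal_chunk(rng, n, snr_db)) > thr
               for _ in range(trials))
    return hits / trials

rng = random.Random(1)
N, PF, TRIALS = 100, 0.035, 2000
thr = threshold_for_pf(rng, N, PF, TRIALS)
pd_curve = {snr: detection_probability(rng, N, snr, thr, TRIALS)
            for snr in (-20, -15, -10, -5, 0)}
for snr, pd in pd_curve.items():
    print(f"SNR {snr:>4} dB  Pd = {pd:.3f}")
```

The threshold is set from noise-only chunks so that the false alarm probability stays near 0.035, and Pd is then the fraction of signal-bearing chunks that exceed it. A learned detector would be evaluated the same way, with its output replacing the energy comparison.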
Project Areas
- Environment and Health Care
Application of Machine Learning : Health Care
Project Areas
- Biology/Bioinformatics
Machine learning in bioinformatics
[Link]
Stroke diagnosis
Microarrays
Gene prediction
Molecular Classification of Cancer by Gene Expression Monitoring using Support
Vector Machine (SVM)
The goal is to classify each cancer patient as having acute myeloid leukemia
(AML) or acute lymphoblastic leukemia (ALL) using the SVM algorithm.
Source: [Link]
learning-47c62c482aaf
[Figures: Train Data, Test Data]
About the dataset:
1. Each row represents a different gene.
2. Columns 1 and 2 are descriptions of that gene.
3. Each numbered column corresponds to a patient in the label data.
4. Each patient has 7129 gene expression values, i.e., each patient has one
value for each gene.
5. The training data contain gene expression values for patients 1 through 38.
6. The test data contain gene expression values for patients 39 through 72.
Processing Steps:
1. Read the datasets
2. Obtain normalized data
3. Dimensionality reduction
4. Hyperparameter optimization
5. SVM classification model
6. Compute the confusion matrix and visualize it as a heat map
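Two of the processing steps, normalization and the confusion matrix, can be sketched without any library. The sketch assumes the table has been transposed so that each row is a patient and each column a gene, and it uses made-up numbers and made-up predictions. Dimensionality reduction, hyperparameter optimization and the SVM itself are omitted, and the helper names are hypothetical.

```python
import math

def fit_scaler(rows):
    """Per-column mean and standard deviation from the training rows."""
    n, cols = len(rows), list(zip(*rows))
    means = [sum(c) / n for c in cols]
    # A constant column has std 0; fall back to 1.0 to avoid dividing by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) or 1.0
            for c, m in zip(cols, means)]
    return means, stds

def transform(rows, means, stds):
    """Standardize rows with statistics learned on the training data."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)]
            for row in rows]

def confusion_matrix(y_true, y_pred, n_classes=2):
    """cm[i][j] = number of samples with true class i predicted as class j."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm

# Toy expression values: 2 training patients and 1 test patient, 3 genes each.
train = [[100.0, 20.0, 5.0], [300.0, 20.0, 15.0]]
test = [[200.0, 25.0, 10.0]]
means, stds = fit_scaler(train)
train_n = transform(train, means, stds)
test_n = transform(test, means, stds)
print(train_n, test_n)

# Made-up labels and predictions, only to show the matrix layout.
y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 1, 0]
cm = confusion_matrix(y_true, y_pred)
print(cm)
```

The scaler is fitted on the training patients only and then applied to the test patients, so no information from the test set leaks into the normalization.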
Class Label 0 - acute myeloid leukemia (AML)
Class Label 1 - acute lymphoblastic leukemia (ALL)
Thank you !!
[Link]