Advanced Machine Learning

Introduction to Recurrent Neural Networks


Prof. Dr. Christian Schwede
Content - Overview

Class 1: Introduction to Recurrent Networks

Recurrent Neural Networks
  Padding and Masking
  Embeddings
Recurrent Neural Networks
  Long short-term memory
  Gated recurrent unit
  Differentiable neural computer
Generative Methods
  Generative Adversarial Networks
  Variational Autoencoder
  Diffusion Models
  Generative pre-trained transformer
Graph Machine Learning
  Introduction into GML
  Graph Neural Networks

Final Exam

Oral exam of 30 minutes

To be admitted to the final exam, you have to deliver all but one of the Python exercises on time

Learning objectives

How can sequential data of variable length be processed with Artificial Neural Networks?

What are Recurrent Neural Networks and how do they work?

How can text be transformed into feature vectors?

How can batch processing be applied with varying input lengths?

Introduction to Recurrent Neural Networks
Feed-Forward Artificial Neural Networks

Feed-Forward Networks (FFN) are built to learn a function y = f(x) from data pairs (x, y) with x ∈ ℝⁿ and y ∈ ℝᵐ
Inputs are fed forward through the network from input to output, using weights and activation functions (e.g. sigmoid, ReLU, tanh, swish) to modify the output of every neuron
FFNs are trained with gradient descent using back-propagation on mini-batches (stochastic gradient descent)
With Convolutional Neural Networks (CNN), inputs in matrix format such as images can be used
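As a rough illustration of these points (a minimal sketch with hypothetical dimensions, not part of the original slides), such an FFN could be set up in Keras as follows:

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical dimensions: n = 10 input features, m = 3 output classes
model = keras.Sequential([
    layers.Input(shape=(10,)),
    layers.Dense(32, activation='relu'),    # hidden layer with ReLU activation
    layers.Dense(3, activation='softmax'),  # output layer
])

# Mini-batch (stochastic) gradient descent with back-propagation
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.01),
              loss='categorical_crossentropy')

# Dummy data pairs (x, y) just to show the training call
X = np.random.rand(64, 10)
y = keras.utils.to_categorical(np.random.randint(0, 3, size=64), num_classes=3)
model.fit(X, y, batch_size=16, epochs=2, verbose=0)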

Task: Prediction of next letter

Build a FFN using TensorFlow that predicts the next letter in a sentence. Use "The Research Master Data Science at HSBI rocks!" as data and forget about the train/test split.

Task: Prediction of last letter of a word

How can we build a FFN to predict the last letter of a word?

Two problems of feed-forward neural networks

The size of the input samples has to be fixed
The input data is not linked together
But for "predicting the last letter of a word", the previous letters are required, and hence there is a need to remember them

Recurrent Neural Networks (RNN)

Recurrent Neural Networks:

Work well with sequential data such as time-series data, text data, or video and audio streams
The hidden state h_t (memory state) from the previous step is fed as an additional input to the current step
The fundamental processing unit in an RNN is a recurrent unit or cell
In each time step, an additional input can be consumed and an additional output can be produced
Variable input and output lengths are possible

A closer look…

FFN
Consider a simple FFN with one hidden layer: X → H → O, with weights W_dh and W_hq
We propagate a batch X ∈ ℝ^(n×d) with batch size n and feature dimension d as input
The output of the hidden layer is then H = φ(X W_dh + b_h), with φ as activation function
The last layer calculates O = H W_hq + b_q as q-dimensional output O ∈ ℝ^(n×q)

RNN
Now we propagate X_t ∈ ℝ^(n×d) at every time step t
The output of the hidden layer at time step t depends on the hidden state H_{t−1} at time step t − 1:
H_t = φ(X_t W_dh + H_{t−1} W_hh + b_h)
The output layer stays the same: O_t = H_t W_hq + b_q
Note: the number of weights does not grow with time, since the weights W_dh, W_hh and W_hq are shared across all time steps

(Figure: the FFN computation graph X → H → O and the RNN computation graph, in which H_{t−1} feeds into H_t via W_hh)
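To make the recurrence concrete, here is a minimal NumPy sketch of this forward pass with toy dimensions (an illustration only, not taken from the slides):

import numpy as np

n, d, h, q, T = 4, 8, 16, 5, 6   # batch size, input dim, hidden dim, output dim, time steps
rng = np.random.default_rng(0)

# Shared weights: they do not grow with the number of time steps
W_dh = 0.1 * rng.normal(size=(d, h))
W_hh = 0.1 * rng.normal(size=(h, h))
W_hq = 0.1 * rng.normal(size=(h, q))
b_h, b_q = np.zeros(h), np.zeros(q)

X = rng.normal(size=(T, n, d))   # one input batch X_t per time step
H = np.zeros((n, h))             # initial hidden state

outputs = []
for t in range(T):
    H = np.tanh(X[t] @ W_dh + H @ W_hh + b_h)   # H_t = phi(X_t W_dh + H_{t-1} W_hh + b_h)
    outputs.append(H @ W_hq + b_q)              # O_t = H_t W_hq + b_q

O = np.stack(outputs)            # shape (T, n, q)
print(O.shape)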
Deeper Recurrent Neural Networks

Input X_t ∈ ℝ^(n×d) = H_t^(0)
The output of the l-th hidden layer H_t^(l) ∈ ℝ^(n×h_l), for l = 1, …, L, is:
H_t^(l) = φ_l( H_t^(l−1) W^(l)_{h_{l−1} h_l} + H_{t−1}^(l) W^(l)_{h_l h_l} + b^(l)_{h_l} )
with W^(l)_{h_l h_l} ∈ ℝ^(h_l × h_l) and W^(l)_{h_{l−1} h_l} ∈ ℝ^(h_{l−1} × h_l)
The output layer is:
O_t = H_t^(L) W_{h_L q} + b_q

Source: https://classic.d2l.ai/chapter_recurrent-modern/deep-rnn.html#fig-deep-rnn
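In Keras, such a deeper RNN can be sketched by stacking recurrent layers; every layer except the last must return its full output sequence so that the next layer receives one hidden state per time step (toy dimensions assumed, not from the original slides):

from tensorflow import keras
from tensorflow.keras import layers

T, d, q = 10, 8, 5   # hypothetical sequence length, feature dimension, output dimension

model = keras.Sequential([
    layers.Input(shape=(T, d)),
    layers.SimpleRNN(32, return_sequences=True),   # hidden layer l = 1
    layers.SimpleRNN(32, return_sequences=True),   # hidden layer l = 2
    layers.SimpleRNN(32),                          # last hidden layer l = L
    layers.Dense(q),                               # output layer
])
model.summary()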
Types of RNNs

One-to-many: a single input and multiple outputs; example: image captioning (generating a description from an image)
Many-to-one: several inputs and a single output; example: sentiment analysis of text, i.e. identifying a feeling from a group of words
Many-to-many: several inputs and several outputs, not necessarily with the same input and output length; example: translation of text (see the sketch below)
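In Keras, the many-to-one and many-to-many cases mostly differ in whether a recurrent layer returns its full output sequence; a rough sketch (one common way to set this up, not the only one):

from tensorflow.keras import layers

# Many-to-one: read the whole sequence, keep only the last output (e.g. sentiment analysis)
many_to_one = layers.SimpleRNN(32, return_sequences=False)

# Many-to-many: emit one output per time step
many_to_many = layers.SimpleRNN(32, return_sequences=True)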

Task: Applications of RNN

What applications can you think of, and what type of RNN would each of them need?

Task: Prediction of last letter of a word

How can we build an RNN to predict the last letter of a word using the long_text.txt data file?
Use tensorflow.keras.layers.SimpleRNN.
How can you train the model when words have different lengths?

Padding to deal with different input sizes
A better way to deal with different input sizes is padding
The maximum input size for training must be set beforehand
All inputs smaller than that maximum input size are padded with a special symbol (e.g. zero)
Padding can be done before or after the real input

Example:
[[711   6  71   0   0   0]
 [ 73   8   2  55   7   0]
 [ 83  91  45  64   3   7]]

keras.utils.pad_sequences(sequences, maxlen=seq_length_max, padding='pre', truncating='pre', value=0)

This function transforms a list (of length num_samples) of sequences (lists of integers) into a 2D NumPy array of shape (num_samples, num_timesteps).
num_timesteps is either the maxlen argument if provided, or the length of the longest sequence in the list.
Sequences that are shorter than num_timesteps are padded with value until they are num_timesteps long.
Sequences longer than num_timesteps are truncated so that they fit the desired length.
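A short sketch of how the padded example above could be produced (assuming a Keras version where pad_sequences is exposed under keras.utils; here padding is done after the real input, i.e. padding='post'):

from tensorflow.keras.utils import pad_sequences

# Three integer-encoded sequences of different lengths
data = [[711, 6, 71],
        [73, 8, 2, 55, 7],
        [83, 91, 45, 64, 3, 7]]

# Pad at the end ('post') with 0 up to the length of the longest sequence
padded = pad_sequences(data, padding='post', value=0)
print(padded)
# [[711   6  71   0   0   0]
#  [ 73   8   2  55   7   0]
#  [ 83  91  45  64   3   7]]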

Masking to help the RNN deal with padded inputs

Masking is a way to tell sequence-processing layers that certain timesteps in an input are missing and should therefore be skipped when processing the data
The masking layer acts as a Boolean filter that does not let masked inputs (e.g. inputs padded with zeros) pass
The effect is that, for masked timesteps, the output of the RNN as well as the hidden state are passed on unchanged to the next sequence step

There are three ways to introduce input masks in Keras models:

Add a keras.layers.Masking layer: model.add(Masking(mask_value=0)) (see the sketch below)
Pass a mask argument manually when calling layers that support this argument (e.g. RNN layers)
Configure a keras.layers.Embedding layer with mask_zero=True
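A minimal sketch of the first option, assuming one-hot encoded, zero-padded inputs and toy dimensions (not the official solution to the exercise):

from tensorflow import keras
from tensorflow.keras import layers

seq_len, vocab_size = 6, 27   # hypothetical: padded sequence length, alphabet size + padding symbol

model = keras.Sequential([
    layers.Input(shape=(seq_len, vocab_size)),
    layers.Masking(mask_value=0.0),                  # timesteps that are entirely 0 are masked
    layers.SimpleRNN(32),                            # masked timesteps are skipped by the RNN
    layers.Dense(vocab_size, activation='softmax'),
])
model.summary()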

Source: https://towardsdatascience.com/how-does-masking-work-in-an-rnn-and-variants-and-why-537bf63c306d
Task: Prediction of last letter of a word using padding and masking

Adjust the previous model by adding padding and a masking layer after the input layer of the RNN.

How to pass more complex text as an input to an RNN?

We have already used One-Hot Encoding
One-hot encoding results in high-dimensional vectors, making it computationally expensive and memory-intensive, especially with large vocabularies
It does not capture semantic relationships between words; each word is treated as an isolated entity without considering its meaning or context
It is restricted to the vocabulary seen during training, making it unsuitable for handling out-of-vocabulary words

Bag of Words (BoW)

Bag of Words (BoW) is a text representation technique that represents a document as an unordered set of words and their respective frequencies
It discards the word order and captures the frequency of each word in the document, creating a vector representation
BoW ignores the order of words in the document, leading to a loss of sequential information and context
It is less effective for tasks where word order is crucial, such as natural language understanding
BoW representations are often sparse, leading to increased memory requirements and computational inefficiency (especially when dealing with large datasets)

Example: Bag of Words

from sklearn.feature_extraction.text import CountVectorizer

# Sample sentences (documents)
corpus = [
    'The quick brown fox jumps over the lazy dog',
    'Never jump over the lazy dog quickly',
    'Brown foxes are quick and lazy',
]

# Initialize the CountVectorizer (BoW model)
vectorizer = CountVectorizer()

# Fit and transform the corpus into a bag-of-words model
X = vectorizer.fit_transform(corpus)

# Show the vocabulary (index mapping)
print("Vocabulary:", vectorizer.vocabulary_)

# Show the Bag of Words representation
print("Bag of Words Matrix:\n", X.toarray())

# Show the feature names (words in the vocabulary)
print("Feature names:", vectorizer.get_feature_names_out())

Output:
Vocabulary: {'the': 13, 'quick': 10, 'brown': 1, 'fox': 4, 'jumps': 7, 'over': 9, 'lazy': 8, 'dog': 3, 'never': 5, 'jump': 6, 'quickly': 11, 'foxes': 2, 'are': 0, 'and': 12}
Bag of Words Matrix:
[[1 1 0 1 1 0 0 1 1 1 1 0 0 2]
 [1 0 0 1 0 1 1 0 1 1 0 1 0 2]
 [0 1 1 0 0 0 0 0 1 0 1 0 1 0]]
Feature names: ['and' 'are' 'brown' 'dog' 'fox' 'foxes' 'jump' 'jumps' 'lazy' 'over' 'quick' 'quickly' 'the' 'never']

Word Embeddings

Word Embedding is an approach for representing words and documents
A word embedding (word vector) is a numeric vector that represents a word in a lower-dimensional space
It allows words with similar meanings to have a similar representation

Need for word embeddings:

To reduce dimensionality
To use a word to predict words with similar meaning
To capture inter-word semantics

Example: Word embeddings

import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
from gensim.models import KeyedVectors

# Load pre-trained GloVe or Word2Vec embeddings
word_vectors = KeyedVectors.load_word2vec_format('data/GoogleNews-vectors-negative300.bin', binary=True)

# Sample words to visualize (choose words related to animals, fruits, and countries for clarity)
words = ['dog', 'cat', 'apple', 'banana', 'france', 'germany', 'lion', 'tiger', 'paris', 'berlin']

# Retrieve the word embeddings for the selected words
word_embeddings = np.array([word_vectors[word] for word in words])
n_samples = len(word_embeddings)

# Set perplexity to a value less than n_samples (e.g., 5 or 10)
perplexity_value = min(30, n_samples // 3)  # Adjust based on your dataset size

# Initialize t-SNE with adjusted perplexity
tsne = TSNE(n_components=2, perplexity=perplexity_value, random_state=42)

# Fit and transform the embeddings
word_embeddings_2d = tsne.fit_transform(word_embeddings)

# Plot the words in the 2D space
plt.figure(figsize=(8, 6))
plt.scatter(word_embeddings_2d[:, 0], word_embeddings_2d[:, 1], color='blue')

# Annotate the points with the corresponding words
for i, word in enumerate(words):
    plt.annotate(word, xy=(word_embeddings_2d[i, 0], word_embeddings_2d[i, 1]), fontsize=12)

plt.title("2D Visualization of Word Embeddings")
plt.grid(True)
plt.show()

Word2Vec

Word2Vec is an approach based on artificial neural networks for generating word embeddings
Developed by a team at Google
Word2Vec aims to capture the semantic relationships between words by mapping them to dense vectors in a continuous vector space
There are two neural embedding methods for Word2Vec: Continuous Bag of Words (CBOW) and Skip-gram

Source: https://towardsdatascience.com/word2vec-research-paper-explained-205cb7eecc30
Continuous Bag of Words (CBOW)

CBOW is a feedforward neural network with a single hidden layer
The input layer represents the context words
The output layer represents the target word at the center of the window of context words
The hidden layer contains the learned continuous vector representations (word embeddings) of the input words
The dimensionality of the hidden layer represents the size of the word embeddings

Skip-Gram Model

The Skip-Gram model also learns distributed representations of words in a continuous vector space
The main objective of Skip-Gram is to predict the context words (words surrounding a target word) given a target word
This is the opposite of the Continuous Bag of Words (CBOW) model (see the sketch below)
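A minimal gensim sketch of both variants on a toy corpus (gensim 4.x parameter names assumed); the sg flag switches between CBOW and Skip-gram:

from gensim.models import Word2Vec

# Toy corpus: each document is a list of tokens
sentences = [
    ['the', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog'],
    ['never', 'jump', 'over', 'the', 'lazy', 'dog', 'quickly'],
]

# sg=0 -> CBOW: predict the target word from its context window
cbow = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)

# sg=1 -> Skip-gram: predict the context words from the target word
skipgram = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

print(cbow.wv['dog'].shape)                      # (50,)
print(skipgram.wv.most_similar('dog', topn=3))   # nearest neighbours in the toy vector space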

Using the Embedding Layer

An Embedding layer can be integrated into the RNN after the input layer
The layer is trained with backpropagation together with the rest of the model and learns a word embedding (representation) that fits the task at hand
Pre-trained weights can be used to initialize the layer
Masking can be activated
keras.layers.Embedding(
input_dim,
output_dim,
mask_zero=False,
weights=None,
)
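A sketch of how such an embedding layer could sit in front of an RNN on integer-encoded, zero-padded sequences (assumed dimensions; not the official solution to the following exercise):

from tensorflow import keras
from tensorflow.keras import layers

vocab_size, seq_len = 27, 6   # hypothetical: alphabet size + padding index 0, padded length

model = keras.Sequential([
    layers.Input(shape=(seq_len,)),                                          # integer-encoded symbols, no one-hot needed
    layers.Embedding(input_dim=vocab_size, output_dim=16, mask_zero=True),   # learned vectors + masking
    layers.SimpleRNN(32),
    layers.Dense(vocab_size, activation='softmax'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')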

Task: Prediction of last letter of a word using embeddings

Adjust the previous model by replacing the masking layer with an embedding layer. Get rid of the one-hot encoding of the input.

Conclusion

RNNs are used for sequence data of variable length
They use a hidden state that is passed from time step to time step
There are different architectures and use cases depending on the input and output dimensions
Padding and masking can be used to train with batches of data of variable length
Embeddings are a good way to transform words into numbers while preserving their meaning

Homework
Create an RNN to predict the length of a word. Use the long_text.txt data.

Questions?
Prof. Dr.-Ing. Christian Schwede
Faculty of Engineering and Mathematics (Fachbereich Ingenieurwissenschaften und Mathematik)
Campus Gütersloh
Programme Director, Research Master Data Science
Member of the Board, Institute for Data Science Solutions (IDAS)

[email protected]
