AML Assignment 2
Team Members:
Nalla Janardhana Rao – MDS202426
Pranav Pothan – MDS202429
Raja S – MDS202430
Task 1
Aim: Build a classifier using an RNN and an LSTM model to classify SMS messages as spam or ham.
Data Processing & Visualization: URLs, symbols, and numbers are removed. Messages are
lemmatised, tokenised, and padded to a uniform length.
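A minimal sketch of this pipeline, assuming NLTK for lemmatisation and the Keras tokenizer; sms_texts is a hypothetical list of raw messages, not a name from the submitted code:

```python
import re
import nltk
from nltk.stem import WordNetLemmatizer
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

nltk.download("wordnet", quiet=True)            # data for the lemmatiser
lemmatizer = WordNetLemmatizer()

def clean_sms(text):
    text = re.sub(r"http\S+|www\.\S+", " ", text)   # drop URLs
    text = re.sub(r"[^a-z\s]", " ", text.lower())   # drop symbols and numbers
    return " ".join(lemmatizer.lemmatize(w) for w in text.split())

cleaned = [clean_sms(s) for s in sms_texts]      # sms_texts: hypothetical raw messages
tokenizer = Tokenizer()
tokenizer.fit_on_texts(cleaned)
padded = pad_sequences(tokenizer.texts_to_sequences(cleaned), padding="post")
```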
Architecture: Embedding layer → RNN/LSTM → Dropout → Dense → Dropout → Dense
(1 output).
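A Keras sketch of this stack (the report's "Dense" and "Early Stopping" terms suggest Keras; layer widths and dropout rates are illustrative guesses):

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Embedding, SimpleRNN, LSTM, Dropout, Dense

vocab_size = len(tokenizer.word_index) + 1           # from the fitted tokenizer above
model = Sequential([
    Embedding(input_dim=vocab_size, output_dim=64),  # embedding width is a guess
    LSTM(64),                                        # swap in SimpleRNN(64) for the RNN variant
    Dropout(0.3),
    Dense(32, activation="relu"),
    Dropout(0.3),
    Dense(1, activation="sigmoid"),                  # single spam/ham output
])
```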
Training: Adam optimizer (learning rate = 0.0001), Binary Cross-Entropy loss, and Early
Stopping (patience = 5) to prevent overfitting.
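A matching training call; the epoch cap, batch size, and split variable names (X_train, etc.) are assumptions, while the learning rate, loss, and patience come from the report:

```python
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping

model.compile(optimizer=Adam(learning_rate=1e-4),
              loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X_train, y_train, validation_data=(X_val, y_val),
          epochs=50, batch_size=32,                  # epoch cap and batch size are guesses
          callbacks=[EarlyStopping(patience=5, restore_best_weights=True)])
```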
Accuracy: The RNN model achieves 97.74% validation accuracy and the LSTM model achieves
98.03%.
Task 2
Aim: Build an SMS generator that takes the first half of an SMS as input and predicts the
latter half.
Data Processing & Visualization: All SMS are tokenised. Messages longer than 41 tokens are
rare, so the maximum length is set to 50. All SMS are padded; the first half is given as input
and the second half, with an <END> token appended, as the target.
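A hedged sketch of this split, reusing the tokenizer from Task 1; the half-length of 25 and the order of appending <END> before padding are assumptions:

```python
import numpy as np

MAX_LEN, HALF_LEN = 50, 25                      # 50 from the report; the half-split is assumed
END_ID = len(tokenizer.word_index) + 1          # reserve an id for the <END> token

def pad_to(seq, length):
    seq = list(seq)[:length]
    return np.array(seq + [0] * (length - len(seq)))    # 0 acts as PAD

def make_pair(seq):
    half = len(seq) // 2
    src = pad_to(seq[:half], HALF_LEN)                  # first half: model input
    tgt = pad_to(seq[half:] + [END_ID], HALF_LEN)       # second half + <END>: target
    return src, tgt

pairs = [make_pair(seq) for seq in tokenizer.texts_to_sequences(cleaned)]
```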
Architecture: Embedding layer, followed by 3 RNN layers or 1 LSTM layer (with the same number
of parameters), and a final Linear layer with vocab-size output.
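A minimal PyTorch sketch of this model; the embedding and hidden widths are assumptions, while the layer counts and vocab-size output follow the description:

```python
import torch.nn as nn

class SMSGenerator(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hidden=256, use_lstm=True):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        # 1 LSTM layer, or a 3-layer vanilla RNN for the other variant
        self.rnn = (nn.LSTM(emb_dim, hidden, num_layers=1, batch_first=True)
                    if use_lstm else
                    nn.RNN(emb_dim, hidden, num_layers=3, batch_first=True))
        self.out = nn.Linear(hidden, vocab_size)   # per-step logits over the vocabulary

    def forward(self, x):
        h, _ = self.rnn(self.emb(x))
        return self.out(h)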
Training: Adam optimizer (learning rate = 0.001) and Cross-Entropy loss over the vocabulary,
trained for 100 epochs.
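A training-loop sketch under these settings; the data loader yielding (input, target) half pairs as LongTensors is an assumption:

```python
import torch
import torch.nn as nn

model = SMSGenerator(vocab_size)                      # vocab_size from the tokenizer
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss(ignore_index=0)         # skip PAD positions in the target

for epoch in range(100):
    for src, tgt in loader:                           # first-half inputs, second-half targets
        logits = model(src)                           # (batch, steps, vocab)
        loss = loss_fn(logits.reshape(-1, logits.size(-1)), tgt.reshape(-1))
        opt.zero_grad(); loss.backward(); opt.step()
```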
Performance: Both the RNN and LSTM models successfully generate most of the second half of
an SMS after sufficient training.
Task 3
Aim: Create a Variational Autoencoder (VAE) for notMNIST to demonstrate reconstruction,
generation, and latent space interpolation, and a Conditional Generative Adversarial Network
(CGAN) to generate 4 distinct, class-specific images for each of the 10 notMNIST classes (A–J).
Data Processing & Normalization: The anubhavmaity/notMNIST dataset is loaded as
28 × 28 grayscale images. Two data loaders are used: one scaled to [0, 1] for the VAE (BCE
loss), and another scaled to [−1, 1] for the CGAN (Generator uses Tanh output).
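A sketch of the two normalizations using torchvision transforms (how each is attached to its DataLoader is omitted):

```python
from torchvision import transforms

vae_tf = transforms.ToTensor()                    # [0, 255] -> [0, 1], matching BCE reconstruction
gan_tf = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,)),         # [0, 1] -> [-1, 1], matching the Tanh generator
])
```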
VAE Architecture: The VAE uses a convolutional encoder mapping the 28 × 28 input to a
20-dimensional latent distribution (µ, log σ²). Using the reparameterization trick, a vector z is
sampled and decoded back into an image through transposed convolutions.
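A compact PyTorch sketch consistent with this description; only the 20-dimensional latent and the 28 × 28 input/output come from the report, the channel widths and kernel sizes are assumptions:

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, z_dim=20):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),    # 28 -> 14
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),   # 14 -> 7
            nn.Flatten(),
        )
        self.mu = nn.Linear(64 * 7 * 7, z_dim)
        self.logvar = nn.Linear(64 * 7 * 7, z_dim)
        self.fc = nn.Linear(z_dim, 64 * 7 * 7)
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),   # 7 -> 14
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid()  # 14 -> 28, in [0, 1]
        )

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization trick
        x_hat = self.dec(self.fc(z).view(-1, 64, 7, 7))
        return x_hat, mu, logvar
```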
VAE Training and Loss: Trained for 10 epochs using Adam. The total loss = Reconstruction
Loss (Binary Cross-Entropy) + KL Divergence (to enforce latent space ∼ N(0, 1)).
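The stated objective, written out using the standard closed-form KL divergence against N(0, 1):

```python
import torch
import torch.nn.functional as F

def vae_loss(x_hat, x, mu, logvar):
    recon = F.binary_cross_entropy(x_hat, x, reduction="sum")        # reconstruction term
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())     # KL(q(z|x) || N(0, 1))
    return recon + kl
```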
VAE Results: After training, the VAE demonstrates successful image reconstruction (real vs
decoded), random generation from N(0, 1), and smooth latent interpolations.
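A hedged sketch of producing such interpolations with the VAE class above; vae is a trained instance and img1, img2 are hypothetical test images:

```python
import torch

with torch.no_grad():
    _, mu1, _ = vae(img1.unsqueeze(0))            # encode the two endpoints
    _, mu2, _ = vae(img2.unsqueeze(0))
    frames = [vae.dec(vae.fc((1 - a) * mu1 + a * mu2).view(-1, 64, 7, 7))
              for a in torch.linspace(0, 1, 10)]  # 10 evenly spaced blends
```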
CGAN Generator: Takes a random noise vector concatenated with a one-hot class label,
passes through a dense projection and three ConvTranspose2d layers with BatchNorm + ReLU,
ending with a Tanh activation output in [−1, 1].
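A sketch of this Generator; the noise dimension of 100 and the channel widths are assumptions, while the dense projection, the three ConvTranspose2d layers with BatchNorm + ReLU, and the Tanh output follow the description:

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, z_dim=100, n_classes=10):
        super().__init__()
        self.fc = nn.Linear(z_dim + n_classes, 128 * 7 * 7)      # dense projection
        self.net = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),  # 7 -> 14
            nn.BatchNorm2d(64), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 3, stride=1, padding=1),   # 14 -> 14
            nn.BatchNorm2d(32), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),    # 14 -> 28
            nn.Tanh(),                                            # output in [-1, 1]
        )

    def forward(self, z, onehot):
        h = self.fc(torch.cat([z, onehot], dim=1)).view(-1, 128, 7, 7)
        return self.net(h)
```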
CGAN Discriminator: A convolutional network taking the image concatenated with a label
embedding, using Conv2d layers, LeakyReLU, BatchNorm, and a final Linear + Sigmoid output
for real/fake classification.
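A matching Discriminator sketch; mapping the label embedding to a 28 × 28 plane concatenated as a second channel is one common realisation of "image concatenated with a label embedding", and the channel widths are assumptions:

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    def __init__(self, n_classes=10):
        super().__init__()
        self.label_map = nn.Embedding(n_classes, 28 * 28)   # label -> image-shaped plane
        self.net = nn.Sequential(
            nn.Conv2d(2, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),   # 28 -> 14
            nn.Conv2d(32, 64, 4, stride=2, padding=1),
            nn.BatchNorm2d(64), nn.LeakyReLU(0.2),                         # 14 -> 7
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 1), nn.Sigmoid(),         # real/fake probability
        )

    def forward(self, x, labels):
        plane = self.label_map(labels).view(-1, 1, 28, 28)
        return self.net(torch.cat([x, plane], dim=1))
```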
CGAN Training Setup: Trained for 15 epochs using Adam (β = (0.5, 0.999)). Both Generator
and Discriminator use Binary Cross-Entropy Loss (BCELoss).
CGAN Training Procedure: The Discriminator is trained on real and fake (Image, Label)
pairs, followed by training the Generator to fool the Discriminator.
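The alternating updates, sketched with the classes above; the betas come from the setup, while the learning rate of 2e-4 and the data loader are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

G, D = Generator(), Discriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
bce = nn.BCELoss()

for epoch in range(15):
    for real, labels in loader:                      # (image, label) batches scaled to [-1, 1]
        b = real.size(0)
        onehot = F.one_hot(labels, 10).float()
        fake = G(torch.randn(b, 100), onehot)

        # Discriminator step: real pairs -> 1, fake pairs -> 0
        d_loss = (bce(D(real, labels), torch.ones(b, 1)) +
                  bce(D(fake.detach(), labels), torch.zeros(b, 1)))
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()

        # Generator step: push D to call the fakes real
        g_loss = bce(D(fake, labels), torch.ones(b, 1))
        opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```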
Conditional Generation: After training, the Generator produces 4 examples per class (A–J),
showing successful label-controlled notMNIST image generation.
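For reference, sampling 4 images per class with the trained Generator could look like:

```python
import torch
import torch.nn.functional as F

with torch.no_grad():
    labels = torch.arange(10).repeat_interleave(4)   # classes A-J, 4 samples each
    samples = G(torch.randn(40, 100), F.one_hot(labels, 10).float())
```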