Course: Large Language Model
Assignment 2
Deadline: 10:00 pm, 15 Sep 2024
Task 1: Optimizer Performance on Non-Convex Functions
1. Non-Convex Function Optimization
a. Optimize the following functions:
i. f(x, y) = (1 - x)^2 + 100(y - x^2)^2
ii. f(x) = sin(1/x)
Handle x = 0 by setting f(0) = 0 or by evaluating at a value slightly greater than zero.
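As a minimal sketch, these functions and their analytical gradients might be defined as follows (the guard at x = 0 follows the note above; the function names and the eps value are our own choices):

```python
import numpy as np

def rosenbrock(x, y):
    # f(x, y) = (1 - x)^2 + 100 (y - x^2)^2
    return (1 - x) ** 2 + 100 * (y - x ** 2) ** 2

def rosenbrock_grad(x, y):
    # Analytical partial derivatives of the Rosenbrock function.
    dx = -2 * (1 - x) - 400 * x * (y - x ** 2)
    dy = 200 * (y - x ** 2)
    return np.array([dx, dy])

def f_sin(x, eps=1e-8):
    # Per the note above: return 0 at x = 0 instead of evaluating sin(1/0).
    if abs(x) < eps:
        return 0.0
    return np.sin(1.0 / x)

def f_sin_grad(x, eps=1e-8):
    # d/dx sin(1/x) = -cos(1/x) / x^2; undefined at x = 0, so guard it.
    if abs(x) < eps:
        return 0.0
    return -np.cos(1.0 / x) / x ** 2
```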
b. Apply the following optimizers at learning rates (α) 0.01, 0.05, and 0.1 (a from-scratch sketch of these update rules appears at the end of this task):
● Gradient Descent
● Stochastic Gradient Descent with momentum
● Adam
● RMSprop
● Adagrad
Presentation of results:
a) Convergence behaviour of each optimizer and the time taken by each.
b) Impact of hyperparameters on convergence speed and the final result,
with plots showing convergence.
Note: Report the optimal x found by each optimizer and the corresponding f(x) at a fixed learning rate, for all the gradient descent algorithms, along with the threshold criterion you chose to terminate your algorithm.
You are expected to implement the solution from scratch using the Python
programming language.
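One possible from-scratch structure writes each optimizer as a state-carrying update rule plugged into a common loop. The stopping tolerance, iteration cap, and momentum/decay defaults below are illustrative choices, not values fixed by the assignment (only the learning rates are):

```python
import numpy as np

def rosenbrock_grad(x, y):
    # Gradient of f(x, y) = (1 - x)^2 + 100(y - x^2)^2.
    return np.array([-2 * (1 - x) - 400 * x * (y - x ** 2),
                     200 * (y - x ** 2)])

def optimize(grad_fn, x0, update, lr=0.01, tol=1e-8, max_iter=100_000):
    # Generic loop: stop when the step size falls below a chosen threshold.
    x, state = np.array(x0, dtype=float), {}
    for t in range(1, max_iter + 1):
        step = update(grad_fn(*x), state, lr, t)
        x -= step
        if np.linalg.norm(step) < tol:
            break
    return x, t

def gd(g, state, lr, t):                        # plain gradient descent
    return lr * g

def momentum(g, state, lr, t, beta=0.9):        # (S)GD with momentum
    state["v"] = beta * state.get("v", 0.0) + lr * g
    return state["v"]

def adagrad(g, state, lr, t, eps=1e-8):
    state["s"] = state.get("s", 0.0) + g ** 2
    return lr * g / (np.sqrt(state["s"]) + eps)

def rmsprop(g, state, lr, t, beta=0.99, eps=1e-8):
    state["s"] = beta * state.get("s", 0.0) + (1 - beta) * g ** 2
    return lr * g / (np.sqrt(state["s"]) + eps)

def adam(g, state, lr, t, b1=0.9, b2=0.999, eps=1e-8):
    state["m"] = b1 * state.get("m", 0.0) + (1 - b1) * g
    state["v"] = b2 * state.get("v", 0.0) + (1 - b2) * g ** 2
    m_hat = state["m"] / (1 - b1 ** t)          # bias-corrected first moment
    v_hat = state["v"] / (1 - b2 ** t)          # bias-corrected second moment
    return lr * m_hat / (np.sqrt(v_hat) + eps)

# Example: Adam on the Rosenbrock function at learning rate 0.01.
x_opt, n_iters = optimize(rosenbrock_grad, [-1.5, 2.0], adam, lr=0.01)
```

Timing each run (e.g., with time.perf_counter) and recording the iterate history gives the convergence and timing results asked for above.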
Task 2: Implementing Linear Regression Using a Multi-Layer
Neural Network from Scratch
Dataset Description: Use the Boston Housing Dataset [Link]. Focus on
predicting the median value (`MEDV`) of homes based on two features:
● Number of Rooms (`RM`)
● Crime Rate (`CRIM`)
Data Preprocessing:
● Normalize the features to ensure they are on a similar scale.
● Split the dataset into a training set (80%) and a test set (20%).
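A minimal preprocessing sketch, assuming a local CSV copy of the dataset with the standard column names (only NumPy/pandas utilities, no sklearn):

```python
import numpy as np
import pandas as pd

df = pd.read_csv("boston.csv")            # assumed local copy of the dataset
X = df[["RM", "CRIM"]].to_numpy()
y = df["MEDV"].to_numpy().reshape(-1, 1)

# Z-score normalization so both features are on a similar scale.
mu, sigma = X.mean(axis=0), X.std(axis=0)
X = (X - mu) / sigma

# Shuffle, then split 80% train / 20% test.
rng = np.random.default_rng(seed=0)
idx = rng.permutation(len(X))
split = int(0.8 * len(X))
X_train, X_test = X[idx[:split]], X[idx[split:]]
y_train, y_test = y[idx[:split]], y[idx[split:]]
```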
Model:
In this assignment, you will build a neural network from scratch to perform linear
regression using Python. You will be implementing a multi-layer neural network
with multiple hidden layers.
Details of the Architecture:
Input Layer:
● Number of Neurons: Equal to the number of features in the dataset (e.g.,
2 input neurons if using 2 features from the Boston Housing Dataset).
Number of Hidden Layers: 2 hidden layers.
First Hidden Layer:
● Number of Neurons: 5 neurons.
● Activation Function: ReLU (Rectified Linear Unit) for non-linearity.
Second Hidden Layer:
● Number of Neurons: 3 neurons.
● Activation Function: ReLU (Rectified Linear Unit).
Output Layer:
● Number of Neurons: 1 neuron, since the task involves predicting a single
continuous value (e.g., house price).
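A sketch of the forward pass for this 2-5-3-1 architecture; the weight initialization scheme and seed are our own choices (the assignment fixes only the layer sizes and activations):

```python
import numpy as np

rng = np.random.default_rng(seed=0)
sizes = [2, 5, 3, 1]                     # input, hidden 1, hidden 2, output
W = [rng.normal(0, 0.1, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
b = [np.zeros((1, n)) for n in sizes[1:]]

def relu(z):
    return np.maximum(0, z)

def forward(X):
    # Returns pre-activations and activations of every layer (both are
    # needed later for backpropagation). The output layer is linear,
    # since the task is regression.
    a, zs, activations = X, [], [X]
    for i in range(len(W)):
        z = a @ W[i] + b[i]
        a = z if i == len(W) - 1 else relu(z)
        zs.append(z)
        activations.append(a)
    return zs, activations
```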
Training the Network:
1) Implement Gradient Descent for weight updates (see the sketch after this list).
2) Experiment with different learning rates (e.g., 0.01, 0.001) to observe the
effects on model training.
3) Train the network for a set number of epochs (e.g., 1000 epochs) and track
the loss throughout the training process.
4) Implement and compare different optimizers:
a) Basic Gradient Descent
b) Momentum
c) Adam
5) Observe and compare how these optimizers affect the model’s
convergence and final performance.
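Continuing the sketch above (it reuses W, b, relu, and forward), the backward pass and plain gradient descent updates for the MSE loss might look like this; momentum or Adam (steps 4b and 4c) can be swapped in by replacing the final two update lines:

```python
def train(X, y, lr=0.01, epochs=1000):
    losses = []
    for epoch in range(epochs):
        zs, acts = forward(X)
        y_hat = acts[-1]
        losses.append(np.mean((y_hat - y) ** 2))        # MSE loss

        # Backward pass: delta starts from the derivative of MSE w.r.t. y_hat.
        delta = 2 * (y_hat - y) / len(X)
        for i in reversed(range(len(W))):
            dW = acts[i].T @ delta
            db = delta.sum(axis=0, keepdims=True)
            if i > 0:
                # Propagate delta before updating W[i]; (z > 0) is ReLU's derivative.
                delta = (delta @ W[i].T) * (zs[i - 1] > 0)
            W[i] -= lr * dW                             # plain GD update
            b[i] -= lr * db
    return losses
```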
Evaluation:
1. Evaluate the trained model on a test set.
2. Visualize the regression results by plotting the predicted vs. actual values.
3. Calculate evaluation metrics such as Mean Squared Error (MSE) on the
test data to quantitatively assess the model’s performance.
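A brief evaluation sketch for items 1-3, reusing forward from the model sketch and the test split from the preprocessing sketch:

```python
import numpy as np
import matplotlib.pyplot as plt

_, acts = forward(X_test)
y_pred = acts[-1]
mse = np.mean((y_pred - y_test) ** 2)   # test-set Mean Squared Error
print(f"Test MSE: {mse:.4f}")

# Predicted vs. actual scatter; the dashed line marks perfect prediction.
plt.scatter(y_test, y_pred, s=10)
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], "r--")
plt.xlabel("Actual MEDV")
plt.ylabel("Predicted MEDV")
plt.title("Predicted vs. actual values on the test set")
plt.show()
```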
Bonus Questions:
1. Additional Hidden Layers:
- Add a third hidden layer with 2 neurons and observe the impact on model
performance.
- Compare the performance of networks with different numbers of hidden layers.
2. Regularization:
- Implement L2 regularization (weight decay) and observe how it affects the
model, especially with respect to overfitting.
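For the L2 bonus, only two lines of the earlier training sketch change: the loss gains a penalty term and each weight gradient gains a weight-decay term (the λ value here is a hypothetical starting point you would tune):

```python
def train_l2(X, y, lr=0.01, epochs=1000, lam=1e-3):
    # Same loop as train(), with an L2 penalty on the weights (not the biases).
    losses = []
    for epoch in range(epochs):
        zs, acts = forward(X)
        y_hat = acts[-1]
        penalty = lam * sum(np.sum(Wi ** 2) for Wi in W)
        losses.append(np.mean((y_hat - y) ** 2) + penalty)
        delta = 2 * (y_hat - y) / len(X)
        for i in reversed(range(len(W))):
            dW = acts[i].T @ delta + 2 * lam * W[i]     # added weight-decay term
            db = delta.sum(axis=0, keepdims=True)
            if i > 0:
                delta = (delta @ W[i].T) * (zs[i - 1] > 0)
            W[i] -= lr * dW
            b[i] -= lr * db
    return losses
```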
You are expected to implement the solution from scratch using Python.
Task 3: Multi-class classification using Fully Connected
Neural Network
Dataset Description
1) Linearly separable classes dataset: 3-class, 2-dimensional linearly
separable data is given. Each class has 500 data points.
2) Nonlinearly separable classes dataset: 2-dimensional data of 2 or 3
classes that are nonlinearly separable. The number of examples in each
class and their order is given at the beginning of each file.
Divide the data from each class into training, validation, and test sets. From each class,
the train, validation, and test splits should be 60%, 20%, and 20%, respectively.
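One way to make the per-class 60/20/20 split (a sketch; X and y stand for the loaded features and labels, in whatever format your files arrive):

```python
import numpy as np

def split_per_class(X, y, seed=0):
    # Split each class separately so the 60/20/20 ratio holds per class.
    rng = np.random.default_rng(seed)
    parts = {"train": [], "val": [], "test": []}
    for c in np.unique(y):
        idx = rng.permutation(np.where(y == c)[0])
        n = len(idx)
        a, b = int(0.6 * n), int(0.8 * n)
        parts["train"].append(idx[:a])
        parts["val"].append(idx[a:b])
        parts["test"].append(idx[b:])
    return {k: np.concatenate(v) for k, v in parts.items()}

# Usage: splits = split_per_class(X, y); X_train = X[splits["train"]]
```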
Model: Fully connected neural network (FCNN) for each of the datasets. Try
FCNN with one hidden layer for Dataset 1 (Linearly separable classes dataset)
and 2 hidden layers for Dataset 2 (Nonlinearly separable classes dataset). Try
different numbers of hidden nodes for both datasets and observe the results.
Implement stochastic gradient descent (SGD) for the backpropagation algorithm.
Use squared error as an instantaneous loss function.
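A per-example SGD sketch with the squared error loss for a one-hidden-layer FCNN. Sigmoid activations and the in-place updates are our own choices; the caller initializes W1 (d × H), b1, W2 (H × K), and b2, with one-hot targets Y:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgd_epoch(X, Y, W1, b1, W2, b2, lr=0.01, rng=None):
    # Y holds one-hot targets; weights are updated after every example (SGD).
    rng = rng or np.random.default_rng()
    total = 0.0
    for i in rng.permutation(len(X)):
        x, y = X[i:i + 1], Y[i:i + 1]
        h = sigmoid(x @ W1 + b1)                # hidden layer
        o = sigmoid(h @ W2 + b2)                # output layer
        total += 0.5 * np.sum((y - o) ** 2)     # instantaneous squared error
        d_o = (o - y) * o * (1 - o)             # output delta (sigmoid derivative)
        d_h = (d_o @ W2.T) * h * (1 - h)        # hidden delta
        W2 -= lr * h.T @ d_o; b2 -= lr * d_o
        W1 -= lr * x.T @ d_h; b1 -= lr * d_h
    return total / len(X)                       # average error for the epoch
```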
Presentation of results:
1) Plot of average error (y-axis) vs epochs (x-axis): Give the plot only for the best
architecture selected after cross-validation.
2) Decision region plot superimposed with the training data for each of the datasets:
give the decision region plot for the best architecture selected after
cross-validation (see the plotting sketch after this list).
3) Confusion matrix and classification accuracy: Give the confusion matrix
and classification accuracy on the validation set for each of the different
architectures, along with the best architecture selected after cross-validation.
Also, for the best architecture, give the confusion matrix and classification
accuracy on test data.
4) Plots of outputs for each of the hidden nodes and output nodes in the FCNN for
each of the datasets after the model is trained. Here, the x- and y-axes are the input
variables of each example, and the z-axis is the output of the hidden/output node. Give
the plots for training, validation, and test data. (Give the plots for the best
architecture selected after cross-validation.)
5) Comparison of performance with that of the single neuron model (Question-1)
for each dataset.
6) Inferences on the plots and inferences on the results observed.
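For the decision region plot (item 2 above), a common matplotlib pattern is to evaluate the trained model on a dense grid and colour each cell by the predicted class; predict here stands for whatever prediction function your FCNN exposes:

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_decision_regions(predict, X, y, steps=300):
    # Evaluate the classifier on a dense grid spanning the training data.
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, steps),
                         np.linspace(y_min, y_max, steps))
    grid = np.c_[xx.ravel(), yy.ravel()]
    zz = predict(grid).reshape(xx.shape)        # predicted class labels
    plt.contourf(xx, yy, zz, alpha=0.3)         # shaded decision regions
    plt.scatter(X[:, 0], X[:, 1], c=y, s=10, edgecolors="k")  # training data
    plt.xlabel("x1")
    plt.ylabel("x2")
    plt.show()
```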
You are expected to implement the network from scratch using the Python
programming language.
Task 4: Multi-class classification using a Fully Connected
Neural Network on the MNIST Dataset
Objective: The objective of this programming assignment is to deepen your
understanding of optimizers for the backpropagation algorithm. The task is to
train a fully connected neural network (FCNN) using different optimizers for
backpropagation and compare the number of epochs each takes to converge,
along with their classification performance.
Dataset Description: You are given a subset of the MNIST digit dataset for
this task. Each group can choose any 5 classes. Divide the data into training
and testing in an 8:2 ratio. Every image is of size 28 x 28; flatten each image to
represent it as a 784-dimensional vector (28 x 28 = 784).
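Since PyTorch is allowed for this task, here is a sketch of selecting 5 classes, flattening, and the 8:2 split. It loads MNIST via torchvision for illustration; your provided subset may arrive in a different format, and the chosen class labels are arbitrary:

```python
import torch
from torchvision import datasets, transforms

classes = [0, 1, 2, 3, 4]                   # any 5 classes your group picks
ds = datasets.MNIST(root="data", train=True, download=True,
                    transform=transforms.ToTensor())

mask = torch.isin(ds.targets, torch.tensor(classes))
X = ds.data[mask].float().reshape(-1, 784) / 255.0   # flatten 28x28 -> 784
y = ds.targets[mask]
# Remap the original digit labels to 0..4 for cross-entropy.
y = torch.tensor([classes.index(int(t)) for t in y])

# 8:2 train/test split.
perm = torch.randperm(len(X))
split = int(0.8 * len(X))
X_train, y_train = X[perm[:split]], y[perm[:split]]
X_test, y_test = X[perm[split:]], y[perm[split:]]
```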
The tasks for this assignment are as follows:
Develop an FCNN with different numbers of hidden layers (minimum 3,
maximum 5).
a) Use cross-entropy loss.
b) Experiment with different numbers of nodes in each of the layers.
c) Train each of the architectures using:
i. Stochastic gradient descent (SGD) algorithm - (batch_size=1),
ii. Batch gradient descent algorithm (vanilla gradient descent) – (batch_size=total
number of training examples),
iii. SGD with momentum (generalized delta rule) – (batch_size=1),
iv. SGD with momentum (NAG) – (batch_size=1),
v. RMSProp – (batch_size=1), and
vi. Adam optimizer – (batch_size=1).
d) Use the same initial random weight values for each architecture across all
of the optimizers.
e) As the stopping criterion, use the absolute difference between the average errors
of successive epochs falling below a threshold of 10^-4. Do not use the number of
iterations as the stopping criterion (see the training sketch after this list).
f) Consider the learning rate (η) as 0.001 for all the optimizers.
g) Consider momentum parameter as 0.9 for both generalized delta and NAG.
h) Consider β = 0.99 and ε = 10^-8 for RMSProp.
i) Consider β1 = 0.9, β2 = 0.999, and ε = 10^-8 for the Adam optimizer.
Presentation of Results:
1. Observe the number of epochs taken for convergence by each of the
architectures. Tabulate and compare the number of epochs taken by each
of the optimizers for each architecture.
2. Present the plots of average training error (y-axis) vs. epochs (x-axis) for
each architecture. Superimpose the plots from each of the optimizers.
3. Give the training accuracy and validation accuracy for each of the optimizers in
each of the architectures.
4. Choose the best architecture based on validation accuracy. Give the test
confusion matrix and test classification accuracy along with training accuracy and
confusion matrix for the chosen best architecture.
You can use deep learning APIs (PyTorch).
Note:
● You are not supposed to use pre-built model functions from libraries
such as sklearn and PyTorch, except in Task 4.
● Students will also be rewarded extra for solving bonus questions,
thorough experimentation and insightful analysis based on the results
and graphs they produce.
The report should be in PDF form, and each team's report should also include
observations about the results of the studies.
Instruction:
Upload all your code (.ipynb files) and reports (.pdf files) in a single zip file.
• Name the folder Group<Number>_Assignment2. Example: Group01_Assignment2
• Name the zip file Group<Number>_Assignment2.zip. Example:
Group01_Assignment2.zip
We will not accept the submission if you don't follow the above instructions.