Sentiment Analysis
Project Based Learning (PBL) Report
for the course
Statistics for Machine Learning – 20MA32L01
BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING
By
23R15A0525 - S. Akshara
23R15A0524 - R. Navaneetha
23R15A0523 - P. Envitha
Under the guidance of
Dr. A. Srinivasulu
Department of Computer Science and Engineering
Accredited by NBA
Geethanjali College of Engineering and Technology
(UGC Autonomous)
(Affiliated to J.N.T.U.H, Approved by AICTE, New Delhi)
Cheeryal (V), Keesara (M), Medchal Dist. - 501 301.
JUNE-2025
TABLE OF CONTENTS
S.No. Contents
1 ACKNOWLEDGEMENT
2 ABSTRACT
3 INTRODUCTION
4 SYSTEM DESIGN
5 IMPLEMENTATION
6 SAMPLE CODE
7 OUTPUT SCREENS
8 CONCLUSION
9 REFERENCES
ACKNOWLEDGEMENT
We would like to acknowledge and give our warmest thanks to our faculty guide, Dr. A. Srinivasulu,
who made this work possible. His guidance and advice carried us through all the stages
of writing our project. We would also like to thank our classmates for making our defence
an enjoyable moment, and for their brilliant comments and suggestions.
We would also like to give special thanks to our families for their continuous
support and understanding while we undertook our research and wrote our project, and for
providing the required equipment. The project would not have been successful without their
cooperation and inputs.
ABSTRACT
This project presents a foundational approach to sentiment analysis using deep learning with
TensorFlow and Keras. Sentiment analysis is a key task in Natural Language Processing (NLP) that
involves determining the emotional tone behind text data. The goal of this project is to classify short
sentences into positive or negative sentiments. A small custom dataset is created with labeled
sentences expressing either positive or negative emotions. The preprocessing phase includes
tokenizing the text using Keras’ Tokenizer and padding the sequences to ensure uniform input
lengths for the neural network.
The model is a simple sequential neural network that processes the tokenized text and learns to
identify sentiment patterns. It is trained using binary classification techniques to predict whether a
given sentence has a positive (label 1) or negative (label 0) sentiment. Through this implementation,
the project demonstrates key steps such as data preparation, text vectorization, neural network
construction, and training for binary sentiment classification.
Although this is a basic implementation, it effectively introduces important concepts in text
classification and serves as a practical guide for beginners in machine learning and NLP. The project
can be further extended by incorporating a larger dataset, using word embeddings, or applying more
complex deep learning architectures like LSTM or GRU.
INTRODUCTION
About the project
In today's digital era, vast amounts of textual data are generated daily through social media, product
reviews, forums, and other online platforms. Analyzing the sentiment behind this text helps
businesses, researchers, and developers understand public opinion, improve customer experiences,
and make informed decisions. Sentiment analysis, a subfield of Natural Language Processing (NLP),
involves determining whether a piece of text expresses a positive, negative, or neutral sentiment.
This project aims to build a basic sentiment analysis model using Python and deep learning libraries
such as TensorFlow and Keras. The model classifies short sentences into binary categories: positive
or negative sentiment. A small, manually created dataset is used to train the model, with each
sentence labeled accordingly. The workflow includes text preprocessing through tokenization and
sequence padding, followed by the development and training of a neural network model.
The purpose of this project is to provide a practical and educational implementation of sentiment
analysis suitable for beginners. It introduces core concepts such as text-to-sequence conversion,
neural network design, and binary classification. While the model is simple, it lays the foundation
for more advanced techniques in NLP. The project can be enhanced by using larger datasets, pre-
trained word embeddings, or more sophisticated models like LSTMs or Transformers.
Project outcomes and objectives
1. Understand Sentiment Analysis Concepts
To gain a clear understanding of what sentiment analysis is and how it is used in real-world
applications.
2. Implement Text Preprocessing Techniques
To learn and apply preprocessing steps such as tokenization and padding, preparing raw
text for deep learning models.
3. Develop a Binary Sentiment Classification Model
To build a neural network using TensorFlow and Keras that can classify text into positive
or negative sentiments.
4. Train and Evaluate the Model
To train the model on a small labeled dataset and assess its performance using appropriate
metrics.
5. Provide a Simple Educational NLP Solution
To create a beginner-friendly implementation that demonstrates the basic steps in building
a text classification model using deep learning.
Project Outcomes
A working sentiment analysis model capable of predicting positive or negative sentiment
from short text inputs.
A clear understanding of how to preprocess textual data for machine learning.
Practical experience in building and training neural networks using Keras.
A foundation for more advanced NLP tasks such as multi-class sentiment analysis, emotion
detection, or model deployment.
A Jupyter Notebook that serves as a learning tool for future NLP or machine learning
projects.
Key Features
1. Text Preprocessing Module
Function:
Prepares raw textual data for analysis by converting text into a numerical format usable by machine
learning models.
Key Features:
Tokenization: Converts words into integer sequences using Keras Tokenizer.
Padding: Ensures consistent input lengths using pad_sequences.
Vocabulary Generation: Builds a word index to maintain consistent encoding.
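The three preprocessing steps above can be sketched in plain Python to show what happens to each sentence. This is a simplified illustration, not the Keras implementation: the helper names `fit_vocabulary`, `texts_to_sequences`, and `pad_post` are hypothetical, punctuation filtering is omitted, and truncation here keeps the first `maxlen` tokens.

```python
from collections import Counter


def fit_vocabulary(sentences):
    """Map each word to an integer index, most frequent first, starting at 1."""
    counts = Counter()
    for sentence in sentences:
        counts.update(sentence.lower().split())
    # Index 0 is reserved for padding, so real words start at 1.
    # most_common() keeps first-seen order among equally frequent words.
    return {word: i for i, (word, _) in enumerate(counts.most_common(), start=1)}


def texts_to_sequences(sentences, word_index):
    """Replace each known word by its index; words not in the index are skipped."""
    return [
        [word_index[w] for w in sentence.lower().split() if w in word_index]
        for sentence in sentences
    ]


def pad_post(sequences, maxlen=None):
    """Right-pad with zeros (or cut) so that every sequence has length maxlen."""
    if maxlen is None:
        maxlen = max(len(seq) for seq in sequences)
    return [seq[:maxlen] + [0] * (maxlen - len(seq)) for seq in sequences]


sentences = ["I love this product", "I hate this"]
word_index = fit_vocabulary(sentences)
sequences = texts_to_sequences(sentences, word_index)
print(word_index)           # {'i': 1, 'this': 2, 'love': 3, 'product': 4, 'hate': 5}
print(sequences)            # [[1, 3, 2, 4], [1, 5, 2]]
print(pad_post(sequences))  # [[1, 3, 2, 4], [1, 5, 2, 0]]
```

Reusing the same `word_index` for training and for new inputs is what keeps the encoding consistent between the two stages.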
2. Sentiment Classification Model Module
Function:
Trains a deep learning model to classify text inputs as positive or negative sentiments.
Key Features:
Neural Network Structure: Uses Embedding, Flatten, and Dense layers for classification.
Training with Labels: Learns sentiment patterns from labeled training data.
Binary Output: Outputs a probability between 0 and 1; values near 1 indicate positive sentiment and values near 0 indicate negative sentiment.
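To make the Embedding, Flatten, and Dense structure concrete, the forward pass can be sketched in plain Python. The weights below are tiny hand-picked values chosen only so the arithmetic is easy to follow; they are assumptions for illustration, not weights learned by the project's model, and the function names are hypothetical.

```python
import math


def sigmoid(z):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases):
    """Fully connected layer; weights[j] holds the input weights of unit j."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def forward(token_ids, embedding, w_hidden, b_hidden, w_out, b_out):
    # Embedding: look up one vector per token id.
    vectors = [embedding[t] for t in token_ids]
    # Flatten: concatenate the per-token vectors into one long vector.
    flat = [value for vector in vectors for value in vector]
    # Hidden Dense layer with ReLU activation.
    hidden = [max(0.0, h) for h in dense(flat, w_hidden, b_hidden)]
    # Output Dense layer with a single sigmoid unit.
    logit = dense(hidden, [w_out], [b_out])[0]
    return sigmoid(logit)


# Toy parameters: vocabulary of 3 ids (0 is padding), embedding size 2.
embedding = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
w_hidden = [[1.0, 0.0, 0.0, 1.0], [-1.0, 0.0, 0.0, -1.0]]
b_hidden = [0.0, 0.0]
w_out = [1.0, 1.0]
b_out = -2.0

probability = forward([1, 2], embedding, w_hidden, b_hidden, w_out, b_out)
print(probability)  # 0.5
```

With these toy values the flattened input is `[1, 0, 0, 1]`, the hidden activations are `[2, 0]`, and the output logit is 0, which the sigmoid maps to 0.5.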
3. Inference and Prediction Module
Function:
Uses the trained model to predict sentiment for new, unseen text inputs.
Key Features:
Dynamic Input Handling: Accepts new text, tokenizes, and pads based on the original
tokenizer.
Probability Output: Returns the model’s confidence score for positive or negative
sentiment.
Decision Thresholding: Applies a threshold (e.g., > 0.5 is positive) to decide final output.
4. Decision-Making Module
Function:
Interprets model outputs to make final sentiment decisions and guide responses.
Key Features:
Threshold-Based Classification: Converts prediction probabilities into labels.
Confidence Assessment: Optionally outputs prediction certainty to the user.
Rule-Based Actions: Could trigger different responses or actions based on sentiment (e.g.,
alert on negative sentiment).
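The thresholding, confidence, and rule-based ideas in this module could be sketched as follows. The function names, the `alert_below` cut-off of 0.2, and the returned dictionary layout are all assumptions made for illustration; the project itself only applies the 0.5 threshold.

```python
def decide(probability, threshold=0.5):
    """Turn a sigmoid output into a label and a confidence for that label."""
    label = "Positive" if probability > threshold else "Negative"
    # Confidence is the probability assigned to the chosen label.
    confidence = probability if label == "Positive" else 1.0 - probability
    return label, confidence


def respond(probability, alert_below=0.2):
    """Hypothetical rule: raise an alert when the text is strongly negative."""
    label, confidence = decide(probability)
    action = "alert" if probability < alert_below else "none"
    return {"label": label, "confidence": confidence, "action": action}


print(decide(0.9))    # ('Positive', 0.9)
print(respond(0.1))   # label 'Negative', confidence about 0.9, action 'alert'
```

Keeping the decision logic separate from the model makes it possible to adjust the threshold or the alert rule without retraining.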
5. User Interface (Optional/Future Module)
Function:
Provides a user-friendly interface for inputting text and viewing sentiment results.
Key Features:
Input Box for Text: Allows users to enter custom sentences.
Prediction Display: Shows whether the sentiment is positive or negative along with
confidence.
SYSTEM DESIGN
Software Requirements
Operating System : Microsoft Windows
Software Name : Jupyter Notebook
Type : IDE
Developers : Fernando Pérez
Hardware Requirements
Device name : DESKTOP-0OGA6I1
Processor : AMD Ryzen 3 3250U with Radeon Graphics 2.60 GHz
Installed RAM : 8.00 GB (5.94 GB usable)
Device ID : 76DDCDEE-6C4D-43DB-99D9-4E23080623F7
System type : 64-bit operating system, x64-based processor
IMPLEMENTATION
Modules Implementation
Text Preprocessing Module
# Define input sentences and labels
sentences = [
"I love this product",
"This is the best thing ever",
"Absolutely fantastic experience",
"I hate this",
"This is the worst",
"Terrible and disappointing"
]
labels = [1, 1, 1, 0, 0, 0]
# Tokenization
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
tokenizer = Tokenizer()
tokenizer.fit_on_texts(sentences)
sequences = tokenizer.texts_to_sequences(sentences)
# Padding
padded_sequences = pad_sequences(sequences, padding='post')
Sentiment Classification Model Module
# Define features and labels
import numpy as np
X = padded_sequences
y = np.array(labels)  # convert the label list to a NumPy array for model.fit
# Build model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Flatten, Dense
model = Sequential()
model.add(Embedding(input_dim=len(tokenizer.word_index)+1, output_dim=8,
input_length=X.shape[1]))
model.add(Flatten())
model.add(Dense(16, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# Compile and train model
model.compile(optimizer='adam', loss='binary_crossentropy',
metrics=['accuracy'])
model.fit(X, y, epochs=10)
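The `binary_crossentropy` loss named in the compile step measures how far each predicted probability is from its 0/1 label. A minimal plain-Python sketch of the formula, assuming a small clipping constant `eps` to avoid taking the logarithm of zero (the probabilities in the example are made up, not outputs of the trained model):

```python
import math


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -(y*log(p) + (1-y)*log(1-p)) over all examples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clip so that log() never receives exactly 0.
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# An undecided model (p = 0.5 everywhere) scores ln(2), about 0.693.
print(binary_crossentropy([1, 0], [0.5, 0.5]))
# A confident correct prediction is penalised far less than a confident wrong one.
print(binary_crossentropy([1], [0.9]), binary_crossentropy([1], [0.1]))
```

During training the optimizer adjusts the weights to reduce this value, which pushes predictions for positive sentences toward 1 and for negative sentences toward 0.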
Inference and Prediction Module
# Predict sentiment of new text
new_text = ["I really enjoy this"]
new_seq = tokenizer.texts_to_sequences(new_text)
new_pad = pad_sequences(new_seq, maxlen=X.shape[1], padding='post')
prediction = model.predict(new_pad)
print("Positive" if prediction[0][0] > 0.5 else "Negative")
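One point worth noting about this prediction: a Keras `Tokenizer` created without an `oov_token` drops words it did not see during fitting when converting text to sequences. The plain-Python sketch below checks which words of the new sentence occur in the six training sentences; the helper `split_known_unknown` is a hypothetical name and uses simple whitespace splitting.

```python
training_sentences = [
    "I love this product",
    "This is the best thing ever",
    "Absolutely fantastic experience",
    "I hate this",
    "This is the worst",
    "Terrible and disappointing",
]

# Every lower-cased word that appears in the training data.
vocabulary = {word for s in training_sentences for word in s.lower().split()}


def split_known_unknown(text, vocabulary):
    """Separate the words of text into those seen in training and those not."""
    words = text.lower().split()
    known = [w for w in words if w in vocabulary]
    unknown = [w for w in words if w not in vocabulary]
    return known, unknown


known, unknown = split_known_unknown("I really enjoy this", vocabulary)
print(known)    # ['i', 'this']
print(unknown)  # ['really', 'enjoy']
```

So for this input the model effectively sees only "i" and "this", both of which also occur in negative training sentences. A larger dataset, or constructing the tokenizer with an `oov_token`, are possible ways to make such cases more visible.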
Sample Code
import numpy as np
texts = ["I love programming",
"Python is awesome",
"I hate bugs",
"Debugging is fun",
"I love solving problems",
"I don't like errors"]
labels = [1, 1, 0, 1, 1, 0]
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
tokenizer = Tokenizer()
tokenizer.fit_on_texts(texts)
sequences = tokenizer.texts_to_sequences(texts)
sequences  # notebook cell: displays the integer-encoded sentences
texts  # notebook cell: displays the original sentences
max_length = max([len(sequence) for sequence in sequences])
max_length
X = pad_sequences(sequences, maxlen=max_length, padding='post')
X
y = np.array(labels)
y
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Dense, Flatten
model = Sequential()
model.add(Embedding(input_dim=len(tokenizer.word_index) + 1,
output_dim=8,
input_length=max_length))
model.add(Flatten())
model.add(Dense(10, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X, y, epochs=20, batch_size=2)
sample_text = "i love programming"
sample_sequence = tokenizer.texts_to_sequences([sample_text])  # Tokenize the sample text
sample_padded = pad_sequences(sample_sequence, maxlen=max_length, padding='post')  # Pad the sequence
prediction = model.predict(sample_padded)
# prediction has shape (1, 1); compare the single probability to the threshold
if prediction[0][0] > 0.5:
    print('positive')
else:
    print('negative')
print(prediction[0][0])
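The sample code tracks accuracy on the training sentences only. If held-out labelled sentences were available, further metrics could be computed from the thresholded predictions. A minimal plain-Python sketch follows; the label and prediction lists are hypothetical values for illustration, not results from this project, and the function name is an assumption.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical held-out labels and thresholded model outputs.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
print(classification_metrics(y_true, y_pred))
# accuracy, precision, recall, and F1 are all 2/3 for these values
```

Precision and recall are more informative than accuracy alone when the two classes are not equally frequent.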
Sample Output
CONCLUSION
This sentiment analysis project successfully demonstrates how natural language processing (NLP)
and deep learning techniques can be applied to classify text data into positive or negative sentiments.
Through a structured and modular approach, the project walks through the essential stages of
preprocessing raw text using tokenization and padding, building and training a neural network model
with Keras, and making predictions on new text inputs.
Despite using a small custom dataset for simplicity, the project effectively highlights the workflow
of a typical machine learning pipeline, from data preparation to inference. The embedding
layer maps each word to a learned dense vector, giving the model a way to represent word
relationships, while the sigmoid output provides an easily interpreted probability.
This beginner-friendly project serves as a strong foundation for more complex NLP tasks. It can be
extended further by incorporating larger and real-world datasets, more advanced model architectures
like LSTM or BERT, and deploying the model through a user interface using frameworks such as
Flask or Streamlit.
Overall, the project not only accomplishes its goal of performing basic sentiment analysis but also
offers valuable insights into the end-to-end development of an AI-based text classification system.
REFERENCES
https://www.eneuro.org
https://www.geeksforgeeks.org/python-programming-language/
https://stackoverflow.com/
https://ailocallist.com