Sentiment Analysis Project Report

Amr Khaled 21100834 Amira Ali 21100789


1. Introduction:

Sentiment analysis is the process of analyzing text data to determine the
sentiment or emotion it conveys, such as positive, negative, or neutral. This
project uses a transformer-based architecture, specifically DistilBERT, to
classify tweets into three categories: positive, neutral, and negative.

2. Dataset Details:
Columns:

• id: Unique identifier for each tweet.

• label: Sentiment label (0 for neutral, 1 for positive, -1 for negative).

• tweet: The text of the tweet.
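The schema above can be sketched with a small in-memory sample. The values below are illustrative, not taken from the actual dataset:

```python
import pandas as pd

# Minimal sketch of the dataset schema described above (sample rows invented).
df = pd.DataFrame({
    "id": [1, 2, 3],
    "label": [0, 1, -1],  # 0 = neutral, 1 = positive, -1 = negative
    "tweet": ["okay I guess", "love this!", "terrible service"],
})

# Map the numeric labels to readable names for inspection.
label_names = {0: "neutral", 1: "positive", -1: "negative"}
df["sentiment"] = df["label"].map(label_names)
print(df[["label", "sentiment"]])
```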


3. Methodology

3.1. Text Preprocessing

• Removed URLs, HTML tags, and special characters using regular expressions.

• Stripped extra whitespace.

• Ensured all text was converted to lowercase.
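The cleaning steps above can be combined into one function; the exact regular expressions are a sketch, since the report does not list the patterns used:

```python
import re

def preprocess(text: str) -> str:
    """Clean a tweet following the preprocessing steps listed above."""
    text = re.sub(r"https?://\S+|www\.\S+", "", text)  # remove URLs
    text = re.sub(r"<[^>]+>", "", text)                # remove HTML tags
    text = re.sub(r"[^a-zA-Z0-9\s]", "", text)         # remove special characters
    text = re.sub(r"\s+", " ", text).strip()           # strip extra whitespace
    return text.lower()                                # convert to lowercase

print(preprocess("Check <b>THIS</b> out!!! https://t.co/abc  :)"))
# -> "check this out"
```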

3.2. Tokenization

• Used the DistilBertTokenizer to tokenize the text data, ensuring padding and
truncation to a fixed length of 128 tokens.
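The tokenization step above might look like the following (requires the `transformers` package and downloads the pretrained vocabulary on first use):

```python
from transformers import DistilBertTokenizer

tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")

encodings = tokenizer(
    ["love this!", "terrible service"],
    padding="max_length",  # pad every sequence to the fixed length
    truncation=True,       # cut longer tweets down
    max_length=128,        # fixed length used in the report
    return_tensors="pt",
)
print(encodings["input_ids"].shape)  # torch.Size([2, 128])
```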
3.3. Data Splitting

• Split the dataset into training (70%) and testing (30%) sets using train_test_split().

• Encoded labels into numeric values using LabelEncoder.
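The splitting and encoding steps can be sketched as below; the sample tweets and labels are invented for illustration:

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

tweets = ["good", "bad", "meh", "great", "awful", "fine", "nice", "poor", "ok", "wow"]
labels = [1, -1, 0, 1, -1, 0, 1, -1, 0, 1]

# Encode {-1, 0, 1} into contiguous class indices {0, 1, 2}
# as required by a three-neuron classification head.
encoder = LabelEncoder()
y = encoder.fit_transform(labels)  # -1 -> 0, 0 -> 1, 1 -> 2

# 70% training, 30% testing, as stated above.
X_train, X_test, y_train, y_test = train_test_split(
    tweets, y, test_size=0.30, random_state=42
)
print(len(X_train), len(X_test))  # 7 3
```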

3.4. Model Selection

• Selected DistilBERT (distilbert-base-uncased) for its efficiency and accuracy in text


classification tasks.

• Added a classification head with three output neurons for multi-class classification.
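Loading the model with its three-way classification head might look like this (requires the `transformers` package and downloads the pretrained weights on first use):

```python
from transformers import DistilBertForSequenceClassification

# num_labels=3 attaches a classification head with three output neurons,
# one per sentiment class (negative, neutral, positive).
model = DistilBertForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=3,
)
```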
3.5. Training

• Optimizer: AdamW with a learning rate of 2e-5.

• Batch Size: 16.

• Epochs: 3.

• Used the training dataset to fine-tune the pre-trained DistilBERT model.
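A sketch of the fine-tuning loop with the hyperparameters above. `model` and `train_dataset` are assumed to come from the earlier steps (sections 3.2–3.4), with `train_dataset` yielding dicts of `input_ids`, `attention_mask`, and `labels`:

```python
import torch
from torch.optim import AdamW
from torch.utils.data import DataLoader

optimizer = AdamW(model.parameters(), lr=2e-5)        # learning rate 2e-5
loader = DataLoader(train_dataset, batch_size=16,     # batch size 16
                    shuffle=True)

model.train()
for epoch in range(3):                                # 3 epochs
    for batch in loader:
        optimizer.zero_grad()
        # Hugging Face models return a loss when labels are passed in.
        outputs = model(**batch)
        outputs.loss.backward()
        optimizer.step()
```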

4. Conclusion:
This project successfully built a sentiment analysis model using DistilBERT,
achieving an accuracy of 85%. With further improvements and dataset
augmentation, the model’s performance can be enhanced for real-world
applications.
