- Run the following command in the terminal: `pip install pandas numpy matplotlib tensorflow scikit-learn`
- Download the dataset from Kaggle: https://www.kaggle.com/datasets/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews
- Copy the path of the `IMDb Dataset.csv` file and paste it into the code
- Download the GloVe embeddings from the official website (https://nlp.stanford.edu/projects/glove/): download the `glove.6B.zip` file and extract the `glove.6B.100d.txt` file from the archive
- Copy the path of this `.txt` file and paste it into the code
- After running all the cells, enter a review of your choice in the last cell and run it
- The cell prints the predicted sentiment of your review; the model's accuracy is above 80%

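
To make the two "paste the path into the code" steps concrete, here is a minimal sketch of where those paths might go and how a GloVe text file can be read. The variable names `DATASET_PATH` and `GLOVE_PATH` and the helper `load_glove_vectors` are hypothetical and may not match the names used in the notebook; the paths are placeholders.

```python
import numpy as np

# Hypothetical variable names and placeholder paths: replace with your own.
DATASET_PATH = "path/to/IMDb Dataset.csv"
GLOVE_PATH = "path/to/glove.6B.100d.txt"


def load_glove_vectors(path):
    """Parse a GloVe text file into a {word: vector} dictionary."""
    vectors = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            # Each line holds a token followed by its space-separated components.
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype="float32")
    return vectors
```

Calling `load_glove_vectors(GLOVE_PATH)` on the extracted `glove.6B.100d.txt` file would return one 100-dimensional vector per word.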
This project performs sentiment analysis on IMDb movie reviews using an LSTM-based model. The model classifies reviews as either positive or negative based on the text content.
- Data Preprocessing: The IMDb dataset is loaded, and the review text is tokenized and converted into sequences of integers. These sequences are then padded to a uniform input length. The sentiment labels are mapped to binary values (1 for positive, 0 for negative).
- GloVe Embeddings: Pre-trained GloVe word embeddings are used to represent words as dense vectors. Each word in the dataset vocabulary is looked up in GloVe, and the resulting vectors are stored in an embedding matrix.
- Model Architecture: A Sequential LSTM model is created with an embedding layer (using the GloVe embeddings), an LSTM layer, and a Dense output layer with a sigmoid activation for binary classification. The model is compiled with the Adam optimizer and binary cross-entropy loss.
- Model Training: The model is trained for 5 epochs on the preprocessed training data, and the training and validation accuracy and loss are plotted.
- Evaluation: The model is evaluated on the test dataset using accuracy, precision, recall, and F1-score. The model achieves an accuracy of 85.39%.
- Sentiment Prediction: A function is implemented to predict the sentiment of new movie reviews. Three sample reviews are classified as either "positive" or "negative" based on the model's prediction.
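
The preprocessing and embedding steps above can be sketched without any deep learning library. This is an illustration under stated assumptions, not the notebook's code: the function names are hypothetical, tokenization is plain lowercase whitespace splitting (no punctuation handling), index 0 is reserved for padding, and sequences are padded and truncated on the left.

```python
import numpy as np


def build_word_index(texts):
    """Rank words by frequency: 1 is the most frequent, 0 is kept for padding."""
    counts = {}
    for text in texts:
        for word in text.lower().split():
            counts[word] = counts.get(word, 0) + 1
    # Ties are broken alphabetically so the mapping is deterministic.
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: rank for rank, word in enumerate(ranked, start=1)}


def texts_to_padded(texts, word_index, max_len):
    """Convert texts to integer sequences of length max_len (left-padded with 0)."""
    padded = np.zeros((len(texts), max_len), dtype="int32")
    for row, text in enumerate(texts):
        ids = [word_index[w] for w in text.lower().split() if w in word_index]
        ids = ids[-max_len:]  # keep only the last max_len tokens
        if ids:
            padded[row, -len(ids):] = ids
    return padded


def encode_labels(sentiments):
    """Map 'positive' to 1 and 'negative' to 0."""
    mapping = {"positive": 1, "negative": 0}
    return np.array([mapping[s] for s in sentiments], dtype="int32")


def build_embedding_matrix(word_index, glove_vectors, dim):
    """Row i holds the GloVe vector of the word with index i (zeros if unknown)."""
    matrix = np.zeros((len(word_index) + 1, dim), dtype="float32")
    for word, idx in word_index.items():
        vector = glove_vectors.get(word)
        if vector is not None:
            matrix[idx] = vector
    return matrix
```

Words that do not appear in GloVe keep an all-zero row, so the model sees them as carrying no pre-trained information.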
Overall, the project shows how an LSTM combined with pre-trained word embeddings can be used for deep-learning-based sentiment analysis of text.
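
For readers who want to see the shape of such a model, the following is a minimal Keras sketch of an embedding layer initialized from a GloVe matrix, followed by an LSTM layer and a sigmoid output. Several details are assumptions rather than facts about this project: the function names, the 128 LSTM units, freezing the embedding layer, and the 0.5 decision threshold.

```python
import numpy as np
from tensorflow import keras


def build_lstm_classifier(embedding_matrix, max_len, lstm_units=128):
    """Embedding (GloVe weights) -> LSTM -> Dense(sigmoid) binary classifier."""
    vocab_size, embedding_dim = embedding_matrix.shape
    model = keras.Sequential([
        keras.Input(shape=(max_len,), dtype="int32"),
        keras.layers.Embedding(
            input_dim=vocab_size,
            output_dim=embedding_dim,
            embeddings_initializer=keras.initializers.Constant(embedding_matrix),
            trainable=False,  # assumption: pre-trained vectors are kept fixed
        ),
        keras.layers.LSTM(lstm_units),  # assumption: unit count is illustrative
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model


def label_from_probability(probability, threshold=0.5):
    """Turn the sigmoid output into a 'positive' or 'negative' label."""
    return "positive" if probability >= threshold else "negative"
```

Training for 5 epochs, as described above, would then be a call such as `model.fit(X_train, y_train, epochs=5, validation_data=(X_val, y_val))`, where the array names are placeholders.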


