
FER-2013 Facial Emotion Recognition: CNN vs Transfer Learning

📋 Project Overview

This project presents a comprehensive comparison of two different approaches for facial emotion recognition using the FER-2013 dataset. We implement and evaluate both a custom Convolutional Neural Network (CNN) from scratch and a transfer learning approach using ResNet50V2 to classify facial expressions into seven emotion categories.

🎯 Objectives

  • Build and compare two distinct deep learning models for emotion recognition
  • Analyze the performance differences between custom CNN and transfer learning approaches
  • Demonstrate the practical implications of each methodology
  • Provide insights into when to use each approach for emotion detection tasks

📊 Dataset Information

FER-2013 (Facial Expression Recognition 2013)

  • Total Images: ~35,000 grayscale facial images
  • Image Size: 48x48 pixels (original dataset) / 224x224 (for ResNet transfer learning)
  • Classes: 7 emotions
    • 😠 Angry
    • 🤢 Disgust
    • 😨 Fear
    • 😊 Happy
    • 😢 Sad
    • 😮 Surprise
    • 😐 Neutral

Dataset Characteristics:

  • Imbalanced class distribution (Happy and Neutral are more frequent)
  • Real-world, challenging facial expressions
  • Varying lighting conditions and image quality
  • Mix of different demographics and ages

🏗️ Project Structure

```
FER-2013-CNN-ResNet/
├── README.md                                       # This comprehensive guide
├── fer2013-face-emotion-detection.ipynb            # Custom CNN implementation
├── face-emotion-detection-custom-resnet50v2.ipynb  # Transfer learning implementation
├── FER-2013-Face-Emotion-Detection-with-CNN.pdf    # Short presentation for the Custom CNN model
└── FER-2013-Face-Emotion-Detection-with-Custom-ResNet50V2.pdf  # Short presentation for the Transfer Learning model
```

📄 Additional Documentation

  • PDF Presentations: Each model has an accompanying PDF presentation that provides a concise overview of the methodology, results, and key findings for both the Custom CNN and Transfer Learning approaches.

🔬 Model Implementations

1. Custom CNN Architecture (fer2013-face-emotion-detection.ipynb)

Model Design

```
Sequential Model:
├── Conv2D(32) + BatchNorm + ReLU
├── Conv2D(64) + BatchNorm + ReLU + MaxPool + Dropout(0.25)
├── Conv2D(128) + BatchNorm + ReLU
├── Conv2D(128) + BatchNorm + ReLU + MaxPool + Dropout(0.25)
├── Conv2D(256) + BatchNorm + ReLU
├── Conv2D(256) + BatchNorm + ReLU + MaxPool + Dropout(0.25)
├── Flatten
├── Dense(256) + BatchNorm + ReLU + Dropout(0.5)
└── Dense(7) + Softmax
```
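The diagram above maps directly onto a Keras `Sequential` model. A minimal sketch follows; the kernel size (3x3) and `'same'` padding are assumptions, since the diagram does not specify them:

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_fer_cnn(input_shape=(48, 48, 1), num_classes=7):
    """Sketch of the custom CNN from the architecture diagram.
    Kernel size and padding are assumptions, not taken from the notebook."""
    def conv_block(filters):
        # Conv + BatchNorm + ReLU, repeated per the diagram
        return [
            layers.Conv2D(filters, (3, 3), padding="same"),
            layers.BatchNormalization(),
            layers.Activation("relu"),
        ]

    model = tf.keras.Sequential([
        layers.Input(shape=input_shape),
        *conv_block(32),
        *conv_block(64),
        layers.MaxPooling2D(), layers.Dropout(0.25),
        *conv_block(128),
        *conv_block(128),
        layers.MaxPooling2D(), layers.Dropout(0.25),
        *conv_block(256),
        *conv_block(256),
        layers.MaxPooling2D(), layers.Dropout(0.25),
        layers.Flatten(),
        layers.Dense(256),
        layers.BatchNormalization(),
        layers.Activation("relu"),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ])
    return model
```

Three pooling stages reduce the 48x48 input to 6x6 before the dense head.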

Key Features

  • Input Size: 48x48 grayscale images
  • Total Parameters: ~1.2M trainable parameters
  • Architecture: Deep CNN with progressive feature extraction
  • Regularization: Batch normalization, dropout, data augmentation
  • Optimization: Adam optimizer with learning rate scheduling

Training Configuration

  • Data Augmentation: Rotation, shift, shear, zoom, horizontal flip
  • Class Weights: Balanced to handle class imbalance
  • Callbacks: ModelCheckpoint, EarlyStopping, ReduceLROnPlateau
  • Epochs: Up to 100 with early stopping (patience=10)
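The three callbacks above might be wired up as follows; the checkpoint filename, monitored metrics, and the `factor`/`patience` values other than the stated `patience=10` are assumptions:

```python
from tensorflow.keras.callbacks import (
    ModelCheckpoint, EarlyStopping, ReduceLROnPlateau)

callbacks = [
    # Keep the weights of the best epoch seen so far
    ModelCheckpoint("best_cnn.keras", monitor="val_accuracy",
                    save_best_only=True),
    # Stop once validation loss stops improving (patience=10, as stated)
    EarlyStopping(monitor="val_loss", patience=10,
                  restore_best_weights=True),
    # Halve the learning rate when validation loss plateaus
    ReduceLROnPlateau(monitor="val_loss", factor=0.5,
                      patience=4, min_lr=1e-6),
]
# Then passed to model.fit(..., epochs=100, callbacks=callbacks)
```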

2. Transfer Learning with ResNet50V2 (face-emotion-detection-custom-resnet50v2.ipynb)

Model Design

```
Transfer Learning Architecture:
├── ResNet50V2 (ImageNet pretrained, last 50 layers trainable)
├── Dropout(0.25)
├── BatchNormalization
├── Flatten
├── Dense(64) + ReLU + BatchNorm + Dropout(0.5)
└── Dense(7) + Softmax
```
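A minimal reconstruction of this head in Keras, with `weights` parameterized so the model can also be built offline (in the notebook it would be `'imagenet'`):

```python
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.applications import ResNet50V2

def build_transfer_model(num_classes=7, weights="imagenet",
                         trainable_layers=50):
    """Sketch of the transfer-learning model from the diagram."""
    base = ResNet50V2(include_top=False, weights=weights,
                      input_shape=(224, 224, 3))
    # Freeze everything except the last `trainable_layers` layers
    for layer in base.layers[:-trainable_layers]:
        layer.trainable = False

    return tf.keras.Sequential([
        base,
        layers.Dropout(0.25),
        layers.BatchNormalization(),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.BatchNormalization(),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ])
```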

Key Features

  • Input Size: 224x224 RGB images (upscaled from grayscale)
  • Base Model: ResNet50V2 pretrained on ImageNet
  • Fine-tuning: Last 50 layers trainable, earlier layers frozen
  • Transfer Strategy: Feature extraction + fine-tuning

Training Configuration

  • Training Stage: End-to-end training with adaptive learning rate
  • Data Augmentation: Rotation, zoom, shift, horizontal flip
  • Advanced Callbacks: Comprehensive monitoring and model checkpointing
  • Epochs: 30 epochs with adaptive learning rate

📈 Performance Comparison

Model Results Summary

| Metric              | Custom CNN        | Transfer Learning (ResNet50V2) |
|---------------------|-------------------|--------------------------------|
| Training Accuracy   | 55.75%            | 75.74%                         |
| Validation Accuracy | 61.45%            | 68.79%                         |
| Test Accuracy       | 57.80%            | 68.79%                         |
| Training Loss       | 1.1102            | ~0.65                          |
| Test Loss           | 1.1423            | ~0.90                          |
| Training Time       | ~2-3 hours        | ~1-2 hours                     |
| Model Size          | ~1.2M parameters  | ~25M+ parameters               |

🏆 Performance Analysis

Transfer Learning Advantages:

  1. Higher Accuracy: ~11 percentage-point improvement in test accuracy (57.80% → 68.79%)
  2. Better Generalization: Smaller gap between training and validation accuracy
  3. Faster Convergence: Leverages pre-trained features for quicker learning
  4. Robust Features: ImageNet features provide strong foundation for face recognition

Custom CNN Advantages:

  1. Computational Efficiency: Significantly smaller model size
  2. Resource Requirements: Lower memory and storage requirements
  3. Domain Specificity: Architecture designed specifically for the task
  4. Training Control: Complete control over feature learning process

🔍 Detailed Analysis

When to Use Transfer Learning

✅ Recommended When:

  • Limited Data: Small to medium-sized datasets benefit from pre-trained features
  • High Accuracy Required: Need maximum performance for production systems
  • Time Constraints: Faster development and training cycles
  • Similar Domains: Task relates to ImageNet categories (objects, scenes, faces)
  • Resources Available: Sufficient computational resources for larger models

❌ Consider Alternatives When:

  • Edge Deployment: Strict memory/storage constraints
  • Real-time Processing: Low-latency requirements
  • Very Different Domains: Task significantly different from ImageNet
  • Training from Scratch: Abundant data and computational resources

When to Use Custom CNN

✅ Recommended When:

  • Resource Constraints: Limited memory, storage, or computational power
  • Edge Computing: Mobile or embedded applications
  • Specialized Tasks: Highly domain-specific requirements
  • Educational Purposes: Learning CNN architectures and principles
  • Unique Data: Significantly different from common computer vision tasks

❌ Consider Alternatives When:

  • Performance Critical: Accuracy is the primary concern
  • Limited Data: Small datasets that benefit from pre-trained features
  • Development Speed: Quick prototyping and deployment needed

🛠️ Technical Implementation Details

Data Preprocessing Pipeline

Custom CNN:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Grayscale 48x48 processing
train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=10,
    width_shift_range=0.1,
    height_shift_range=0.1,
    shear_range=0.1,
    zoom_range=0.1,
    horizontal_flip=True,
    validation_split=0.2
)
```

Transfer Learning:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# RGB 224x224 processing
train_preprocessor = ImageDataGenerator(
    rescale=1/255.,
    rotation_range=10,
    zoom_range=0.2,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True,
    fill_mode='nearest'
)
```
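Either generator can be smoke-tested on random arrays before pointing it at the real dataset; the shapes below follow the custom-CNN pipeline (in the notebooks, `flow_from_directory` would read the actual folders):

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rescale=1./255, rotation_range=10,
                             horizontal_flip=True)

# Fake batch of 8 grayscale 48x48 "images" with one-hot labels
x = np.random.randint(0, 256, size=(8, 48, 48, 1)).astype("float32")
y = np.eye(7)[np.random.randint(0, 7, size=8)]

x_batch, y_batch = next(datagen.flow(x, y, batch_size=8))
# The augmented batch keeps the input shape; pixels are rescaled to [0, 1]
```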

Key Training Strategies

  1. Class Imbalance Handling:

    • Custom CNN: Computed class weights with sklearn
    • Transfer Learning: Balanced sampling and augmentation
  2. Regularization Techniques:

    • Dropout layers (0.25-0.5)
    • Batch normalization
    • Data augmentation
    • Early stopping
  3. Learning Rate Strategies:

    • Custom CNN: Adaptive learning rate with ReduceLROnPlateau
    • Transfer Learning: Two-stage training with different learning rates
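The class-weight computation on the custom-CNN side can be sketched with scikit-learn; variable and function names here are illustrative:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

def balanced_class_weights(labels):
    """Map each class index to a weight inversely proportional
    to its frequency, suitable for model.fit(class_weight=...)."""
    classes = np.unique(labels)
    weights = compute_class_weight(class_weight="balanced",
                                   classes=classes, y=labels)
    return dict(zip(classes, weights))

# Toy example: class 0 is three times as frequent as class 1,
# so class 1 receives three times the weight.
weights = balanced_class_weights(np.array([0, 0, 0, 1]))
```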

🚀 Getting Started

Prerequisites

```bash
pip install "tensorflow>=2.8.0"
pip install opencv-python
pip install matplotlib
pip install seaborn
pip install scikit-learn
pip install pandas
pip install numpy
```

Running the Notebooks

Local Environment:

  1. Custom CNN Approach:

     ```bash
     jupyter notebook fer2013-face-emotion-detection.ipynb
     ```

  2. Transfer Learning Approach:

     ```bash
     jupyter notebook face-emotion-detection-custom-resnet50v2.ipynb
     ```

Kaggle (Online - Recommended):

  1. Custom CNN Approach: Face Emotion Detection - Custom CNN
  2. Transfer Learning Approach: Face Emotion Detection - Custom ResNet50V2

💡 Tip: The Kaggle versions are ready to run with the dataset already configured and free GPU access!

Dataset Setup

  • Download FER-2013 dataset from Kaggle
  • Organize it in the following structure:

```
fer2013/
├── train/
│   ├── angry/
│   ├── disgust/
│   ├── fear/
│   ├── happy/
│   ├── neutral/
│   ├── sad/
│   └── surprise/
└── test/
    ├── angry/
    ├── disgust/
    ├── fear/
    ├── happy/
    ├── neutral/
    ├── sad/
    └── surprise/
```

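A quick stdlib check that the extracted folders match this layout; the helper is illustrative and not part of the notebooks:

```python
from pathlib import Path

EMOTIONS = ["angry", "disgust", "fear", "happy",
            "neutral", "sad", "surprise"]

def count_images(root):
    """Return {split: {emotion: file count}} for the
    fer2013/ directory layout shown above."""
    counts = {}
    for split in ("train", "test"):
        counts[split] = {
            emotion: sum(1 for p in (Path(root) / split / emotion).glob("*")
                         if p.is_file())
            for emotion in EMOTIONS
        }
    return counts
```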
📊 Results Visualization

Both notebooks include comprehensive visualizations:

  • Training Progress: Loss and accuracy curves
  • Class Distribution: Dataset balance analysis
  • Confusion Matrices: Per-class performance analysis
  • Sample Predictions: Visual validation of model outputs
  • Feature Analysis: Understanding learned representations

🎯 Key Insights and Conclusions

1. Performance Trade-offs

  • Transfer Learning: 19% relative improvement in accuracy but 20x more parameters
  • Custom CNN: More efficient but requires careful architecture design and longer training

2. Practical Considerations

  • Production Systems: Transfer learning for maximum accuracy
  • Mobile/Edge: Custom CNN for efficiency
  • Research: Both approaches provide valuable insights

3. Future Improvements

  • Ensemble Methods: Combine both models for better performance
  • Advanced Architectures: Experiment with Vision Transformers
  • Data Augmentation: Advanced techniques like CutMix, MixUp
  • Attention Mechanisms: Focus on facial regions of interest

📚 References and Resources

  1. Dataset: FER-2013 Facial Expression Recognition Challenge
  2. Kaggle Notebooks:
  3. Transfer Learning: Deep Residual Learning for Image Recognition
  4. Computer Vision: Deep Learning for Computer Vision
  5. TensorFlow Documentation: Transfer Learning Guide

🤝 Contributing

Feel free to contribute to this project by:

  • Implementing additional architectures (EfficientNet, Vision Transformers)
  • Experimenting with different preprocessing techniques
  • Adding real-time emotion detection capabilities
  • Improving documentation and examples

Author: Abdulrahman Eldeeb
LinkedIn: Abd El-Rahman Eldeeb
Course: Computer Vision
Focus: Deep Learning for Emotion Recognition
Date: 2025
