This project presents a comprehensive comparison of two different approaches for facial emotion recognition using the FER-2013 dataset. We implement and evaluate both a custom Convolutional Neural Network (CNN) from scratch and a transfer learning approach using ResNet50V2 to classify facial expressions into seven emotion categories.
- Build and compare two distinct deep learning models for emotion recognition
- Analyze the performance differences between custom CNN and transfer learning approaches
- Demonstrate the practical implications of each methodology
- Provide insights into when to use each approach for emotion detection tasks
FER-2013 (Facial Expression Recognition 2013)
- Total Images: ~35,000 grayscale facial images
- Image Size: 48x48 pixels (original dataset) / 224x224 (for ResNet transfer learning)
- Classes: 7 emotions
- 😠 Angry
- 🤢 Disgust
- 😨 Fear
- 😊 Happy
- 😢 Sad
- 😮 Surprise
- 😐 Neutral
Dataset Characteristics:
- Imbalanced class distribution (Happy and Neutral are more frequent)
- Real-world, challenging facial expressions
- Varying lighting conditions and image quality
- Mix of different demographics and ages
FER-2013-CNN-ResNet/
├── README.md # This comprehensive guide
├── fer2013-face-emotion-detection.ipynb # Custom CNN implementation
├── face-emotion-detection-custom-resnet50v2.ipynb # Transfer learning implementation
├── FER-2013-Face-Emotion-Detection-with-CNN.pdf # Small presentation for Custom CNN model
└── FER-2013-Face-Emotion-Detection-with-Custom-ResNet50V2.pdf # Small presentation for Transfer Learning model
- PDF Presentations: Each model has an accompanying PDF presentation that provides a concise overview of the methodology, results, and key findings for both the Custom CNN and Transfer Learning approaches.
Sequential Model:
├── Conv2D(32) + BatchNorm + ReLU
├── Conv2D(64) + BatchNorm + ReLU + MaxPool + Dropout(0.25)
├── Conv2D(128) + BatchNorm + ReLU
├── Conv2D(128) + BatchNorm + ReLU + MaxPool + Dropout(0.25)
├── Conv2D(256) + BatchNorm + ReLU
├── Conv2D(256) + BatchNorm + ReLU + MaxPool + Dropout(0.25)
├── Flatten
├── Dense(256) + BatchNorm + ReLU + Dropout(0.5)
└── Dense(7) + Softmax
- Input Size: 48x48 grayscale images
- Total Parameters: ~1.2M trainable parameters
- Architecture: Deep CNN with progressive feature extraction
- Regularization: Batch normalization, dropout, data augmentation
- Optimization: Adam optimizer with learning rate scheduling
- Data Augmentation: Rotation, shift, shear, zoom, horizontal flip
- Class Weights: Balanced to handle class imbalance
- Callbacks: ModelCheckpoint, EarlyStopping, ReduceLROnPlateau
- Epochs: Up to 100 with early stopping (patience=10)
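The architecture above can be sketched in Keras roughly as follows. This is a reconstruction for illustration only: kernel sizes, padding, and the exact optimizer configuration are assumptions not spelled out in this README.

```python
# Hypothetical reconstruction of the custom CNN described above.
# Kernel size (3x3) and 'same' padding are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_custom_cnn(input_shape=(48, 48, 1), num_classes=7):
    def conv_block(x, filters, pool=False):
        # Conv -> BatchNorm -> ReLU, optionally followed by MaxPool + Dropout
        x = layers.Conv2D(filters, 3, padding='same')(x)
        x = layers.BatchNormalization()(x)
        x = layers.ReLU()(x)
        if pool:
            x = layers.MaxPooling2D()(x)
            x = layers.Dropout(0.25)(x)
        return x

    inputs = layers.Input(shape=input_shape)
    x = conv_block(inputs, 32)
    x = conv_block(x, 64, pool=True)
    x = conv_block(x, 128)
    x = conv_block(x, 128, pool=True)
    x = conv_block(x, 256)
    x = conv_block(x, 256, pool=True)
    x = layers.Flatten()(x)
    x = layers.Dense(256)(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(num_classes, activation='softmax')(x)

    model = models.Model(inputs, outputs)
    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model
```

The actual layer hyperparameters live in `fer2013-face-emotion-detection.ipynb`; this sketch only mirrors the block structure listed above.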
Transfer Learning Architecture:
├── ResNet50V2 (ImageNet pretrained, last 50 layers trainable)
├── Dropout(0.25)
├── BatchNormalization
├── Flatten
├── Dense(64) + ReLU + BatchNorm + Dropout(0.5)
└── Dense(7) + Softmax
- Input Size: 224x224 RGB images (48x48 grayscale inputs upscaled and replicated across three channels)
- Base Model: ResNet50V2 pretrained on ImageNet
- Fine-tuning: Last 50 layers trainable, earlier layers frozen
- Transfer Strategy: Feature extraction + fine-tuning
- Training Stage: End-to-end training with adaptive learning rate
- Data Augmentation: Rotation, zoom, shift, horizontal flip
- Advanced Callbacks: Comprehensive monitoring and model checkpointing
- Epochs: 30 epochs with adaptive learning rate
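A minimal sketch of the transfer-learning setup described above, assuming the ResNet50V2 base from `tf.keras.applications` with the last 50 layers unfrozen (the learning rate and head hyperparameters here are illustrative assumptions):

```python
# Sketch of the ResNet50V2 transfer-learning model described above.
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50V2

def build_transfer_model(input_shape=(224, 224, 3), num_classes=7,
                         trainable_layers=50, weights='imagenet'):
    base = ResNet50V2(include_top=False, weights=weights,
                      input_shape=input_shape)
    # Freeze everything except the last `trainable_layers` layers
    for layer in base.layers[:-trainable_layers]:
        layer.trainable = False

    x = layers.Dropout(0.25)(base.output)
    x = layers.BatchNormalization()(x)
    x = layers.Flatten()(x)
    x = layers.Dense(64, activation='relu')(x)
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(num_classes, activation='softmax')(x)

    model = models.Model(base.input, outputs)
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),  # assumed LR
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model
```

See `face-emotion-detection-custom-resnet50v2.ipynb` for the exact configuration used in the experiments.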
| Metric | Custom CNN | Transfer Learning (ResNet50V2) |
|---|---|---|
| Training Accuracy | 55.75% | 75.74% |
| Validation Accuracy | 61.45% | 68.79% |
| Test Accuracy | 57.80% | 68.79% |
| Training Loss | 1.1102 | ~0.65 |
| Test Loss | 1.1423 | ~0.90 |
| Training Time | ~2-3 hours | ~1-2 hours |
| Model Size | ~1.2M parameters | ~25M+ parameters |
- Higher Accuracy: ~11 percentage-point improvement in test accuracy (57.80% → 68.79%)
- Better Generalization: Smaller gap between training and validation accuracy
- Faster Convergence: Leverages pre-trained features for quicker learning
- Robust Features: ImageNet features provide strong foundation for face recognition
- Computational Efficiency: Significantly smaller model size
- Resource Requirements: Lower memory and storage requirements
- Domain Specificity: Architecture designed specifically for the task
- Training Control: Complete control over feature learning process
✅ Recommended When:
- Limited Data: Small to medium-sized datasets benefit from pre-trained features
- High Accuracy Required: Need maximum performance for production systems
- Time Constraints: Faster development and training cycles
- Similar Domains: Task relates to ImageNet categories (objects, scenes, faces)
- Resources Available: Sufficient computational resources for larger models
❌ Consider Alternatives When:
- Edge Deployment: Strict memory/storage constraints
- Real-time Processing: Low-latency requirements
- Very Different Domains: Task significantly different from ImageNet
- Training from Scratch: Abundant data and computational resources
✅ Recommended When:
- Resource Constraints: Limited memory, storage, or computational power
- Edge Computing: Mobile or embedded applications
- Specialized Tasks: Highly domain-specific requirements
- Educational Purposes: Learning CNN architectures and principles
- Unique Data: Significantly different from common computer vision tasks
❌ Consider Alternatives When:
- Performance Critical: Accuracy is the primary concern
- Limited Data: Small datasets that benefit from pre-trained features
- Development Speed: Quick prototyping and deployment needed
```python
# Grayscale 48x48 processing (custom CNN)
train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=10,
    width_shift_range=0.1,
    height_shift_range=0.1,
    shear_range=0.1,
    zoom_range=0.1,
    horizontal_flip=True,
    validation_split=0.2
)
```

```python
# RGB 224x224 processing (transfer learning)
train_preprocessor = ImageDataGenerator(
    rescale=1./255,
    rotation_range=10,
    zoom_range=0.2,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True,
    fill_mode='nearest'
)
```

- Class Imbalance Handling:
- Custom CNN: Computed class weights with sklearn
- Transfer Learning: Balanced sampling and augmentation
- Regularization Techniques:
- Dropout layers (0.25-0.5)
- Batch normalization
- Data augmentation
- Early stopping
- Learning Rate Strategies:
- Custom CNN: Adaptive learning rate with ReduceLROnPlateau
- Transfer Learning: Two-stage training with different learning rates
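The class-weight computation and callback setup mentioned above can be sketched like this (filenames, monitored metrics, and the `ReduceLROnPlateau` factor/patience values are assumptions, not taken from the notebooks):

```python
# Sketch: balanced class weights (sklearn) plus the callbacks listed above.
# Assumes training labels are available as an integer array `y_train`.
import numpy as np
from sklearn.utils.class_weight import compute_class_weight
from tensorflow.keras.callbacks import (ModelCheckpoint, EarlyStopping,
                                        ReduceLROnPlateau)

def make_class_weights(y_train):
    # 'balanced' weights rare classes (e.g. Disgust) more heavily
    classes = np.unique(y_train)
    weights = compute_class_weight(class_weight='balanced',
                                   classes=classes, y=y_train)
    return dict(zip(classes, weights))

callbacks = [
    ModelCheckpoint('best_model.keras', monitor='val_accuracy',
                    save_best_only=True),
    EarlyStopping(monitor='val_loss', patience=10,
                  restore_best_weights=True),
    ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=3,
                      min_lr=1e-6),
]
# Usage: model.fit(..., class_weight=make_class_weights(y_train),
#                  callbacks=callbacks)
```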
```bash
pip install "tensorflow>=2.8.0"
pip install opencv-python
pip install matplotlib
pip install seaborn
pip install scikit-learn
pip install pandas
pip install numpy
```

- Custom CNN Approach:

  ```bash
  jupyter notebook fer2013-face-emotion-detection.ipynb
  ```

- Transfer Learning Approach:

  ```bash
  jupyter notebook face-emotion-detection-custom-resnet50v2.ipynb
  ```
- Custom CNN Approach: Face Emotion Detection - Custom CNN
- Transfer Learning Approach: Face Emotion Detection - Custom ResNet50V2
💡 Tip: The Kaggle versions are ready to run with the dataset already configured and free GPU access!
- Download FER-2013 dataset from Kaggle
- Organize in the following structure:
fer2013/
├── train/
│   ├── angry/
│   ├── disgust/
│   ├── fear/
│   ├── happy/
│   ├── neutral/
│   ├── sad/
│   └── surprise/
└── test/
    ├── angry/
    ├── disgust/
    ├── fear/
    ├── happy/
    ├── neutral/
    ├── sad/
    └── surprise/
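With the directory layout above, the data can be loaded via Keras generators along these lines (paths, batch size, and target size are assumptions; adjust for your setup):

```python
# Sketch: loading the fer2013/ directory structure with Keras generators.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rescale=1./255, validation_split=0.2)

def make_generators(train_dir='fer2013/train', target_size=(48, 48)):
    # Class labels are inferred from the seven subdirectory names
    train_gen = datagen.flow_from_directory(
        train_dir, target_size=target_size, color_mode='grayscale',
        class_mode='categorical', batch_size=64, subset='training')
    val_gen = datagen.flow_from_directory(
        train_dir, target_size=target_size, color_mode='grayscale',
        class_mode='categorical', batch_size=64, subset='validation')
    return train_gen, val_gen
```

For the ResNet50V2 notebook, swap `target_size=(224, 224)` and `color_mode='rgb'` so the grayscale images are upscaled and replicated to three channels.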
Both notebooks include comprehensive visualizations:
- Training Progress: Loss and accuracy curves
- Class Distribution: Dataset balance analysis
- Confusion Matrices: Per-class performance analysis
- Sample Predictions: Visual validation of model outputs
- Feature Analysis: Understanding learned representations
- Transfer Learning: ~19% relative improvement in test accuracy, at the cost of ~20x more parameters
- Custom CNN: More efficient but requires careful architecture design and longer training
- Production Systems: Transfer learning for maximum accuracy
- Mobile/Edge: Custom CNN for efficiency
- Research: Both approaches provide valuable insights
- Ensemble Methods: Combine both models for better performance
- Advanced Architectures: Experiment with Vision Transformers
- Data Augmentation: Advanced techniques like CutMix, MixUp
- Attention Mechanisms: Focus on facial regions of interest
- Dataset: FER-2013 Facial Expression Recognition Challenge
- Kaggle Notebooks:
- Transfer Learning: Deep Residual Learning for Image Recognition
- Computer Vision: Deep Learning for Computer Vision
- TensorFlow Documentation: Transfer Learning Guide
Feel free to contribute to this project by:
- Implementing additional architectures (EfficientNet, Vision Transformers)
- Experimenting with different preprocessing techniques
- Adding real-time emotion detection capabilities
- Improving documentation and examples
Author: Abdulrahman Eldeeb
LinkedIn: Abd El-Rahman Eldeeb
Course: Computer Vision
Focus: Deep Learning for Emotion Recognition
Date: 2025