Densely Connected Convolutional Networks (DenseNet)
CVPR 2017
Presented by: Joshua Juste NIKIEMA
Original Authors: Gao Huang, Zhuang Liu, Laurens van der Maaten, Kilian Q. Weinberger
August 4, 2025
Outline
1 Abstract
2 Architecture
3 Tasks and Applications
4 Baselines and Comparisons
5 Metrics, Results, and Datasets
6 Future Work
Abstract
Problem: Traditional CNNs suffer from the vanishing-gradient problem
as they get deeper
Key Insight: Networks can be substantially deeper, more accurate,
and efficient with shorter connections between layers
Solution: DenseNet connects each layer to every other layer in a
feed-forward fashion
Connections: A traditional L-layer network has L connections;
DenseNet has L(L+1)/2 direct connections (see the short counting sketch below)
Benefits:
Alleviates vanishing-gradient problem
Strengthens feature propagation
Encourages feature reuse
Substantially reduces parameters
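A minimal arithmetic sketch of the connection counts above (my own illustration in Python, not from the paper):

```python
# Number of direct connections as a function of depth L (illustrative only).
def num_connections(L, dense=True):
    # Traditional chain: one connection per layer and its successor -> L
    # Dense block: every layer is connected to all preceding layers -> L*(L+1)/2
    return L * (L + 1) // 2 if dense else L

print(num_connections(5, dense=False))  # 5
print(num_connections(5, dense=True))   # 15
```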
DenseNet Architecture - Core Concept
Dense Connectivity Pattern:
Each layer receives feature-maps
from ALL preceding layers
Each layer passes its
feature-maps to ALL subsequent
layers
Features combined by
concatenation (not summation
like ResNet)
Mathematical Formulation:
xℓ = Hℓ([x0, x1, ..., xℓ−1])
Figure: 5-layer dense block with growth rate k=4
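A minimal PyTorch sketch of this connectivity pattern, mirroring the 5-layer block with k=4 in the figure (my own illustration; the class name and channel sizes are assumptions, not the authors' code):

```python
import torch
import torch.nn as nn

# Each layer H_l receives the concatenation of all preceding feature-maps
# [x_0, x_1, ..., x_{l-1}] and produces k new feature-maps (growth rate k).
class DenseBlockSketch(nn.Module):
    def __init__(self, num_layers=5, in_channels=16, growth_rate=4):
        super().__init__()
        self.layers = nn.ModuleList()
        for l in range(num_layers):
            # Layer l (0-indexed) sees in_channels + l * growth_rate input maps,
            # i.e. k0 + k*(l-1) in the paper's 1-indexed notation.
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(in_channels + l * growth_rate),
                nn.ReLU(inplace=True),
                nn.Conv2d(in_channels + l * growth_rate, growth_rate,
                          kernel_size=3, padding=1, bias=False),
            ))

    def forward(self, x0):
        features = [x0]
        for layer in self.layers:
            # x_l = H_l([x_0, x_1, ..., x_{l-1}]) -- concatenation, not summation
            x_l = layer(torch.cat(features, dim=1))
            features.append(x_l)
        return torch.cat(features, dim=1)
```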
DenseNet Architecture - Key Components
Dense Blocks: Groups of densely connected layers
Transition Layers: Between blocks for down-sampling
Batch Normalization + 1×1 Conv + 2×2 Average Pooling
Composite Function Hℓ : BN → ReLU → 3×3 Conv
Growth Rate (k): Number of feature-maps each layer produces
Layer ℓ has k0 + k × (ℓ − 1) input feature-maps
Small growth rates (k=12) are sufficient
Architecture Variants:
DenseNet-B: With bottleneck layers (1×1 conv before 3×3)
DenseNet-C: With compression (θ < 1 compression factor)
DenseNet-BC: Both bottleneck and compression (see the sketch below)
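A hedged PyTorch sketch of the DenseNet-BC building blocks described above (the 4k bottleneck width and θ = 0.5 follow the paper; the helper names are mine):

```python
import torch.nn as nn

def bottleneck_layer(in_channels, growth_rate):
    # DenseNet-B composite function:
    # BN -> ReLU -> 1x1 Conv (4k maps) -> BN -> ReLU -> 3x3 Conv (k maps)
    return nn.Sequential(
        nn.BatchNorm2d(in_channels),
        nn.ReLU(inplace=True),
        nn.Conv2d(in_channels, 4 * growth_rate, kernel_size=1, bias=False),
        nn.BatchNorm2d(4 * growth_rate),
        nn.ReLU(inplace=True),
        nn.Conv2d(4 * growth_rate, growth_rate, kernel_size=3, padding=1, bias=False),
    )

def transition_layer(in_channels, theta=0.5):
    # Transition between dense blocks:
    # BN -> 1x1 Conv (compression factor theta) -> 2x2 average pooling
    out_channels = int(theta * in_channels)
    return nn.Sequential(
        nn.BatchNorm2d(in_channels),
        nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),
        nn.AvgPool2d(kernel_size=2, stride=2),
    )
```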
Tasks They Are Solving
Primary Task: Image Classification
Datasets Evaluated:
CIFAR-10: 10 classes, 32×32 color images
CIFAR-100: 100 classes, 32×32 color images
SVHN: Street View House Numbers, 32×32 digit images
ImageNet: 1000 classes, large-scale image recognition
Key Challenges Addressed:
Vanishing gradient problem in very deep networks
Parameter efficiency vs. accuracy trade-off
Feature reuse and information flow
Overfitting in smaller datasets
Broader Applications Mentioned:
Feature extraction for various computer vision tasks
Transfer learning scenarios
Baseline Methods
Primary Comparison: ResNet and ResNet variants
ResNet-110, ResNet-1001
Pre-activation ResNet-164, ResNet-1001
Wide ResNet (16 and 28 layers)
ResNet with Stochastic Depth
Other State-of-the-Art Methods:
Network in Network (NIN)
All-CNN
Deeply Supervised Net (DSN)
Highway Networks
FractalNet (with/without Dropout/Drop-path)
Fair Comparison Strategy:
Used publicly available ResNet implementation
Kept all experimental settings identical
Same data preprocessing and optimization settings
Experimental Setup
Training Configuration:
Optimizer: SGD with Nesterov momentum (0.9)
Weight decay: 10−4
CIFAR/SVHN: Batch size 64, 300/40 epochs
ImageNet: Batch size 256, 90 epochs
Learning rate: 0.1 initially, divided by 10 at 50% and 75% of training
Evaluation Metrics:
Classification Error Rate (%)
Top-1 and Top-5 Error (ImageNet)
Parameter Efficiency: Accuracy vs. number of parameters
Computational Efficiency: Accuracy vs. FLOPs
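A sketch of the training configuration above in PyTorch (assuming a standard SGD + MultiStepLR setup; the placeholder model and loop body are not from the paper):

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Conv2d(3, 16, kernel_size=3)  # placeholder for a DenseNet-BC
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9,
                      nesterov=True, weight_decay=1e-4)

total_epochs = 300  # CIFAR: 300, SVHN: 40, ImageNet: 90
# Divide the learning rate by 10 at 50% and 75% of training.
scheduler = optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[total_epochs // 2, 3 * total_epochs // 4], gamma=0.1)

for epoch in range(total_epochs):
    # ... one pass over the training set (batch size 64 on CIFAR/SVHN, 256 on ImageNet) ...
    scheduler.step()
```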
Key Results - CIFAR and SVHN
Method Params C10+ C100+ SVHN
Error rates in % (lower is better); "+" denotes standard data augmentation
ResNet-110 1.7M 6.41 27.22 2.01
ResNet-1001 10.2M 4.62 22.71 -
Wide ResNet-28 36.5M 4.17 20.50 -
FractalNet 38.6M 4.60 23.73 1.87
DenseNet-BC (k=24) 15.3M 3.62 17.60 1.74
DenseNet-BC (k=40) 25.6M 3.46 17.18 -
Key Achievements:
30% error reduction on C100 compared to previous best
Significantly fewer parameters than competing methods
State-of-the-art results across all datasets
Understanding Top-1 and Top-5 Error Metrics
What are Top-1 and Top-5 Errors?
Top-1 Error: Percentage of test samples where the highest
confidence prediction is wrong
Top-5 Error: Percentage of test samples where the correct class is
NOT among the top 5 predictions
Lower percentages = Better performance
Example: For an image of a "cat"
Model predicts: [1st: dog 40%, 2nd: cat 35%, 3rd: wolf 15%, ...]
Top-1: WRONG (predicted dog, not cat) → contributes to Top-1
error
Top-5: CORRECT (cat is in top 5) → does NOT contribute to
Top-5 error
Why Two Metrics?
ImageNet has 1000 classes - many visually similar
Top-5 gives credit for ”reasonable” mistakes
Both metrics standard in computer vision research
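A short Python sketch of how these metrics can be computed (my own illustration using PyTorch's topk; the toy logits mirror the "cat" example above):

```python
import torch

def topk_error(logits, targets, k=5):
    """Fraction of samples whose true class is NOT among the top-k predictions."""
    topk = logits.topk(k, dim=1).indices                 # (N, k) predicted class ids
    correct = (topk == targets.unsqueeze(1)).any(dim=1)  # true class in top k?
    return 1.0 - correct.float().mean().item()

# Toy check: classes 0..4 = dog, cat, wolf, fox, lion; ground truth is "cat".
logits = torch.tensor([[0.40, 0.35, 0.10, 0.08, 0.07]])
target = torch.tensor([1])
print(topk_error(logits, target, k=1))  # 1.0 -> wrong top-1 prediction (dog)
print(topk_error(logits, target, k=5))  # 0.0 -> cat is within the top 5
```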
Key Results - ImageNet
Model Top-1 Error (%) Top-5 Error (%)
DenseNet-121 25.02 / 23.61 7.71 / 6.66
DenseNet-169 23.80 / 22.08 6.85 / 5.92
DenseNet-201 22.58 / 21.46 6.34 / 5.54
DenseNet-264 22.15 / 20.80 6.12 / 5.29
Single-crop / 10-crop testing
Parameter Efficiency:
DenseNet-201 (20M params) matches ResNet-101 (40M+ params)
A DenseNet requiring only as much computation as ResNet-50 matches ResNet-101 performance
3× fewer parameters than ResNet for comparable accuracy
Parameter and Computational Efficiency
Figure: ImageNet validation error vs. parameters
Figure: ImageNet validation error vs. FLOPs
Key Observations:
DenseNets achieve similar accuracy with significantly fewer
parameters
Computational efficiency (FLOPs) also favors DenseNets
Parameter Efficiency Analysis
Figure: DenseNet variants comparison
Figure: DenseNet vs. ResNet efficiency
Figure: Training curves comparison
Key Findings:
DenseNet-BC is most parameter-efficient variant
3× fewer parameters than ResNet for same accuracy
A 100-layer DenseNet (0.8M params) matches a 1001-layer pre-activation ResNet (10.2M params)
Future Work and Research Gaps
Authors’ Proposed Future Directions:
Feature Transfer: Study DenseNets as feature extractors for other
computer vision tasks
Hyperparameter Optimization: More extensive hyperparameter
search specifically for DenseNets (current settings optimized for
ResNets)
Memory Efficiency: Further improvements in memory-efficient
implementations
Identified Research Gaps:
Scalability: How do DenseNets perform with even deeper
architectures?
Other Vision Tasks: Object detection, semantic segmentation, etc.
Architectural Variations: Different connectivity patterns within
dense blocks
Theoretical Understanding: Why does dense connectivity work so
well?
Computational Optimization: Hardware-specific optimizations for dense connectivity
Conclusion
Key Contributions:
Novel Architecture: Dense connectivity pattern with L(L+1)/2 connections
Parameter Efficiency: Substantially fewer parameters than ResNets
State-of-the-Art Results: Superior performance on multiple
benchmarks
Theoretical Insights: Feature reuse and implicit deep supervision
Impact:
Challenges the assumption that deeper networks need more
parameters
Opens new research directions in network connectivity patterns
Provides a strong baseline for future architectural innovations
Practical Value:
More efficient models for resource-constrained environments
Better feature representations for transfer learning
Stable training for very deep networks
Questions?
Thank you for your attention!