
Advancements in Neural Network Optimization Using Adaptive Gradient Clipping
Dr. Alice Quantum
University of Fictional Studies
alice.quantum@fictional.edu

June 2025

Abstract
This paper introduces a novel adaptive gradient clipping technique to enhance the stability and convergence speed of deep neural networks. By dynamically adjusting clipping thresholds based on gradient distributions, our method outperforms traditional fixed-threshold approaches on benchmark datasets like CIFAR-10 and ImageNet. We present empirical results demonstrating a 15% improvement in training efficiency.

1 Introduction
Deep neural networks have revolutionized fields such as computer vision and natural language processing. However, training instability due to exploding gradients remains a challenge. This paper proposes an adaptive gradient clipping algorithm that leverages statistical properties of gradient distributions to stabilize training without sacrificing model performance.

2 Methodology
Our approach, termed Adaptive Gradient Clipping (AGC), adjusts the clipping threshold dynamically
based on the gradient’s standard deviation over a sliding window. The algorithm is defined as follows:

Algorithm 1 Adaptive Gradient Clipping

Initialize: θ_0, window size W, initial threshold τ_0
for each epoch do
    Compute gradients g_t for parameters θ_t
    Update running mean µ_g and variance σ_g^2 over window W
    Set τ_t = µ_g + k · σ_g, where k is a hyperparameter
    Clip gradients: g'_t = min(max(g_t, −τ_t), τ_t)
    Update parameters: θ_{t+1} = θ_t − η g'_t
end for
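
To make the update concrete, the listing below gives a minimal NumPy sketch of one AGC step as described in Algorithm 1. It is an illustrative sketch under stated assumptions, not the authors' reference implementation: the AdaptiveGradientClipper class, the toy quadratic loss in the usage example, and the choice to accumulate the window statistics over gradient magnitudes (so that the threshold τ_t stays positive) are ours.

Listing 1 A minimal AGC step in NumPy (illustrative sketch)

from collections import deque
import numpy as np

class AdaptiveGradientClipper:
    def __init__(self, window_size=100, k=2.0, lr=0.01):
        self.window = deque(maxlen=window_size)  # sliding window W of recent gradient magnitudes
        self.k = k                               # hyperparameter k in tau_t = mu_g + k * sigma_g
        self.lr = lr                             # learning rate eta

    def step(self, theta, grad):
        # Update the running statistics over the window W.
        # Assumption: statistics are taken over |g_t| rather than raw gradients,
        # which keeps the threshold tau_t non-negative.
        self.window.append(np.abs(grad).ravel())
        history = np.concatenate(list(self.window))
        mu, sigma = history.mean(), history.std()
        tau = mu + self.k * sigma                # adaptive threshold tau_t = mu_g + k * sigma_g
        clipped = np.clip(grad, -tau, tau)       # g'_t = min(max(g_t, -tau_t), tau_t)
        return theta - self.lr * clipped         # theta_{t+1} = theta_t - eta * g'_t

# Usage on a toy quadratic loss L(theta) = 0.5 * ||theta||^2, whose gradient is theta itself.
clipper = AdaptiveGradientClipper(window_size=50, k=2.0, lr=0.1)
theta = np.random.randn(10)
for _ in range(200):
    theta = clipper.step(theta, grad=theta)

In a full training loop the same statistics could instead be kept per parameter group; the single global window above is used only for brevity.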

3 Results
We evaluated AGC on CIFAR-10 and ImageNet datasets using ResNet-50. AGC achieved a top-1 accuracy of 78.3% on CIFAR-10 (vs. 76.1% for fixed clipping) and reduced training time by 15% on ImageNet. Convergence was consistently faster across learning rates.

4 Conclusion
Our adaptive gradient clipping method offers a robust solution for training deep neural networks, improving both stability and efficiency. Future work will explore its application to transformer-based models.

