CUDA kernel for ClipGradNorm for TensorSeq gradients by baijumeswani · Pull Request #12412 · microsoft/onnxruntime

baijumeswani · 2022-08-01T20:51:45Z

Currently, ClipGradNorm in onnxruntime is implemented as a subgraph of more primitive onnxruntime ops, followed by a SequenceConstruct whose output is fed into AdamWOptimizer.

This SequenceConstruct copies the input tensors into a new TensorSeq. To avoid these copies, the ClipGradNorm kernel can instead take a TensorSeq as input and clip the gradients in place.
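To make the in-place idea concrete, here is a minimal CPU reference sketch of global-norm clipping over a sequence of gradients. It is not the CUDA kernel from this PR: `GradientSeq` is a hypothetical stand-in for a TensorSeq, the function name `ClipGradNormInplaceReference` is made up for illustration, and the epsilon of `1e-6` is an assumption borrowed from common practice rather than taken from the PR.

```cpp
#include <cmath>
#include <vector>

// Hypothetical stand-in for a TensorSeq: each inner vector is one gradient tensor.
using GradientSeq = std::vector<std::vector<float>>;

// Scales every gradient in place so that the global L2 norm of the whole
// sequence does not exceed max_norm. No copies of the gradients are made.
// Returns the global norm measured before clipping.
inline float ClipGradNormInplaceReference(GradientSeq& gradients, float max_norm) {
  // Pass 1: accumulate the sum of squares across all tensors in the sequence.
  double sum_of_squares = 0.0;
  for (const auto& gradient : gradients) {
    for (float value : gradient) {
      sum_of_squares += static_cast<double>(value) * value;
    }
  }
  const float total_norm = static_cast<float>(std::sqrt(sum_of_squares));

  // Assumption: a small epsilon guards against division by zero.
  constexpr float kEpsilon = 1e-6f;
  const float clip_coefficient = max_norm / (total_norm + kEpsilon);

  // Pass 2: rescale in place only when the norm exceeds the threshold.
  if (clip_coefficient < 1.0f) {
    for (auto& gradient : gradients) {
      for (float& value : gradient) {
        value *= clip_coefficient;
      }
    }
  }
  return total_norm;
}
```

Because the function mutates the buffers it is given, the caller's sequence can be handed on to the optimizer directly, which is the copy that the SequenceConstruct path would otherwise incur.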

orttraining/orttraining/training_ops/cuda/optimizer/clip_grad_norm/clip_grad_norm.cc

orttraining/orttraining/training_ops/cuda/optimizer/clip_grad_norm/clip_grad_norm.h

orttraining/orttraining/training_ops/cuda/optimizer/clip_grad_norm/clip_grad_norm_impl.cu

orttraining/orttraining/core/graph/training_op_defs.cc

baijumeswani added the training label (issues related to ONNX Runtime training; typically submitted using template) Aug 1, 2022

baijumeswani requested review from ashbhandare, askhade and pengwa August 1, 2022 20:51

baijumeswani added 2 commits August 2, 2022 16:42

Add CUDA kernel for ClipGradNorm for TensorSeq gradients

293a7e5

Add support for TensorSeq for graph outputs in InferenceSession

a1119e1

baijumeswani force-pushed the bmeswani/clipgradnorm branch from 1eb026f to a1119e1 August 2, 2022 16:42

Use inlined containers

770466f

pengwa reviewed Aug 3, 2022

Use ClipGradNormInplace to indicate inplace update and resolve other pull request review comments

6c9cb32