Redesign InPlaceAccumulator op #11842
Conversation
This pull request introduces 1 alert when merging 184150c into fb88efb (view on LGTM.com).
memcpy(accumulation_buffer_data, updated_data, new_value->SizeInBytes());
} else {
  // Copy from Add CPU kernel
  ProcessBroadcastSpanFuncs funcs{
Curious whether it is possible to reuse the same ProcessBroadcastSpanFuncs instance for both the v1 and v2 kernels.
Maybe, but the only way I can think of is holding a static ProcessBroadcastSpanFuncs that both kernels use; I'm not sure it is worth it. Let me know if there is a better way.
orttraining/orttraining/training_ops/cpu/optimizer/gradient_control.cc
@@ -132,9 +132,9 @@ Status Module::TrainStep(const std::vector<OrtValue>& inputs, std::vector<OrtVal
feeds.insert(feeds.end(), weights_.begin(), weights_.end());
feeds.insert(feeds.end(), gradients_.begin(), gradients_.end());
// TODO: consider maintaining this as ortvalue instead of bool
if it is easy to implement the TODO, shall we do it in this PR?
The question is whether we should: is it better to hold an OrtValue and update the underlying buffer by unwrapping it every step, or to wrap a bool into a new OrtValue every step?
OK, I did not realize this. How about holding two OrtValues directly, one True and one False? Never mind, let's refine it later.
test.Run(OpTester::ExpectResult::kExpectSuccess, "", {kCpuExecutionProvider});
}
shall we also add >1D input data cases?
Force-pushed from 184150c to 5a23f40.
This pull request introduces 1 alert when merging 5a23f40 into f63e28c (view on LGTM.com).
orttraining/orttraining/training_ops/cuda/optimizer/gradient_control.cc
std::vector<std::vector<int64_t>> x_shapes = {
    {4, 3, 2}, {4, 3, 2}, {4, 3, 2}, {4, 3, 2}, {4, 3, 2}, {4, 3, 2}, {4, 3, 2}, {4, 3, 2},
    {4, 3, 2},
    {4, 3, 2},
nit: we'd better avoid making this kind of change, to make it easier for us when merging back to the master branch.
We'll be cleaning up at merge time anyway, so shall we handle it then? I'll make sure not to make such changes going forward.
We also need to update the
This is already done for GetTrainModeOutputCount. The Eval model should only have user outputs anyway, so no change is required.
Force-pushed from f9358ba to 594a0e4.
This PR makes the output buffer of the InPlaceAccumulatorV2 op optional and introduces an additional optional input, 'overwrite'.
Also added op tests and onnxblock changes to use the new op.