Redesign InPlaceAccumulator op #11842
Conversation
This pull request introduces 1 alert when merging 184150c into fb88efb (view on LGTM.com).
memcpy(accumulation_buffer_data, updated_data, new_value->SizeInBytes());
} else {
  // Copy from Add CPU kernel
  ProcessBroadcastSpanFuncs funcs{
Curious whether it is possible to reuse the same ProcessBroadcastSpanFuncs instance for both the v1 and v2 kernels.
Maybe, but the only way I can think of is holding a static ProcessBroadcastSpanFuncs that both kernels use; I'm not sure it is worth it. Let me know if there is a better way.
orttraining/orttraining/training_ops/cpu/optimizer/gradient_control.cc
@@ -132,9 +132,9 @@ Status Module::TrainStep(const std::vector<OrtValue>& inputs, std::vector<OrtVal
feeds.insert(feeds.end(), weights_.begin(), weights_.end());
feeds.insert(feeds.end(), gradients_.begin(), gradients_.end());
// TODO: consider maintaining this as ortvalue instead of bool
if it is easy to implement the TODO, shall we do it in this PR?
The question is whether we should: is it better to hold an OrtValue and update the underlying buffer by unwrapping it every step, or to wrap a bool into a new OrtValue every step?
OK, I did not realize this. How about holding two OrtValues directly, one True and one False? Never mind, let's refine it later.
test.Run(OpTester::ExpectResult::kExpectSuccess, "", {kCpuExecutionProvider});
}
shall we also add >1D input data cases?
Force-pushed from 184150c to 5a23f40.
This pull request introduces 1 alert when merging 5a23f40 into f63e28c (view on LGTM.com).
orttraining/orttraining/training_ops/cuda/optimizer/gradient_control.cc
std::vector<std::vector<int64_t>> x_shapes = {
    {4, 3, 2}, {4, 3, 2}, {4, 3, 2}, {4, 3, 2}, {4, 3, 2}, {4, 3, 2}, {4, 3, 2}, {4, 3, 2},
    {4, 3, 2},
    {4, 3, 2},
nit: we'd better avoid making this kind of change, to make it easier for us when merging back to the master branch.
We'll be cleaning up at merge time anyway, so shall we handle it then? I'll make sure not to make such changes going forward.
We also need to update the
This is already done for GetTrainModeOutputCount. The Eval model should only have user outputs anyway, so no change is required.
Force-pushed from f9358ba to 594a0e4.
This PR makes the output buffer of the InPlaceAccumulatorV2 op optional and introduces an additional optional input, 'overwrite'.
Also added op tests and onnxblock changes to use the new op.