Description
The problem here is what the returned value of a cpp Function should be when the output was independent of the inputs, or when no gradient was backpropagated for this specific output (it was unused later in the graph).
For python Functions, the wrapper takes care of this: when no gradient exists for an output, it creates a full dense tensor of 0s with the size of that output. The same happens when a python Function returns None for a given parameter: the next Function will actually see a dense tensor of 0s.
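As a sketch of that python-side behavior (the class and variable names here are illustrative, not from the source): a Function with two outputs where only one is used downstream still receives a materialized tensor of 0s for the unused output's gradient.

```python
import torch

class TwoOutputs(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x * 2, x * 3

    @staticmethod
    def backward(ctx, g1, g2):
        # Record what the wrapper delivered for each output's gradient.
        TwoOutputs.seen = (g1, g2)
        # Chain rule for both outputs: d(2x)/dx * g1 + d(3x)/dx * g2.
        return 2 * g1 + 3 * g2

x = torch.ones(3, requires_grad=True)
a, b = TwoOutputs.apply(x)
a.sum().backward()  # only `a` is used; no gradient flows to `b`

g1, g2 = TwoOutputs.seen
print(g2)  # a dense tensor of 0s, materialized by the python wrapper
```

This zero-filling is why python Functions never have to special-case a missing gradient, in contrast to the cpp side described next.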
In cpp, no such wrapper exists, so if an output was not used, the corresponding element in the input variable_list will be an empty shared_ptr. The cpp Function should handle an empty shared_ptr as if it were a dense Tensor of 0s (possibly skipping a lot of computation, as in ConvBackwardBackward).
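A minimal sketch of the convention this implies, written in Python for brevity with None standing in for the empty shared_ptr (the helper name and the squaring example are hypothetical, not from the source):

```python
import torch

def square_backward(grad_output, inp):
    """Backward of y = x**2, following the proposed cpp convention."""
    if grad_output is None:         # analogue of an empty shared_ptr in cpp
        return None                 # gradient is all 0s: skip the work entirely
    return 2 * inp * grad_output    # d(x**2)/dx = 2x

x = torch.tensor([1.0, 2.0, 3.0])
print(square_backward(None, x))           # None: computation skipped
print(square_backward(torch.ones(3), x))  # tensor([2., 4., 6.])
```

The point of the convention is exactly this short-circuit: since a gradient of all 0s propagates to 0s through any linear chain-rule term, a Function that checks for the empty case can avoid the computation altogether rather than materializing and multiplying by zeros.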
Right now, some Functions do handle this case properly, but not all of them (leading, for example, to this issue on the forum).
The question is: are we OK with the above design, which says that any input that is an empty shared_ptr is equivalent to a tensor of all 0s? And should all cpp Functions handle an empty shared_ptr as input?