Motivation
Making AutogradMeta optional for a Variable that does not need gradient computation (i.e. it doesn’t require grad, and it doesn’t have a grad_fn) provides the following benefits:
- Memory savings for Variables that don't need gradient computation.
- Removal of the Variable class and the make_variable API, making Variable and Tensor the same concept.
Plan
- Part 1: Make all Variable APIs work for non-AutogradMeta Variables
- There is a list of Variable APIs that always assume the Variable contains AutogradMeta:
Variable::grad_fn()
Variable::grad_fn_unsafe()
Variable::set_grad_accumulator()
Variable::try_get_grad_accumulator()
Variable::grad_accumulator()
Variable::set_gradient_edge()
Variable::output_nr()
Variable::is_leaf()
Variable::add_hook()
Variable::hooks()
Variable::clear_hooks()
Variable::is_view()
Variable::base()
Variable::set_name()
Variable::name()
Variable::backward()
Variable::set_data()
Variable::rebase_history()
TensorImpl::set_requires_grad()
TensorImpl::requires_grad()
TensorImpl::grad()
These functions internally call get_autograd_meta(), and if the Variable doesn't have AutogradMeta, they will dereference a null pointer and crash with a nasty segfault.
The right behavior is to check whether AutogradMeta exists, and if AutogradMeta doesn’t exist for that Variable:
- For setter methods: create AutogradMeta on the fly for that Variable, then proceed as usual.
- For getter methods: return a sensible null value.
- Caveat: some functions are expected to return a mutable/const reference, and returning a mutable/const reference to NULL might not be a good idea / might not work. One idea is to add an API that asks "whether we can do something" (e.g. has_hooks()) before we "do something" (e.g. hooks()), but we need to check whether this design can work for all cases. A sketch of this setter/getter split follows this list.
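A minimal sketch of the setter/getter split described above, using simplified stand-in structs. materialize_autograd_meta, the field layout, and the simplified types are illustrative assumptions, not the actual PyTorch implementation:

```cpp
#include <memory>
#include <vector>

// Simplified stand-ins for illustration; the real AutogradMeta and TensorImpl
// live in torch/csrc/autograd/variable.h and c10/core/TensorImpl.h.
struct FunctionPreHook {};

struct AutogradMeta {
  bool requires_grad_ = false;
  std::vector<std::shared_ptr<FunctionPreHook>> hooks_;
};

struct TensorImpl {
  std::unique_ptr<AutogradMeta> autograd_meta_;  // may now be nullptr

  // Setter path: create AutogradMeta on the fly, then proceed as usual.
  // (materialize_autograd_meta is a hypothetical helper name.)
  AutogradMeta* materialize_autograd_meta() {
    if (!autograd_meta_) {
      autograd_meta_ = std::make_unique<AutogradMeta>();
    }
    return autograd_meta_.get();
  }

  void set_requires_grad(bool requires_grad) {
    materialize_autograd_meta()->requires_grad_ = requires_grad;
  }

  // Getter path: return a sensible null value when AutogradMeta is absent.
  bool requires_grad() const {
    return autograd_meta_ && autograd_meta_->requires_grad_;
  }

  // "Can we do it?" query to call before a reference-returning getter such as
  // hooks(), so callers never receive a reference backed by a null pointer.
  bool has_hooks() const {
    return autograd_meta_ && !autograd_meta_->hooks_.empty();
  }
};
```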
- Part 2: Don't create AutogradMeta in make_variable(...) when not required
- Don't create AutogradMeta in make_variable(...) when requires_grad=false and gradient_edge is undefined
- Make TensorImpl.is_variable() only check the at::NonVariableTypeMode guard, because now a Variable that doesn't have AutogradMeta is still a Variable
- Maintain the invariant: a Tensor should only have AutogradMeta if it requires grad or has grad_fn (see the sketch after this list)
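A minimal sketch of the intended make_variable behavior, using simplified stand-in types; the signature, field names, and types here are assumptions for illustration, not the real API:

```cpp
#include <cstdint>
#include <memory>
#include <utility>

// Simplified stand-ins for Edge, AutogradMeta, and TensorImpl.
struct Function {};

struct Edge {
  std::shared_ptr<Function> function;
  std::uint32_t input_nr = 0;
};

struct AutogradMeta {
  bool requires_grad_ = false;
  Edge gradient_edge_;
};

struct TensorImpl {
  std::unique_ptr<AutogradMeta> autograd_meta_;  // absent by default
};

// After Part 2, AutogradMeta is attached only when requires_grad is true or a
// gradient_edge is defined, which preserves the invariant that all other
// Tensors carry no autograd state.
std::shared_ptr<TensorImpl> make_variable(std::shared_ptr<TensorImpl> data,
                                          bool requires_grad = false,
                                          Edge gradient_edge = Edge()) {
  if (requires_grad || gradient_edge.function != nullptr) {
    auto meta = std::make_unique<AutogradMeta>();
    meta->requires_grad_ = requires_grad;
    meta->gradient_edge_ = std::move(gradient_edge);
    data->autograd_meta_ = std::move(meta);
  }
  return data;
}
```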
- Part 3: Deprecate TensorOptions.is_variable()
- Deprecate TensorOptions.is_variable() (have it always return true, and throw a warning when the user tries to set this field)
- For getType(TensorOptions), only check at::NonVariableTypeMode::is_enabled() to decide whether to choose the Variable path (see the sketch after this list)
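A minimal sketch of the intended dispatch decision, with a stand-in for the thread-local guard (the real at::NonVariableTypeMode lives in ATen; use_variable_path is an illustrative name):

```cpp
// Stand-in for at::NonVariableTypeMode, for illustration only.
struct NonVariableTypeMode {
  static bool is_enabled() { return enabled_; }
  static void set_enabled(bool enabled) { enabled_ = enabled; }
 private:
  static thread_local bool enabled_;
};
thread_local bool NonVariableTypeMode::enabled_ = false;

// After Part 3, a getType(TensorOptions)-style dispatch consults only the
// guard; TensorOptions.is_variable() no longer participates in the decision.
bool use_variable_path() {
  return !NonVariableTypeMode::is_enabled();
}
```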
- Part 4: Replace Variable wrapping functions
- Replace make_variable(...) with an appropriate API that attaches AutogradMeta when needed. Audit all call sites of make_variable(...) to understand their expected behavior regarding whether we need to do a shallow copy, or whether we can just attach AutogradMeta to the original Variable (possible API shapes are sketched after this list).
- Replace as_variable(...) with an appropriate API that attaches AutogradMeta when needed.
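Two hypothetical shapes such a replacement API could take; both names below are placeholders invented for illustration, not proposed or existing functions, and the declarations assume the ATen headers for at::Tensor:

```cpp
namespace torch { namespace autograd {

// Variant 1: shallow-copy the underlying TensorImpl, then attach AutogradMeta
// if requires_grad (closest to today's make_variable behavior).
at::Tensor make_variable_copy(const at::Tensor& data, bool requires_grad);

// Variant 2: attach AutogradMeta to the original Tensor in place, for call
// sites where no shallow copy is needed.
at::Tensor& attach_autograd_meta(at::Tensor& data, bool requires_grad);

}} // namespace torch::autograd
```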
- Part 5: Documentation improvement
- Improve the "NOTE: After the Variable/Tensor merge" comment based on Move version_counter_ to TensorImpl #18223 (comment)
- Improve the "Note [Tensor versus Variable in C++]" comment based on https://github.com/pytorch/pytorch/pull/17072/files#r276326234
- Part 6: Remove Variable class
- Move autograd-specific functions from the Variable class to free functions in torch::autograd:: (see the sketch after the list of functions below):
Function* grad_fn_unsafe() const;
void set_grad_accumulator(std::weak_ptr<Function> grad_accumulator);
std::shared_ptr<Function> try_get_grad_accumulator() const;
std::shared_ptr<Function> grad_accumulator() const;
Edge gradient_edge() const;
void set_gradient_edge(Edge edge) noexcept;
void bump_version() noexcept;
void set_version_counter(const c10::VariableVersion& version_counter) noexcept;
const c10::VariableVersion& version_counter() const noexcept;
uint32_t current_version() const noexcept; // Replaced by _version() in Tensor
void rebase_history(Edge gradient_edge);
void add_hook(std::shared_ptr<FunctionPreHook> hook);
const std::vector<std::shared_ptr<FunctionPreHook>>& hooks() const noexcept;
void clear_hooks();
bool is_view() const noexcept;
const Variable& base() const;
void set_name(const std::string& name);
const std::string& name() const noexcept;
PyObject* pyobj() const noexcept;
void set_pyobj(PyObject* pyobj) noexcept;
- Remove Variable and use at::Tensor everywhere.
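A sketch of the intended API shape only; the exact signatures are not decided by this issue. These declarations assume the existing autograd types (Function, Edge, FunctionPreHook) and at::Tensor from the PyTorch headers:

```cpp
namespace torch { namespace autograd {

// Previously Variable::grad_fn_unsafe().
Function* grad_fn_unsafe(const at::Tensor& self);

// Previously Variable::set_gradient_edge(Edge).
void set_gradient_edge(const at::Tensor& self, Edge edge);

// Previously Variable::add_hook(...).
void add_hook(const at::Tensor& self, std::shared_ptr<FunctionPreHook> hook);

// Previously Variable::is_view() and Variable::base().
bool is_view(const at::Tensor& self);
const at::Tensor& base(const at::Tensor& self);

}} // namespace torch::autograd
```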
- Part 7: Add compute_requires_grad() to ATen core
- There are various places in the codebase where we need to check tensor.requires_grad() and GradMode::is_enabled() at the same time. Ideally we should use compute_requires_grad() to simplify the check (a sketch follows below).
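A minimal sketch of what this helper does, using simplified stand-in types; GradMode and TensorStub here are illustrative, and the existing helper in the autograd code is a variadic template over tensors rather than taking an initializer list:

```cpp
#include <initializer_list>

// Stand-in for the grad-mode guard, for illustration only.
struct GradMode {
  static bool is_enabled() { return enabled_; }
  static void set_enabled(bool enabled) { enabled_ = enabled; }
 private:
  static thread_local bool enabled_;
};
thread_local bool GradMode::enabled_ = true;

struct TensorStub {
  bool requires_grad = false;
};

// A tensor only "effectively" requires grad when grad mode is enabled; callers
// use one helper instead of repeating both checks at every call site.
bool compute_requires_grad(std::initializer_list<TensorStub> inputs) {
  if (!GradMode::is_enabled()) {
    return false;
  }
  for (const auto& t : inputs) {
    if (t.requires_grad) {
      return true;
    }
  }
  return false;
}
```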
- Clean up mentions of "Variable and Tensor are merged".