Motivation
Making AutogradMeta optional for a Variable that does not need gradient computation (i.e. it doesn’t require grad, and it doesn’t have a grad_fn) provides the following benefits:
- Memory savings for Variables that don't need gradient computation.
- Removal of the Variable class and the make_variable API, making Variable and Tensor the same concept.
Plan
- Part 1: Make all Variable APIs work for non-AutogradMeta Variables
- There is a list of Variable APIs that always assume the Variable contains AutogradMeta:
Variable::grad_fn()
Variable::grad_fn_unsafe()
Variable::set_grad_accumulator()
Variable::try_get_grad_accumulator()
Variable::grad_accumulator()
Variable::set_gradient_edge()
Variable::output_nr()
Variable::is_leaf()
Variable::add_hook()
Variable::hooks()
Variable::clear_hooks()
Variable::is_view()
Variable::base()
Variable::set_name()
Variable::name()
Variable::backward()
Variable::set_data()
Variable::rebase_history()
TensorImpl::set_requires_grad()
TensorImpl::requires_grad()
TensorImpl::grad()
These functions internally call get_autograd_meta(), and if the Variable doesn't have AutogradMeta, they will dereference a null pointer and crash with a nasty segfault.
The right behavior is to check whether AutogradMeta exists, and if AutogradMeta doesn’t exist for that Variable:
- For setter methods: create AutogradMeta on the fly for that Variable, then proceed as usual.
- For getter methods: return a sensible null value.
- Caveat: some functions are expected to return a mutable/const reference, and returning a mutable/const reference to NULL might not be a good idea / might not work. One idea is to add an API that asks "whether we can do something" (e.g. has_hooks()) before we "do something" (e.g. hooks()), but we need to check whether this design can work for all cases. A sketch of this setter/getter split follows this list.
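A minimal sketch of the setter/getter split described above, using simplified stand-in structs. materialize_autograd_meta, the field layout, and the simplified types are illustrative assumptions, not the actual PyTorch implementation:

```cpp
#include <memory>
#include <vector>

// Simplified stand-ins for illustration; the real AutogradMeta and TensorImpl
// live in torch/csrc/autograd/variable.h and c10/core/TensorImpl.h.
struct FunctionPreHook {};

struct AutogradMeta {
  bool requires_grad_ = false;
  std::vector<std::shared_ptr<FunctionPreHook>> hooks_;
};

struct TensorImpl {
  std::unique_ptr<AutogradMeta> autograd_meta_;  // may now be nullptr

  // Setter path: create AutogradMeta on the fly, then proceed as usual.
  // (materialize_autograd_meta is a hypothetical helper name.)
  AutogradMeta* materialize_autograd_meta() {
    if (!autograd_meta_) {
      autograd_meta_ = std::make_unique<AutogradMeta>();
    }
    return autograd_meta_.get();
  }

  void set_requires_grad(bool requires_grad) {
    materialize_autograd_meta()->requires_grad_ = requires_grad;
  }

  // Getter path: return a sensible null value when AutogradMeta is absent.
  bool requires_grad() const {
    return autograd_meta_ && autograd_meta_->requires_grad_;
  }

  // "Can we do it?" query to call before a reference-returning getter such as
  // hooks(), so callers never receive a reference backed by a null pointer.
  bool has_hooks() const {
    return autograd_meta_ && !autograd_meta_->hooks_.empty();
  }
};
```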
- Part 2: Don't create AutogradMeta in make_variable(...) when not required
- Don't create AutogradMeta in make_variable(...) when requires_grad=false and gradient_edge is undefined
- Make TensorImpl.is_variable() only check the at::NonVariableTypeMode guard, because now a Variable that doesn't have AutogradMeta is still a Variable
- Maintain the invariant: a Tensor should only have AutogradMeta if it requires grad or has grad_fn (see the sketch after this list)
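A minimal sketch of the intended make_variable behavior, using simplified stand-in types; the signature, field names, and types here are assumptions for illustration, not the real API:

```cpp
#include <cstdint>
#include <memory>
#include <utility>

// Simplified stand-ins for Edge, AutogradMeta, and TensorImpl.
struct Function {};

struct Edge {
  std::shared_ptr<Function> function;
  std::uint32_t input_nr = 0;
};

struct AutogradMeta {
  bool requires_grad_ = false;
  Edge gradient_edge_;
};

struct TensorImpl {
  std::unique_ptr<AutogradMeta> autograd_meta_;  // absent by default
};

// After Part 2, AutogradMeta is attached only when requires_grad is true or a
// gradient_edge is defined, which preserves the invariant that all other
// Tensors carry no autograd state.
std::shared_ptr<TensorImpl> make_variable(std::shared_ptr<TensorImpl> data,
                                          bool requires_grad = false,
                                          Edge gradient_edge = Edge()) {
  if (requires_grad || gradient_edge.function != nullptr) {
    auto meta = std::make_unique<AutogradMeta>();
    meta->requires_grad_ = requires_grad;
    meta->gradient_edge_ = std::move(gradient_edge);
    data->autograd_meta_ = std::move(meta);
  }
  return data;
}
```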
- Part 3: Deprecate TensorOptions.is_variable()
- Deprecate TensorOptions.is_variable() (have it always return true, and throw a warning when the user tries to set this field)
- For getType(TensorOptions), only check at::NonVariableTypeMode::is_enabled() to decide whether to choose the Variable path (see the sketch after this list)
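A minimal sketch of the intended dispatch decision, with a stand-in for the thread-local guard (the real at::NonVariableTypeMode lives in ATen; use_variable_path is an illustrative name):

```cpp
// Stand-in for at::NonVariableTypeMode, for illustration only.
struct NonVariableTypeMode {
  static bool is_enabled() { return enabled_; }
  static void set_enabled(bool enabled) { enabled_ = enabled; }
 private:
  static thread_local bool enabled_;
};
thread_local bool NonVariableTypeMode::enabled_ = false;

// After Part 3, a getType(TensorOptions)-style dispatch consults only the
// guard; TensorOptions.is_variable() no longer participates in the decision.
bool use_variable_path() {
  return !NonVariableTypeMode::is_enabled();
}
```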
- Part 4: Replace Variable wrapping functions
- Replace make_variable(...) with an appropriate API that attaches AutogradMeta when needed. Audit all call sites of make_variable(...) to understand their expected behavior regarding whether we need to do a shallow copy, or whether we can just attach AutogradMeta to the original Variable (possible API shapes are sketched after this list).
- Replace as_variable(...) with an appropriate API that attaches AutogradMeta when needed.
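Two hypothetical shapes such a replacement API could take; both names below are placeholders invented for illustration, not proposed or existing functions, and the declarations assume the ATen headers for at::Tensor:

```cpp
namespace torch { namespace autograd {

// Variant 1: shallow-copy the underlying TensorImpl, then attach AutogradMeta
// if requires_grad (closest to today's make_variable behavior).
at::Tensor make_variable_copy(const at::Tensor& data, bool requires_grad);

// Variant 2: attach AutogradMeta to the original Tensor in place, for call
// sites where no shallow copy is needed.
at::Tensor& attach_autograd_meta(at::Tensor& data, bool requires_grad);

}} // namespace torch::autograd
```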
- Part 5: Documentation improvement
- Improve the "NOTE: After the Variable/Tensor merge" comment based on Move version_counter_ to TensorImpl #18223 (comment)
- Improve the "Note [Tensor versus Variable in C++]" comment based on https://github.com/pytorch/pytorch/pull/17072/files#r276326234
- Part 6: Remove Variable class
- Move autograd-specific functions from the Variable class to free functions in torch::autograd:: (see the sketch after the list of functions below):
Function* grad_fn_unsafe() const;
void set_grad_accumulator(std::weak_ptr<Function> grad_accumulator);
std::shared_ptr<Function> try_get_grad_accumulator() const;
std::shared_ptr<Function> grad_accumulator() const;
Edge gradient_edge() const;
void set_gradient_edge(Edge edge) noexcept;
void bump_version() noexcept;
void set_version_counter(const c10::VariableVersion& version_counter) noexcept;
const c10::VariableVersion& version_counter() const noexcept;
uint32_t current_version() const noexcept; // Replaced by _version() in Tensor
void rebase_history(Edge gradient_edge);
void add_hook(std::shared_ptr<FunctionPreHook> hook);
const std::vector<std::shared_ptr<FunctionPreHook>>& hooks() const noexcept;
void clear_hooks();
bool is_view() const noexcept;
const Variable& base() const;
void set_name(const std::string& name);
const std::string& name() const noexcept;
PyObject* pyobj() const noexcept;
void set_pyobj(PyObject* pyobj) noexcept;
- Remove Variable and use at::Tensor everywhere.
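A sketch of the intended API shape only; the exact signatures are not decided by this issue. These declarations assume the existing autograd types (Function, Edge, FunctionPreHook) and at::Tensor from the PyTorch headers:

```cpp
namespace torch { namespace autograd {

// Previously Variable::grad_fn_unsafe().
Function* grad_fn_unsafe(const at::Tensor& self);

// Previously Variable::set_gradient_edge(Edge).
void set_gradient_edge(const at::Tensor& self, Edge edge);

// Previously Variable::add_hook(...).
void add_hook(const at::Tensor& self, std::shared_ptr<FunctionPreHook> hook);

// Previously Variable::is_view() and Variable::base().
bool is_view(const at::Tensor& self);
const at::Tensor& base(const at::Tensor& self);

}} // namespace torch::autograd
```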
- Part 7: Add compute_requires_grad() to ATen core
- There are various places in the codebase where we need to check tensor.requires_grad() and GradMode::is_enabled() at the same time. Ideally we should use compute_requires_grad() to simplify the check (a sketch follows below).
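A minimal sketch of what this helper does, using simplified stand-in types; GradMode and TensorStub here are illustrative, and the existing helper in the autograd code is a variadic template over tensors rather than taking an initializer list:

```cpp
#include <initializer_list>

// Stand-in for the grad-mode guard, for illustration only.
struct GradMode {
  static bool is_enabled() { return enabled_; }
  static void set_enabled(bool enabled) { enabled_ = enabled; }
 private:
  static thread_local bool enabled_;
};
thread_local bool GradMode::enabled_ = true;

struct TensorStub {
  bool requires_grad = false;
};

// A tensor only "effectively" requires grad when grad mode is enabled; callers
// use one helper instead of repeating both checks at every call site.
bool compute_requires_grad(std::initializer_list<TensorStub> inputs) {
  if (!GradMode::is_enabled()) {
    return false;
  }
  for (const auto& t : inputs) {
    if (t.requires_grad) {
      return true;
    }
  }
  return false;
}
```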
- Clean up mentions of "Variable and Tensor are merged".