Proposal: Optional AutogradMeta for Variable #23032

@yf225

Description

Motivation

Making AutogradMeta optional for a Variable that does not need gradient computation (i.e. it doesn’t require grad, and it doesn’t have a grad_fn) provides the following benefits:

  1. Memory savings for Variables that don't need gradient computation.

  2. Removal of the Variable class and the make_variable API, making Variable and Tensor the same concept.

Plan

  • Part 1: Make all Variable APIs work for non-AutogradMeta Variables
  • There is a list of Variable APIs that always assume the Variable contains AutogradMeta:
Variable::grad_fn()
Variable::grad_fn_unsafe()
Variable::set_grad_accumulator()
Variable::try_get_grad_accumulator()
Variable::grad_accumulator()
Variable::set_gradient_edge()
Variable::output_nr()
Variable::is_leaf()
Variable::add_hook()
Variable::hooks()
Variable::clear_hooks()
Variable::is_view()
Variable::base()
Variable::set_name()
Variable::name()
Variable::backward()
Variable::set_data()
Variable::rebase_history()

TensorImpl::set_requires_grad()
TensorImpl::requires_grad()
TensorImpl::grad()

These functions internally call get_autograd_meta(), and if the Variable doesn't have AutogradMeta, they will hit a nasty segfault.
The right behavior is to check whether AutogradMeta exists, and if AutogradMeta doesn't exist for that Variable (see the sketch after this list):

  • For setter methods:

    • Create AutogradMeta on the fly for that Variable, and then proceed as usual.
  • For getter methods:

    • Return a sensible null value.
    • Caveat: some functions are expected to return a mutable/const reference, and returning a mutable/const reference to NULL might not be a good idea and might not work. One idea is to have an API that asks “whether we can do something” (e.g. has_hooks()) before we “do something” (e.g. hooks()), but we need to check whether this design can work for all cases.
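
To make the intended behavior concrete, here is a minimal, self-contained sketch of the idea (not the actual PyTorch implementation; the member layout and the materialize_autograd_meta() helper are assumptions for illustration): a setter creates AutogradMeta lazily, and getters tolerate its absence by returning a sensible null value.

    #include <iostream>
    #include <memory>

    struct Function {};  // stand-in for torch::autograd::Function

    struct AutogradMeta {
      bool requires_grad_ = false;
      std::shared_ptr<Function> grad_fn_;
    };

    struct TensorImpl {
      // nullptr means "no autograd metadata" -- the common, memory-saving case.
      std::unique_ptr<AutogradMeta> autograd_meta_;

      // Hypothetical helper: allocate AutogradMeta on first use.
      AutogradMeta* materialize_autograd_meta() {
        if (!autograd_meta_) autograd_meta_ = std::make_unique<AutogradMeta>();
        return autograd_meta_.get();
      }

      // Setter: create AutogradMeta on the fly, then proceed as usual.
      void set_requires_grad(bool requires_grad) {
        materialize_autograd_meta()->requires_grad_ = requires_grad;
      }

      // Getters: return a sensible null/default value when AutogradMeta is absent.
      bool requires_grad() const {
        return autograd_meta_ && autograd_meta_->requires_grad_;
      }
      const std::shared_ptr<Function>& grad_fn() const {
        // One option for reference-returning getters: hand back a shared null value.
        static const std::shared_ptr<Function> null_grad_fn;
        return autograd_meta_ ? autograd_meta_->grad_fn_ : null_grad_fn;
      }
    };

    int main() {
      TensorImpl t;
      std::cout << t.requires_grad() << " " << (t.grad_fn() == nullptr) << "\n";  // prints "0 1"
      t.set_requires_grad(true);  // AutogradMeta is created lazily here
      std::cout << t.requires_grad() << "\n";                                     // prints "1"
    }

Handing back a static null value is only one way to keep reference-returning signatures working; the has_hooks()-style query API mentioned in the caveat above is an alternative.
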
  • Part 2: Don’t create AutogradMeta in make_variable(...) when not required

  • Don’t create AutogradMeta in make_variable(...) when requires_grad=false and gradient_edge is undefined

  • Make TensorImpl.is_variable() only check the at::NonVariableTypeMode guard, because now a Variable that doesn’t have AutogradMeta is still a Variable

  • Maintain the invariant: a Tensor should only have AutogradMeta if it requires grad or has grad_fn
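
Below is a minimal standalone sketch of this rule (simplified types and a simplified make_variable signature, not the real API): AutogradMeta is only allocated when requires_grad is true or a gradient edge is supplied, which preserves the invariant above.

    #include <cstdint>
    #include <memory>

    struct Function {};
    struct Edge {
      std::shared_ptr<Function> function;
      std::uint32_t input_nr = 0;
    };
    struct AutogradMeta {
      bool requires_grad_ = false;
      std::shared_ptr<Function> grad_fn_;
    };
    struct TensorImpl {
      std::unique_ptr<AutogradMeta> autograd_meta_;  // absent for plain tensors
    };

    // Hypothetical, simplified make_variable: skip AutogradMeta in the common case.
    std::shared_ptr<TensorImpl> make_variable(std::shared_ptr<TensorImpl> data,
                                               bool requires_grad,
                                               Edge gradient_edge = {}) {
      if (requires_grad || gradient_edge.function != nullptr) {
        auto meta = std::make_unique<AutogradMeta>();
        meta->requires_grad_ = requires_grad;
        meta->grad_fn_ = std::move(gradient_edge.function);
        data->autograd_meta_ = std::move(meta);
      }
      // Otherwise the returned tensor carries no AutogradMeta at all.
      return data;
    }

    int main() {
      auto plain = make_variable(std::make_shared<TensorImpl>(), /*requires_grad=*/false);
      auto leaf  = make_variable(std::make_shared<TensorImpl>(), /*requires_grad=*/true);
      // plain->autograd_meta_ is nullptr; leaf->autograd_meta_ is non-null.
      return (plain->autograd_meta_ == nullptr && leaf->autograd_meta_ != nullptr) ? 0 : 1;
    }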

  • Part 3: Deprecate TensorOptions.is_variable()

  • Deprecate TensorOptions.is_variable() (have it always return true, and throw a warning when the user tries to set this field)

  • For getType(TensorOptions), we only check at::NonVariableTypeMode::is_enabled() to decide whether to choose the Variable path.
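
The following standalone model (not PyTorch code; the flag and guard names are stand-ins for at::NonVariableTypeMode) illustrates what this rule means: the Variable-vs-non-Variable decision depends only on a thread-local guard, never on a per-TensorOptions is_variable flag.

    #include <iostream>

    // Stand-in for the thread-local at::NonVariableTypeMode state.
    thread_local bool non_variable_type_mode = false;

    // RAII guard, modelled after an AutoNonVariableTypeMode-style scope guard.
    struct NonVariableTypeModeGuard {
      bool prev_ = non_variable_type_mode;
      NonVariableTypeModeGuard() { non_variable_type_mode = true; }
      ~NonVariableTypeModeGuard() { non_variable_type_mode = prev_; }
    };

    // What getType(TensorOptions) boils down to after Part 3: only the guard matters.
    bool use_variable_path() { return !non_variable_type_mode; }

    int main() {
      std::cout << use_variable_path() << "\n";    // 1: Variable path by default
      {
        NonVariableTypeModeGuard guard;
        std::cout << use_variable_path() << "\n";  // 0: guard selects the non-Variable path
      }
      std::cout << use_variable_path() << "\n";    // 1: restored after the guard exits
    }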

  • Part 4: Replace Variable wrapping functions

  • Replace make_variable(...) with an appropriate API that attaches AutogradMeta when needed. Audit all call sites of make_variable(...) to understand their expected behavior regarding whether we need to do a shallow copy, or whether we can just attach AutogradMeta to the original Variable (both patterns are sketched below).

  • Replace as_variable(...) with an appropriate API that attaches AutogradMeta when needed.
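
To make the distinction the audit needs to draw concrete, here is a hedged standalone sketch (hypothetical helper names, simplified types) of the two call-site patterns: wrapping via a shallow copy of the impl versus attaching AutogradMeta to the original impl in place.

    #include <memory>

    struct AutogradMeta { bool requires_grad_ = false; };

    struct TensorImpl {
      std::shared_ptr<int> storage_;                 // stand-in for the shared storage
      std::unique_ptr<AutogradMeta> autograd_meta_;  // optional under this proposal
    };
    using Tensor = std::shared_ptr<TensorImpl>;

    // Pattern A (a make_variable-style wrapper): shallow-copy the impl so storage is
    // shared, then attach AutogradMeta to the copy; the input impl stays untouched.
    Tensor wrap_as_variable(const Tensor& t, bool requires_grad) {
      auto copy = std::make_shared<TensorImpl>();
      copy->storage_ = t->storage_;  // shallow copy: share the underlying storage
      if (requires_grad) {
        copy->autograd_meta_ = std::make_unique<AutogradMeta>();
        copy->autograd_meta_->requires_grad_ = true;
      }
      return copy;
    }

    // Pattern B: attach AutogradMeta to the original impl in place; no new impl.
    void attach_autograd_meta(const Tensor& t, bool requires_grad) {
      if (requires_grad && !t->autograd_meta_) {
        t->autograd_meta_ = std::make_unique<AutogradMeta>();
        t->autograd_meta_->requires_grad_ = true;
      }
    }

    int main() {
      auto base = std::make_shared<TensorImpl>();
      auto wrapped = wrap_as_variable(base, /*requires_grad=*/true);  // base unchanged
      attach_autograd_meta(base, /*requires_grad=*/true);             // base modified in place
      (void)wrapped;
    }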

  • Part 5: Documentation improvement

  • Improve the "NOTE: After the Variable/Tensor merge" comment based on "Move version_counter_ to TensorImpl" #18223 (comment)

  • Improve the “Note [Tensor versus Variable in C++]” comment based on https://github.com/pytorch/pytorch/pull/17072/files#r276326234

  • Part 6: Remove Variable class

  • Move autograd-specific functions from Variable class to free functions in torch::autograd::

    Function* grad_fn_unsafe() const;
    void set_grad_accumulator(std::weak_ptr<Function> grad_accumulator);
    std::shared_ptr<Function> try_get_grad_accumulator() const;
    std::shared_ptr<Function> grad_accumulator() const;
    Edge gradient_edge() const;
    void set_gradient_edge(Edge edge) noexcept;
    void bump_version() noexcept;
    void set_version_counter(const c10::VariableVersion& version_counter) noexcept;
    const c10::VariableVersion& version_counter() const noexcept;
    uint32_t current_version() const noexcept;  // Replaced by _version() in Tensor
    void rebase_history(Edge gradient_edge);
    void add_hook(std::shared_ptr<FunctionPreHook> hook);
    const std::vector<std::shared_ptr<FunctionPreHook>>& hooks() const noexcept;
    void clear_hooks();
    bool is_view() const noexcept;
    const Variable& base() const;
    void set_name(const std::string& name);
    const std::string& name() const noexcept;
    PyObject* pyobj() const noexcept;
    void set_pyobj(PyObject* pyobj) noexcept;
  • Remove Variable and use at::Tensor everywhere.
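
As an illustration, the free-function interface might look roughly like the following (the signatures are assumptions, not the final API; the types are the same ones used in the member-function list above):

    namespace torch { namespace autograd {

    // Formerly variable.grad_fn_unsafe(), variable.set_gradient_edge(edge), etc.;
    // now free functions that take the tensor as their first argument.
    Function* grad_fn_unsafe(const at::Tensor& self);
    void set_gradient_edge(const at::Tensor& self, Edge edge);
    void rebase_history(const at::Tensor& self, Edge gradient_edge);
    void add_hook(const at::Tensor& self, std::shared_ptr<FunctionPreHook> hook);
    const std::vector<std::shared_ptr<FunctionPreHook>>& hooks(const at::Tensor& self);
    bool is_view(const at::Tensor& self);
    const std::string& name(const at::Tensor& self);

    }} // namespace torch::autograd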

  • Part 7: Add compute_requires_grad() to ATen core

  • There are various places in the codebase where we need to check tensor.requires_grad() and GradMode::is_enabled() at the same time. Ideally we should use compute_requires_grad() to simplify the check (see the sketch after this list).

  • Clean up mentions of "Variable and Tensor are merged".
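
A hedged sketch of what such a helper could look like, reduced to a single tensor for brevity (the real helper may be variadic and live elsewhere), assuming a libtorch C++ environment:

    #include <torch/torch.h>

    // requires_grad only matters while grad mode is enabled, so callers should not
    // have to write the two checks separately.
    bool compute_requires_grad(const at::Tensor& t) {
      return at::GradMode::is_enabled() && t.requires_grad();
    }

    int main() {
      auto x = torch::ones({2, 2}, torch::requires_grad());
      {
        torch::NoGradGuard no_grad;
        // false inside a no-grad scope, even though x.requires_grad() is true
        bool needs_grad = compute_requires_grad(x);
        (void)needs_grad;
      }
      // true again once grad mode is re-enabled
      return compute_requires_grad(x) ? 0 : 1;
    }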

    Labels: module: autograd, triaged
