Repo for a step-by-step walkthrough of the implementation of Automatic Differentiation

Note: This repo was built a few years ago when I was trying to figure out how PyTorch does autograd. It has zero dependencies, so you can see how plain Python syntax can be used to log the forward operations.

What is Autograd?

Autograd means you only need to apply forward operations to your variables; the framework logs those operations and automatically differentiates them for you, returning the gradients.
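
For example, this is what autograd looks like to a PyTorch user (PyTorch appears here only as an illustration; this repo itself has no dependencies):

import torch

x = torch.tensor(3.0, requires_grad=True)
y = x * x + 2 * x   # forward operations are recorded behind the scenes
y.backward()        # automatic differentiation
print(x.grad)       # tensor(8.) because dy/dx = 2x + 2 = 8 at x = 3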

Key Components of Automatic Differentiation

Automatic differentiation can be broken down into two main parts:

  1. Tracer: Tracks all operations applied to variables, building a computational graph.
  2. Graph Traversal: Walks through the graph to compute values (forward pass) and gradients (backward pass).

Approach 1: Naive Operation Overloading

A simple way to build a computational graph is by overloading operations for a custom variable class. For example:

class Var:
    def __init__(self, value):
        self.value = value
        self.children = []  # Tracks dependencies in the graph
        self.grad_value = None  # Stores the gradient

Each time you perform an operation, a new Var is created, extending the graph. However, this approach requires you to manually overload every operation you want to support, which can be tedious and error-prone.
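
For instance, the class above could be extended with a single overloaded operation and a gradient method. This is a minimal sketch of the naive approach, not necessarily the repo's exact code:

class Var:
    def __init__(self, value):
        self.value = value
        self.children = []      # (local derivative, child Var) pairs
        self.grad_value = None  # stores the gradient once computed

    def __mul__(self, other):
        out = Var(self.value * other.value)
        # record how the output depends on each operand
        self.children.append((other.value, out))
        other.children.append((self.value, out))
        return out

    def grad(self):
        # reverse-mode accumulation: sum the contributions from all children
        if self.grad_value is None:
            self.grad_value = sum(w * child.grad() for w, child in self.children)
        return self.grad_value

x, y = Var(3.0), Var(4.0)
z = x * y
z.grad_value = 1.0  # seed the output gradient
print(x.grad())     # 4.0 = dz/dx

Every other operation (__add__, __sub__, sin, exp, ...) would need its own overload and derivative rule in the same style, which is exactly the tedium noted above.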


Approach 2: Automatic Graph Recording and Backward Pass (VJP)

This approach builds the computational graph automatically as you perform operations, and computes gradients using the vector-Jacobian product (VJP) during the backward pass.

Graph-Unit

  • Node: Represents a node in the computational graph. Each node keeps references to its parent nodes, forming the graph structure.
  • Container: A value-type container used as the atomic unit in forward and backward passes. This design allows you to easily customize your own data containers and related functions. (Both units are sketched below.)
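
A minimal sketch of what these two units might look like (the field names below are assumptions for illustration, not necessarily the repo's):

class Node:
    """One recorded operation in the computational graph."""
    def __init__(self, fn, args, value, parents):
        self.fn = fn            # the primitive function that was applied
        self.args = args        # the raw (unboxed) arguments it was called with
        self.value = value      # the raw output value
        self.parents = parents  # parent Nodes, aligned with the argument positions

class Container:
    """Value-type box carried through the forward and backward passes."""
    def __init__(self, value, node=None):
        self.value = value      # the raw data (a float, a list, ...)
        self.node = node        # the graph Node that produced this value, if any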

Inference (Forward Pass)

  1. The forward pass traverses the graph, recording each function invocation.
  2. A function wrapper is used to (see the sketch after this list):
    • Unbox the container to get raw values
    • Compute the result using the original function
    • Box the result into a new container
  3. This process builds the computational graph transparently as you compute.
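
A minimal sketch of such a wrapper, building on the Node and Container classes above (the name primitive and the _add/_mul helpers are illustrative assumptions):

def primitive(fn):
    """Wrap a raw function so that calling it also records a graph node."""
    def wrapped(*args):
        # 1. unbox: pull raw values out of any Container arguments
        raw = [a.value if isinstance(a, Container) else a for a in args]
        parents = [a.node if isinstance(a, Container) else None for a in args]
        # 2. compute the result using the original function
        result = fn(*raw)
        # 3. box: wrap the result in a new Container linked into the graph
        return Container(result, Node(fn, raw, result, parents))
    return wrapped

def _add(a, b): return a + b
def _mul(a, b): return a * b

add = primitive(_add)
mul = primitive(_mul)

x = Container(3.0, Node(None, [3.0], 3.0, []))  # leaf node for an input
y = mul(add(x, x), x)                           # y = (x + x) * x = 2x^2
print(y.value)                                  # 18.0; the graph was recorded transparently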

Backpropagation (Backward Pass)

  1. The backward pass uses VJP to compute gradients for each node with respect to its inputs.
  2. VJP rules are defined separately for each operation, allowing flexibility and extensibility (a sketch follows below).
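
Continuing the sketch above, the VJP rules could live in a table keyed by primitive, and the backward pass could walk the recorded graph in reverse topological order. Again, this is an illustrative sketch under the assumed names from the previous snippets, not the repo's exact code:

# One VJP rule per primitive: given the upstream gradient g, the raw inputs,
# and the argument position i, return the gradient with respect to input i.
vjp_rules = {
    _add: lambda g, raw, i: g,
    _mul: lambda g, raw, i: g * raw[1 - i],
}

def backward(output):
    """Compute d(output)/d(node) for every node that output depends on."""
    # Order the graph so each node is processed only after every node that
    # consumes it, guaranteeing its gradient is fully accumulated first.
    order, seen = [], set()
    def visit(node):
        if node is None or id(node) in seen:
            return
        seen.add(id(node))
        for parent in node.parents:
            visit(parent)
        order.append(node)
    visit(output.node)

    grads = {id(output.node): 1.0}  # seed: d(output)/d(output) = 1
    for node in reversed(order):
        g = grads.get(id(node), 0.0)
        for i, parent in enumerate(node.parents):
            if parent is None:
                continue
            dg = vjp_rules[node.fn](g, node.args, i)  # gradient w.r.t. input i
            grads[id(parent)] = grads.get(id(parent), 0.0) + dg
    return grads

grads = backward(y)        # y = 2x^2 from the previous snippet
print(grads[id(x.node)])   # 12.0 = dy/dx = 4x at x = 3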

This structure allows you to see, step by step, how automatic differentiation frameworks like PyTorch work under the hood, but with minimal code and no external dependencies.
