Repo for a step-by-step walkthrough of the implementation of Automatic Differentiation

Note: This repo was built a few years ago when I was trying to figure out how PyTorch does autograd. It has zero dependencies, so you can see how plain Python syntax can be used to log the forward operations.

What is Autograd?

Autograd means you only need to apply forward operations to your variables; the framework logs those operations and automatically differentiates them for you, returning the gradients.
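
For example, this is what autograd looks like to a PyTorch user (PyTorch appears here only as an illustration; this repo itself has no dependencies):

import torch

x = torch.tensor(3.0, requires_grad=True)
y = x * x + 2 * x   # forward operations are recorded behind the scenes
y.backward()        # automatic differentiation
print(x.grad)       # tensor(8.) because dy/dx = 2x + 2 = 8 at x = 3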

Key Components of Automatic Differentiation

Automatic differentiation can be broken down into two main parts:

  1. Tracer: Tracks all operations applied to variables, building a computational graph.
  2. Graph Traversal: Walks through the graph to compute values (forward pass) and gradients (backward pass).

Approach 1: Naive Operation Overloading

A simple way to build a computational graph is by overloading operations for a custom variable class. For example:

class Var:
    def __init__(self, value):
        self.value = value
        self.children = []  # Tracks dependencies in the graph
        self.grad_value = None  # Stores the gradient

Each time you perform an operation, a new Var is created, extending the graph. However, this approach requires you to manually overload every operation you want to support, which can be tedious and error-prone.
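
For instance, the class above could be extended with a single overloaded operation and a gradient method. This is a minimal sketch of the naive approach, not necessarily the repo's exact code:

class Var:
    def __init__(self, value):
        self.value = value
        self.children = []      # (local derivative, child Var) pairs
        self.grad_value = None  # stores the gradient once computed

    def __mul__(self, other):
        out = Var(self.value * other.value)
        # record how the output depends on each operand
        self.children.append((other.value, out))
        other.children.append((self.value, out))
        return out

    def grad(self):
        # reverse-mode accumulation: sum the contributions from all children
        if self.grad_value is None:
            self.grad_value = sum(w * child.grad() for w, child in self.children)
        return self.grad_value

x, y = Var(3.0), Var(4.0)
z = x * y
z.grad_value = 1.0  # seed the output gradient
print(x.grad())     # 4.0 = dz/dx

Every other operation (__add__, __sub__, sin, exp, ...) would need its own overload and derivative rule in the same style, which is exactly the tedium noted above.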


Approach 2: Automatic Graph Recording and Backward Pass (VJP)

This approach builds the computational graph automatically as you perform operations, and computes gradients using the vector-Jacobian product (VJP) during the backward pass.

Graph-Unit

  • Node: Represents a node in the computational graph. Each node keeps references to its parent nodes, forming the graph structure.
  • Container: A value-type container used as the atomic unit in forward and backward passes. This design allows you to easily customize your own data containers and related functions. (Both units are sketched below.)
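
A minimal sketch of what these two units might look like (the field names below are assumptions for illustration, not necessarily the repo's):

class Node:
    """One recorded operation in the computational graph."""
    def __init__(self, fn, args, value, parents):
        self.fn = fn            # the primitive function that was applied
        self.args = args        # the raw (unboxed) arguments it was called with
        self.value = value      # the raw output value
        self.parents = parents  # parent Nodes, aligned with the argument positions

class Container:
    """Value-type box carried through the forward and backward passes."""
    def __init__(self, value, node=None):
        self.value = value      # the raw data (a float, a list, ...)
        self.node = node        # the graph Node that produced this value, if any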

Inference (Forward Pass)

  1. The forward pass traverses the graph, recording each function invocation.
  2. A function wrapper is used to (see the sketch after this list):
    • Unbox the container to get raw values
    • Compute the result using the original function
    • Box the result into a new container
  3. This process builds the computational graph transparently as you compute.
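
A minimal sketch of such a wrapper, building on the Node and Container classes above (the name primitive and the _add/_mul helpers are illustrative assumptions):

def primitive(fn):
    """Wrap a raw function so that calling it also records a graph node."""
    def wrapped(*args):
        # 1. unbox: pull raw values out of any Container arguments
        raw = [a.value if isinstance(a, Container) else a for a in args]
        parents = [a.node if isinstance(a, Container) else None for a in args]
        # 2. compute the result using the original function
        result = fn(*raw)
        # 3. box: wrap the result in a new Container linked into the graph
        return Container(result, Node(fn, raw, result, parents))
    return wrapped

def _add(a, b): return a + b
def _mul(a, b): return a * b

add = primitive(_add)
mul = primitive(_mul)

x = Container(3.0, Node(None, [3.0], 3.0, []))  # leaf node for an input
y = mul(add(x, x), x)                           # y = (x + x) * x = 2x^2
print(y.value)                                  # 18.0; the graph was recorded transparently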

Backpropagation (Backward Pass)

  1. The backward pass uses VJP to compute gradients for each node with respect to its inputs.
  2. VJP rules are defined separately for each operation, allowing flexibility and extensibility (a sketch follows below).
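
Continuing the sketch above, the VJP rules could live in a table keyed by primitive, and the backward pass could walk the recorded graph in reverse topological order. Again, this is an illustrative sketch under the assumed names from the previous snippets, not the repo's exact code:

# One VJP rule per primitive: given the upstream gradient g, the raw inputs,
# and the argument position i, return the gradient with respect to input i.
vjp_rules = {
    _add: lambda g, raw, i: g,
    _mul: lambda g, raw, i: g * raw[1 - i],
}

def backward(output):
    """Compute d(output)/d(node) for every node that output depends on."""
    # Order the graph so each node is processed only after every node that
    # consumes it, guaranteeing its gradient is fully accumulated first.
    order, seen = [], set()
    def visit(node):
        if node is None or id(node) in seen:
            return
        seen.add(id(node))
        for parent in node.parents:
            visit(parent)
        order.append(node)
    visit(output.node)

    grads = {id(output.node): 1.0}  # seed: d(output)/d(output) = 1
    for node in reversed(order):
        g = grads.get(id(node), 0.0)
        for i, parent in enumerate(node.parents):
            if parent is None:
                continue
            dg = vjp_rules[node.fn](g, node.args, i)  # gradient w.r.t. input i
            grads[id(parent)] = grads.get(id(parent), 0.0) + dg
    return grads

grads = backward(y)        # y = 2x^2 from the previous snippet
print(grads[id(x.node)])   # 12.0 = dy/dx = 4x at x = 3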

This structure allows you to see, step by step, how automatic differentiation frameworks like PyTorch work under the hood, but with minimal code and no external dependencies.
