Transformers learn in-context by gradient descent

This repository contains notebooks for easily replicating the results of the paper "Transformers learn in-context by gradient descent".

As their names suggest, the three notebooks reproduce the results for:

- `constructed_token_setup.ipynb`: the specific token construction, in which inputs and outputs are concatenated into a single token.
- `normal_token_construct.ipynb`: the usual token construction, in which inputs and outputs are provided in neighbouring tokens.
- `non_linear_regression.ipynb`: the experiments on non-linear regression tasks.
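
To make the difference between the two token constructions concrete, here is a minimal sketch in plain Python. It is an illustration only and is not taken from the notebooks: the function names (`sample_linear_task`, `concatenated_tokens`, `neighbouring_tokens`), the zero placeholder in the query token, and the zero-padding of output tokens are all assumptions made for this example.

```python
import random


def sample_linear_task(n_points, dim, seed=0):
    """Draw inputs x_i and targets y_i = <w, x_i> for one random linear task."""
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    xs = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n_points)]
    ys = [sum(wi * xi for wi, xi in zip(w, x)) for x in xs]
    return xs, ys


def concatenated_tokens(xs, ys, x_query):
    """One token per example, [x_i, y_i]; the query token gets a 0 placeholder (assumption)."""
    tokens = [list(x) + [y] for x, y in zip(xs, ys)]
    tokens.append(list(x_query) + [0.0])
    return tokens


def neighbouring_tokens(xs, ys, x_query):
    """Two tokens per example: x_i, then y_i zero-padded to the width of x_i (assumption)."""
    dim = len(x_query)
    tokens = []
    for x, y in zip(xs, ys):
        tokens.append(list(x))
        tokens.append([y] + [0.0] * (dim - 1))
    tokens.append(list(x_query))
    return tokens


xs, ys = sample_linear_task(n_points=4, dim=3)
x_query = [0.5, -0.5, 0.25]

merged = concatenated_tokens(xs, ys, x_query)
interleaved = neighbouring_tokens(xs, ys, x_query)

print(len(merged), len(merged[0]))            # 5 tokens of width 4
print(len(interleaved), len(interleaved[0]))  # 9 tokens of width 3
```

With concatenation, each in-context example occupies a single token, so the sequence has one token per example plus the query. With neighbouring tokens, the sequence is roughly twice as long, and the model first has to combine each input with its output.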

You can also use the following links to run the notebooks in Google Colab.

