Alessandro Stolfo alestolfo

microsoft/llm-steer-instruct (Public)

A method for steering LLMs to better follow instructions.

Python · 74 stars · 11 forks

lm-arithmetic (Public)

Code for the paper "A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis"

Python · 19 stars · 2 forks

bpwu1/confidence-regulation-neurons (Public)

Confidence Regulation Neurons in Language Models (NeurIPS 2024)

Python · 15 stars · 1 fork

causal-math (Public)

Code repository for the paper "A Causal Framework to Quantify the Robustness of Mathematical Reasoning with Language Models".

Python · 15 stars · 3 forks