As promised: our first paper, and our contribution to the amazing work going on to make open-source models smaller, faster, and more accessible.
So what is it, and why is it important?
We discovered what appears to be a universal formula that identifies dead attention heads in any






