Conversation

@MollySophia (Collaborator)

The token-shift state was not handled correctly in the previous implementation: it wasn't copied back to k_cache after a decode. As a result, the model was always lerping towards zero when decoding. Prefill (and, as a result, PPL evaluation) wasn't affected.

Somehow this mistake didn't affect text generation much either lol (maybe the large 32B model already had enough context information in the wkv state?). That's why the bug wasn't found earlier.
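
A minimal sketch of what the bug looks like in isolation (not llama.cpp's actual API; the names `ShiftCache`, `token_shift`, and `rwkv_decode_step` are hypothetical and for illustration only). RWKV's token shift lerps the current embedding against the previous token's embedding, so if the cache holding that previous embedding is never written back after a decode step, every subsequent step lerps against stale zeros:

```cpp
#include <cstddef>
#include <vector>

struct ShiftCache {
    // Previous token's embedding per layer, used by the token-shift lerp.
    // In the buggy version this was effectively never updated during decode.
    std::vector<std::vector<float>> prev; // [n_layer][n_embd]
};

// Token shift: x_shifted = prev + mu * (x - prev), element-wise,
// i.e. a lerp between the cached previous embedding and the current one.
static std::vector<float> token_shift(const std::vector<float> & x,
                                      const std::vector<float> & prev,
                                      const std::vector<float> & mu) {
    std::vector<float> out(x.size());
    for (size_t i = 0; i < x.size(); ++i) {
        out[i] = prev[i] + mu[i] * (x[i] - prev[i]);
    }
    return out;
}

// One decode step for a single layer. The essential part is the write-back at
// the end: without it, prev stays at zero and the model keeps lerping towards
// zero on every decoded token, which matches the bug described above.
static std::vector<float> rwkv_decode_step(ShiftCache & cache, int layer,
                                           const std::vector<float> & x,
                                           const std::vector<float> & mu) {
    std::vector<float> x_shifted = token_shift(x, cache.prev[layer], mu);
    // ... time-mixing / channel-mixing would consume x_shifted here ...
    cache.prev[layer] = x; // the fix: copy the new embedding back into the cache
    return x_shifted;
}
```

Prefill is unaffected in this picture because all tokens of the prompt are processed in one pass, where each position can read its predecessor directly from the batch; only incremental decoding depends on the write-back.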

@MollySophia MollySophia merged commit 325afb3 into ggml-org:master Jan 29, 2025
45 checks passed
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Feb 26, 2025
