Description
While playing with implementing compression for copy/save state, I found a bug which turned out to be reproducible on current main (41aee4d).
It seems to be model-independent, and no parameter other than -ngl seems to make a difference.
The first symptom affects save-load-state, main, and server: when -ngl is set to exactly N-1, the generated output degenerates like this:
Hello there!###############################
The second symptom was found by accident while fiddling with save-load-state in order to implement compression. If -ngl is N or greater (all layers offloaded), the problem above seems to disappear, however:
- save-load-state still fails, because the generated text differs between the two runs, and
- after some tokens have been sampled, llama_copy_state_data outputs a mostly empty array. I only noticed this because I dumped the state after generation and suddenly got a ~99% compression ratio on that array, which turned out to be mostly zeroes.
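For reference, this is roughly the kind of check I used to notice the problem; `state_diagnostics` is a hypothetical helper, not part of llama.cpp, and it assumes the state bytes have already been dumped (e.g. via llama_copy_state_data) into a `bytes` buffer:

```python
import zlib

def state_diagnostics(buf: bytes) -> dict:
    """Hypothetical helper: a state buffer that is mostly zeros both has a
    high zero-byte fraction and compresses extremely well, which is how the
    ~99% compression ratio gave the bug away."""
    if not buf:
        return {"zero_fraction": 0.0, "compression_ratio": 0.0}
    zeros = buf.count(0)
    compressed = zlib.compress(buf)
    return {
        # fraction of bytes in the buffer that are exactly 0x00
        "zero_fraction": zeros / len(buf),
        # 0.99 here corresponds to the "99% compression ratio" above
        "compression_ratio": 1.0 - len(compressed) / len(buf),
    }
```

A healthy KV-cache/state dump should show a low zero fraction and a modest compression ratio; values near 1.0 for both indicate the buffer is essentially empty.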
All -ngl values from 0 to N-2 work properly.
I have no way of testing on AMD, so I do not know whether it is Nvidia-specific.
As a sanity check, here are results for -ngl from 0 to N with the same model and parameters (except -ngl).
Edit: Interestingly enough, perplexity looks fine?
-ngl N-2 (27/29)
[1]5.2069,[2]5.1932,[3]5.1802,[4]5.2837,[5]5.2742,[6]5.0776,
Final estimate: PPL = 5.0776 +/- 0.25768
-ngl N-1 (28/29)
[1]5.2069,[2]5.1932,[3]5.1802,[4]5.2837,[5]5.2742,[6]5.0776,
Final estimate: PPL = 5.0776 +/- 0.25768
-ngl N (29/29)
[1]5.2077,[2]5.1813,[3]5.1687,[4]5.2820,[5]5.2682,[6]5.0756,
Final estimate: PPL = 5.0756 +/- 0.25766