Description
Please include information about your system, the steps to reproduce the bug, and the version of llama.cpp that you are using. If possible, please provide a minimal code example that reproduces the bug.
After the commit 2002bc9, Mistral-7B-Instruct-v0.2 (https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2/commit/b70aa86578567ba3301b21c8a27bea4e8f6d6d61) produces longer outputs than it did before that commit.
That merge commit is large and contains too many individual commits to inspect one by one.
So I ran git bisect over the commits that were merged (https://github.com/ggerganov/llama.cpp/commits/87a4a105b2fafb291610c1e28f97b8ba07c6f2d7).
(Please keep the individual commits when merging, rather than removing them, so that bisecting like this stays possible ...)
The bisect pointed to commit bfb121f, which triggers the following behavior:
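For reference, the bisect session looked roughly like this (the exact starting "good" commit is an assumption, shown as a placeholder):
git bisect start
git bisect bad 87a4a105b2fafb291610c1e28f97b8ba07c6f2d7
git bisect good <last-known-good-commit>
# at each step: rebuild the server, rerun the curl request below, and mark the commit good or bad depending on whether the output stops at " Yes"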
% curl --request POST --url http://localhost:8080/completion --header "Content-Type: application/json" --data '{"prompt": "Question: Is 1 + 1 = 2 correct? Answer yes or no only.\nAnswer:", "n_predict": 32, "seed": 0, "temperature": 0.0}'
{"content":" Yes\n\nQuestion: What is the smallest common multiple of 12 and 36?\nAnswer: 72\n\nQuestion:","generation_settings":{"dynatemp_exponent":1.0,"dynatemp_range":0.0,"frequency_penalty":0.0,"grammar":"","ignore_eos":false,"logit_bias":[],"min_keep":0,"min_p":0.05000000074505806,"mirostat":0,"mirostat_eta":0.10000000149011612,"mirostat_tau":5.0,"model":"./Mistral-7B-Instruct-v0.2/snapshots/b70aa86578567ba3301b21c8a27bea4e8f6d6d61/ggml-model-q8_0.gguf","n_ctx":8192,"n_keep":0,"n_predict":-1,"n_probs":0,"penalize_nl":true,"penalty_prompt_tokens":[],"presence_penalty":0.0,"repeat_last_n":64,"repeat_penalty":1.100000023841858,"samplers":["top_k","tfs_z","typical_p","top_p","min_p","temperature"],"seed":0,"stop":[],"stream":false,"temperature":0.0,"tfs_z":1.0,"top_k":40,"top_p":0.949999988079071,"typical_p":1.0,"use_penalty_prompt_tokens":false},"id_slot":0,"model":"./Mistral-7B-Instruct-v0.2/snapshots/b70aa86578567ba3301b21c8a27bea4e8f6d6d61/ggml-model-q8_0.gguf","prompt":"Question: Is 1 + 1 = 2 correct? Answer yes or no only.\nAnswer:","stop":true,"stopped_eos":false,"stopped_limit":true,"stopped_word":false,"stopping_word":"","timings":{"predicted_ms":2719.092,"predicted_n":32,"predicted_per_second":11.768634529467924,"predicted_per_token_ms":84.971625,"prompt_ms":404.269,"prompt_n":24,"prompt_per_second":59.36641196826865,"prompt_per_token_ms":16.844541666666668},"tokens_cached":55,"tokens_evaluated":24,"tokens_predicted":32,"truncated":false}
However, the immediately preceding commit aef02b1 does not trigger this behavior:
% curl --request POST --url http://localhost:8080/completion --header "Content-Type: application/json" --data '{"prompt": "Question: Is 1 + 1 = 2 correct? Answer yes or no only.\nAnswer:", "n_predict": 32, "seed": 0, "temperature": 0.0}'
{"content":" Yes","generation_settings":{"dynatemp_exponent":1.0,"dynatemp_range":0.0,"frequency_penalty":0.0,"grammar":"","ignore_eos":false,"logit_bias":[],"min_keep":0,"min_p":0.05000000074505806,"mirostat":0,"mirostat_eta":0.10000000149011612,"mirostat_tau":5.0,"model":"./Mistral-7B-Instruct-v0.2/snapshots/b70aa86578567ba3301b21c8a27bea4e8f6d6d61/ggml-model-q8_0.gguf","n_ctx":8192,"n_keep":0,"n_predict":-1,"n_probs":0,"penalize_nl":true,"penalty_prompt_tokens":[],"presence_penalty":0.0,"repeat_last_n":64,"repeat_penalty":1.100000023841858,"samplers":["top_k","tfs_z","typical_p","top_p","min_p","temperature"],"seed":0,"stop":[],"stream":false,"temperature":0.0,"tfs_z":1.0,"top_k":40,"top_p":0.949999988079071,"typical_p":1.0,"use_penalty_prompt_tokens":false},"id_slot":0,"model":"./Mistral-7B-Instruct-v0.2/snapshots/b70aa86578567ba3301b21c8a27bea4e8f6d6d61/ggml-model-q8_0.gguf","prompt":"Question: Is 1 + 1 = 2 correct? Answer yes or no only.\nAnswer:","stop":true,"stopped_eos":true,"stopped_limit":false,"stopped_word":false,"stopping_word":"","timings":{"predicted_ms":88.863,"predicted_n":2,"predicted_per_second":22.5065550341537,"predicted_per_token_ms":44.4315,"prompt_ms":403.348,"prompt_n":24,"prompt_per_second":59.50196852345864,"prompt_per_token_ms":16.806166666666666},"tokens_cached":25,"tokens_evaluated":24,"tokens_predicted":2,"truncated":false}
I launched the server in both cases as follows:
./server -m ./Mistral-7B-Instruct-v0.2/snapshots/b70aa86578567ba3301b21c8a27bea4e8f6d6d61/ggml-model-q8_0.gguf -c 8192
What changed in bfb121f to cause this difference?
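From the two responses above, the visible difference is the stop reason: before bfb121f the generation stops on EOS after 2 tokens (stopped_eos: true), while after it the model never hits EOS and keeps going until the n_predict limit (stopped_limit: true). A quick way to compare just those fields, assuming jq is available:
% curl -s --request POST --url http://localhost:8080/completion --header "Content-Type: application/json" --data '{"prompt": "Question: Is 1 + 1 = 2 correct? Answer yes or no only.\nAnswer:", "n_predict": 32, "seed": 0, "temperature": 0.0}' | jq '{content, stopped_eos, stopped_limit, tokens_predicted}'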
If the bug concerns the server, please try to reproduce it first using the server test scenario framework.
Yes, this is related to the server.
But before reproducing it with that framework, could you please tell me whether commit bfb121f is buggy or whether this is expected behavior?
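If it does turn out to be a bug, I can also try to reproduce it with the scenario framework; as far as I understand it lives under examples/server/tests and is driven by behave, so roughly (the exact commands are my assumption from the repo layout, not something I have run yet):
cd examples/server/tests
pip install -r requirements.txt
behave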