CUDA & CPU: support incontiguous inputs for get_rel_pos #1
base: op-dsocr-clean
Conversation
…_pos

* Removed the overridden max_nmse_err method in the test_get_rel_pos struct, as it was deemed unnecessary.
* Added multiple new test cases for various attention configurations, enhancing coverage of the get_rel_pos functionality.

…ML_OP_GET_REL_POS
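For context on what these new test cases exercise, here is a minimal CPU sketch of the gather that a get_rel_pos-style op performs, assuming square attention (q_size == k_size). The function name, signature, and row-major layout are illustrative only, not ggml's actual API: the relative-position table has 2*k_size - 1 rows of C channels, and output row (q, k) is table row q - k + k_size - 1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative reference (not ggml's API) for the get_rel_pos gather,
// assuming q_size == k_size. rel_pos holds (2*k_size - 1) rows of C
// channels; output row (q, k) copies rel_pos row (q - k + k_size - 1).
std::vector<float> get_rel_pos_ref(const std::vector<float> &rel_pos,
                                   int k_size, int C) {
    std::vector<float> out((size_t)k_size * k_size * C);
    for (int q = 0; q < k_size; ++q) {
        for (int k = 0; k < k_size; ++k) {
            const int idx = q - k + (k_size - 1); // in [0, 2*k_size - 2]
            for (int c = 0; c < C; ++c) {
                out[((size_t)q * k_size + k) * C + c] =
                    rel_pos[(size_t)idx * C + c];
            }
        }
    }
    return out;
}
```

A test only has to compare this reference against the backend kernel for several (k_size, C) combinations, which is presumably what the added attention configurations do.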
@AgainstEntropy Please take a look at the error message for the failures, and feel free to tell me how I can help.
It seems the error was raised from this line: llama.cpp/ggml/src/ggml-cuda/win.cu, line 141 in 4813779. I think we should be able to fix it by replacing the current
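Since the PR is about supporting non-contiguous inputs, the relevant pattern is the one ggml kernels conventionally use: index through per-dimension byte strides instead of assuming a packed layout. Below is a hedged sketch of that access pattern for a 2-D float tensor; the helper name is hypothetical, though the `nb0`/`nb1` naming mirrors ggml's byte-stride convention.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Sketch of the stride-based access that makes a kernel tolerate
// non-contiguous inputs: compute the element address from per-dimension
// byte strides (nb0, nb1) rather than i1 * ne0 + i0 into a packed array.
static inline float get_f32_2d(const void *data, size_t nb0, size_t nb1,
                               int64_t i0, int64_t i1) {
    const char *base = (const char *)data;
    return *(const float *)(base + i1 * nb1 + i0 * nb0);
}

// Demo buffer: a 2x4 row-major float matrix {0..7}.
static const float demo[8] = {0, 1, 2, 3, 4, 5, 6, 7};
```

With `nb0 = sizeof(float)` and `nb1 = 4 * sizeof(float)` this reads the packed matrix as-is; swapping the two strides reads the same buffer as its transpose without any copy, which is exactly the kind of non-contiguous view a permuted ggml tensor produces.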
cbd59d4 to 840f42d
Hi @bluebread, I’ve fixed the compilation error in the HIP backend. The CI workflow currently uses ROCm 6.1.2, which doesn’t yet support constructing bfloat16 values from float or double. With the new change it builds successfully on my machine and passes. I also noticed that you’re using existing ggml ops to construct win_part, win_unpart, and get_rel_pos.
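For toolchains whose bfloat16 type lacks a float/double constructor, one common workaround is to build the bf16 bit pattern from the float's upper 16 bits by hand. This is only a sketch of that general technique, not necessarily what the fix in this PR does:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Convert float -> bfloat16 bit pattern by keeping the top 16 bits of the
// IEEE-754 binary32 representation (truncation; a production version would
// usually add round-to-nearest-even and NaN handling).
static inline uint16_t f32_to_bf16_bits(float f) {
    uint32_t u;
    std::memcpy(&u, &f, sizeof(u));
    return (uint16_t)(u >> 16);
}

// Convert a bfloat16 bit pattern back to float by zero-filling the low
// 16 bits of the mantissa.
static inline float bf16_bits_to_f32(uint16_t h) {
    const uint32_t u = (uint32_t)h << 16;
    float f;
    std::memcpy(&f, &u, sizeof(f));
    return f;
}
```

Because bfloat16 shares float's exponent width, values exactly representable in bf16 (such as small integers) round-trip through this pair unchanged, so the workaround sidesteps the missing constructor without changing numerics for those cases.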
@AgainstEntropy To be honest, I don't know whether we should continue with this PR or not. I'm still discussing with the maintainer what the best way to implement window attention is. Constructing it through existing ggml ops is actually inefficient, despite the convenience, so I guess there is still a chance of ggml_win_part & ggml_win_unpart being accepted. I really appreciate your help on the PR, but we need some time to make a decision. Sorry.
I’m always happy to contribute, so no need to apologize at all 🤣! I agree about the inefficiency, and the same goes for

No description provided.