CUDA & CPU: support incontiguous inputs for get_rel_pos #1
base: op-dsocr-clean
Conversation
…_pos

* Removed the overridden max_nmse_err method in the test_get_rel_pos struct, as it was deemed unnecessary.
* Added multiple new test cases for various attention configurations, enhancing coverage of the get_rel_pos functionality.

…ML_OP_GET_REL_POS
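For context on what these new test cases exercise, here is a minimal CPU sketch of the gather that a get_rel_pos-style op performs, assuming square attention (q_size == k_size). The function name, signature, and row-major layout are illustrative only, not ggml's actual API: the relative-position table has 2*k_size - 1 rows of C channels, and output row (q, k) is table row q - k + k_size - 1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative reference (not ggml's API) for the get_rel_pos gather,
// assuming q_size == k_size. rel_pos holds (2*k_size - 1) rows of C
// channels; output row (q, k) copies rel_pos row (q - k + k_size - 1).
std::vector<float> get_rel_pos_ref(const std::vector<float> &rel_pos,
                                   int k_size, int C) {
    std::vector<float> out((size_t)k_size * k_size * C);
    for (int q = 0; q < k_size; ++q) {
        for (int k = 0; k < k_size; ++k) {
            const int idx = q - k + (k_size - 1); // in [0, 2*k_size - 2]
            for (int c = 0; c < C; ++c) {
                out[((size_t)q * k_size + k) * C + c] =
                    rel_pos[(size_t)idx * C + c];
            }
        }
    }
    return out;
}
```

A test only has to compare this reference against the backend kernel for several (k_size, C) combinations, which is presumably what the added attention configurations do.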
@AgainstEntropy Please take a look at the error message for the failures, and feel free to tell me how I can help.
It seems the error was raised from this line: llama.cpp/ggml/src/ggml-cuda/win.cu, line 141 in 4813779. I think we should be able to fix it by replacing the current
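Since the PR is about supporting non-contiguous inputs, the relevant pattern is the one ggml kernels conventionally use: index through per-dimension byte strides instead of assuming a packed layout. Below is a hedged sketch of that access pattern for a 2-D float tensor; the helper name is hypothetical, though the `nb0`/`nb1` naming mirrors ggml's byte-stride convention.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Sketch of the stride-based access that makes a kernel tolerate
// non-contiguous inputs: compute the element address from per-dimension
// byte strides (nb0, nb1) rather than i1 * ne0 + i0 into a packed array.
static inline float get_f32_2d(const void *data, size_t nb0, size_t nb1,
                               int64_t i0, int64_t i1) {
    const char *base = (const char *)data;
    return *(const float *)(base + i1 * nb1 + i0 * nb0);
}

// Demo buffer: a 2x4 row-major float matrix {0..7}.
static const float demo[8] = {0, 1, 2, 3, 4, 5, 6, 7};
```

With `nb0 = sizeof(float)` and `nb1 = 4 * sizeof(float)` this reads the packed matrix as-is; swapping the two strides reads the same buffer as its transpose without any copy, which is exactly the kind of non-contiguous view a permuted ggml tensor produces.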
cbd59d4 to 840f42d
Hi @bluebread, I’ve fixed the compilation error in the HIP backend. The CI workflow currently uses ROCm 6.1.2, which doesn’t yet support constructing bfloat16 values from float or double. With the new change it builds successfully on my machine and passes. I also noticed that you’re using existing ggml ops to construct win_part, win_unpart, and get_rel_pos.
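For toolchains whose bfloat16 type lacks a float/double constructor, one common workaround is to build the bf16 bit pattern from the float's upper 16 bits by hand. This is only a sketch of that general technique, not necessarily what the fix in this PR does:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Convert float -> bfloat16 bit pattern by keeping the top 16 bits of the
// IEEE-754 binary32 representation (truncation; a production version would
// usually add round-to-nearest-even and NaN handling).
static inline uint16_t f32_to_bf16_bits(float f) {
    uint32_t u;
    std::memcpy(&u, &f, sizeof(u));
    return (uint16_t)(u >> 16);
}

// Convert a bfloat16 bit pattern back to float by zero-filling the low
// 16 bits of the mantissa.
static inline float bf16_bits_to_f32(uint16_t h) {
    const uint32_t u = (uint32_t)h << 16;
    float f;
    std::memcpy(&f, &u, sizeof(f));
    return f;
}
```

Because bfloat16 shares float's exponent width, values exactly representable in bf16 (such as small integers) round-trip through this pair unchanged, so the workaround sidesteps the missing constructor without changing numerics for those cases.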
@AgainstEntropy To be honest, I don't know whether we should continue with this PR or not. I'm still discussing with the maintainer what the best way to implement window attention is. Constructing it through existing ggml ops is actually inefficient, despite the convenience, so I guess there is still a chance of ggml_win_part & ggml_win_unpart being accepted. I really appreciate your help on the PR, but we need some time to make a decision. Sorry.
I’m always happy to contribute, so no need to apologize at all 🤣! I agree about the inefficiency, and the same goes for

No description provided.