
Conversation

@slaren (Member) commented on Sep 3, 2023

ggml-opencl currently stores the GPU buffer in ggml_tensor::data. After the GGUF changes, this results in a memory leak when not using mmap, because the address of the CPU buffer is lost after the call to ggml_cl_transform_tensor:
https://github.com/ggerganov/llama.cpp/blob/47068e517004d90f13c16352bb3b4cafd53a00cd/llama.cpp#L1516-L1523
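A minimal stand-alone sketch of the problematic pattern (the struct and upload function here are stand-ins, not the real ggml/ggml-opencl API; only the `data`/`extra` field names mirror ggml.h):

```c
#include <stdlib.h>

/* Stand-in for ggml_tensor; field names mirror ggml.h. */
struct fake_tensor {
    void * data;   /* CPU buffer */
    void * extra;  /* backend-specific payload, unused in the old pattern */
};

/* Old pattern: the GPU buffer handle overwrites data. */
static void transform_tensor_old(struct fake_tensor * t, void * gpu_handle) {
    /* The only pointer to the CPU allocation was t->data; after this
     * assignment it is unreachable, so without mmap it leaks. */
    t->data = gpu_handle;
}

int main(void) {
    struct fake_tensor t = {0};
    t.data = malloc(1024);   /* CPU weights buffer */
    void * gpu = malloc(1);  /* stands in for a cl_mem handle */
    transform_tensor_old(&t, gpu);
    /* free(t.data) would now free the GPU handle, not the CPU buffer:
     * the original 1024-byte allocation is leaked. */
    free(gpu);
    return 0;
}
```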

This change fixes the leak by storing the GPU buffer in ggml_tensor::extra instead. It also avoids a possible bad interaction with ggml-alloc that could occur if the OpenCL buffer address fell within the measure buffer memory range.
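With the change, the same stand-in looks like this; the GPU handle goes into extra while the CPU buffer stays owned by data (again a sketch under the same assumptions, not the verbatim patch):

```c
#include <stdlib.h>

struct fake_tensor {
    void * data;   /* CPU buffer: still freeable after the upload */
    void * extra;  /* GPU buffer handle lives here instead */
};

/* New pattern: data is left alone, extra carries the GPU handle. */
static void transform_tensor_new(struct fake_tensor * t, void * gpu_handle) {
    t->extra = gpu_handle;
}

int main(void) {
    struct fake_tensor t = {0};
    t.data = malloc(1024);
    void * gpu = malloc(1);  /* stands in for a cl_mem handle */
    transform_tensor_new(&t, gpu);
    free(t.data);            /* no leak: the CPU pointer survives the upload */
    /* Kept in extra, the GPU handle is also never mistaken by ggml-alloc for
     * an address inside its measure buffer range (the assert in #2993). */
    free(gpu);
    return 0;
}
```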

Fixes #2993

@slaren slaren merged commit bd33e5a into master Sep 4, 2023
@slaren slaren deleted the opencl-extra branch September 4, 2023 12:59

