Releases · ggml-org/llama.cpp

model : refactor bias tensor variable names (#22079)

refactor bias tensor variable names
use create_tensor_qkv for jina-bert-v2

Platforms: macOS/iOS, Linux, Android arm64 (CPU), Windows, openEuler

android : libcommon -> libllama-common (#22076)

Platforms: macOS/iOS, Linux, Android arm64 (CPU), Windows, openEuler

ggml-backend-meta: add multi-segment read support in get_tensor (#22063)

Platforms: macOS/iOS, Linux, Android arm64 (CPU), Windows, openEuler

ci : free disk space for rocm release (#22012)

Platforms: macOS/iOS, Linux, Android arm64 (CPU), Windows, openEuler

ggml-webgpu: fix compiler warnings and refactor FlashAttention encoding (#21052)

Update workflows to remove dependence on llvmpipe
Try setting Dawn_DIR
remove c++20 initializers
Move to proper guid
Try avoiding segfaults on vulkan backend process exit
Remove compiler warnings on parameter casting
Fix soft_max and update reg_tile accumulation to f32 for better precision
Refactor flash_attn a bit
remove c++20 initializers and format
Increase div precision for NVIDIA
revert div precision and comment out ggml-ci node for now
Formatting
Try debugging on a failing CI node
Revert "Try debugging on a failing CI node"

This reverts commit 1971e33.

Platforms: macOS/iOS, Linux, Android arm64 (CPU), Windows, openEuler

CUDA: use LRU based eviction for cuda graphs (#21611)

CUDA: use a ring-buffer for cuda graphs
bump limit to 128
use LRU eviction
better naming
do periodic clean-up
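
The commit list above names the technique (least-recently-used eviction with a fixed limit) but not its mechanics. The following is a minimal sketch of a generic LRU cache, not the code from the pull request: `LruGraphCache`, `GraphEntry`, and the integer key are hypothetical names chosen for illustration, and the periodic clean-up step is not modeled. In the release notes' terms, the capacity would be 128.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a captured graph; a real backend would hold CUDA handles.
struct GraphEntry {
    int payload;
};

// Minimal LRU cache: the most recently used entry sits at the front of the list,
// so the entry at the back is always the eviction candidate.
class LruGraphCache {
public:
    explicit LruGraphCache(std::size_t capacity) : capacity_(capacity) {}

    // Returns the cached entry and marks it as most recently used, or nullptr on a miss.
    GraphEntry * get(std::uint64_t key) {
        auto it = index_.find(key);
        if (it == index_.end()) {
            return nullptr;
        }
        // Move the node to the front without invalidating iterators.
        order_.splice(order_.begin(), order_, it->second);
        return &it->second->second;
    }

    // Inserts or updates an entry, evicting the least recently used one when full.
    void put(std::uint64_t key, GraphEntry entry) {
        auto it = index_.find(key);
        if (it != index_.end()) {
            it->second->second = entry;
            order_.splice(order_.begin(), order_, it->second);
            return;
        }
        if (capacity_ == 0) {
            return;
        }
        if (order_.size() >= capacity_) {
            index_.erase(order_.back().first);
            order_.pop_back();
        }
        order_.emplace_front(key, entry);
        index_[key] = order_.begin();
    }

    bool contains(std::uint64_t key) const { return index_.count(key) != 0; }
    std::size_t size() const { return order_.size(); }

private:
    using Item = std::pair<std::uint64_t, GraphEntry>;

    std::size_t capacity_;
    std::list<Item> order_;  // front = most recently used
    std::unordered_map<std::uint64_t, std::list<Item>::iterator> index_;
};
```

Pairing a linked list with a hash map keeps both lookup and eviction at constant time. A plain ring buffer, by contrast, overwrites the oldest slot regardless of how recently it was used.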

Platforms: macOS/iOS, Linux, Android arm64 (CPU), Windows, openEuler

ci : add android arm64 build and release (#21647)

server: respect the ignore eos flag
ci: add android arm64 build and release
patch
pin android-setup actions to v4
Apply suggestions from code review

Co-authored-by: Sigbjørn Skjæret

lf in the suggestion

Co-authored-by: Sigbjørn Skjæret

Platforms: macOS/iOS, Linux, Android arm64 (CPU), Windows, openEuler

mtmd: add missing struct tag (#22023)

Platforms: macOS/iOS, Linux, Windows, openEuler

libs : rename libcommon -> libllama-common (#21936)

cmake : allow libcommon to be shared
cmake : rename libcommon to libllama-common
cont : set -fPIC for httplib
cont : export all symbols
cont : fix build_info exports
libs : add libllama-common-base
log : add common_log_get_verbosity_thold()

Platforms: macOS/iOS, Linux, Windows, openEuler

model : Gemma4 model type detection (#22027)

model : Gemma4 model type detection

Platforms: macOS/iOS, Linux, Windows, openEuler

Releases: ggml-org/llama.cpp

b8839
b8838
b8837
b8836
b8833
b8832
b8831
b8830
b8829
b8828