Merged with PR #17965 #11

sfallah · 2025-12-13T13:00:00Z

Merge and Refactoring in alignment with ggml-org#17909

* Extended TRI
* Fix whitespace
* chore: update webui build output
* Just use cuBLAS for everything...
* Merge both versions
* Remove incorrect imports causing failures for CI
* Still failing... remove all direct cublas imports and rely on common imports from "common.cuh"
* Defines for hipBlas
* Aaaand MUSA defines...
* I hate this job...
* Stupid typo...
* Update ggml/src/ggml-cuda/solve_tri.cu

Co-authored-by: Johannes Gäßler <[email protected]>

---------

Co-authored-by: Johannes Gäßler <[email protected]>

…rg#17765)

* webui: add search field to model selector and fixes mobile viewport overflow
* webui: simplify model search style and code
* refactor: Search Input component & consistent UI for Models Selector search
* feat: Use Popover component + improve interactions
* fix: Fetching props for only loaded models in ROUTER mode
* webui: prevent models selector popover from overflowing viewport

Use Floating UI's auto-positioning with 50dvh height limit and proper collision detection instead of forcing top positioning. Fixes overflow on desktop and mobile keyboard issues.

* webui: keep search field near trigger in models selector

Place search at the 'near end' (closest to trigger) by swapping layout with CSS flexbox order based on popover direction. Prevents input from moving during typing as the list shrinks.

* chore: update webui build output

---------

Co-authored-by: Aleksander Grygier <[email protected]>

…gml-org#17954) This commit removes the maximum difference check from compare-logits.py. That check stopped the comparison early if the difference between the logits exceeded a threshold. The motivation for removing it is that the complete log can be useful for debugging/reporting purposes.

…gml-org#17951)

* ggml-cpu:fix RISC-V Q4_0 repack select and RVV feature reporting

Signed-off-by: Wang Yang <[email protected]>

* using the name VLEN instead of CNT
* Update ggml/include/ggml-cpu.h

---------

Signed-off-by: Wang Yang <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>

ggml-org#17945)

* models : fix the attn_factor for mistral3 graphs
* cont : rework attn_factor correction logic
* cont : make deepseek2 consistent
* cont : add TODO
* cont : special-case DSv2
* cont : revert Mistral 3 Large changes
* cont : fix DS2 to use the original attn_factor
* cont : minor comments

* args: support negated args
* update docs
* fix typo
* add more neg options
* Apply suggestions from code review

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* rm duplicated arg
* fix LLAMA_ARG_NO_HOST
* add test

---------

Co-authored-by: Sigbjørn Skjæret <[email protected]>

…ci] (ggml-org#17984)

* model-conversion : use CONVERTED_MODEL value for converted model [no ci]

This commit updates the model verification scripts to use the CONVERTED_MODEL environment variable instead of using the MODEL_PATH (the original model path) as the basis for the converted model file name.

The motivation for this is that currently, if the converted model file name differs from the original model directory/name, the verification scripts will look for the wrong .bin files, not the ones that were generated when running the models.

For example, the following steps were not possible:

```console
(venv) $ huggingface-cli download google/gemma-3-270m-it --local-dir ggml-org/gemma-3-270m
(venv) $ python3 convert_hf_to_gguf.py ggml-org/gemma-3-270m --outfile test-bf16.gguf --outtype bf16
(venv) $ cd examples/model-conversion/
(venv) $ export MODEL_PATH=../../ggml-org/gemma-3-270m
(venv) $ export CONVERTED_MODEL=../../test-bf16.gguf
(venv) $ make causal-verify-logits
...
Data saved to data/llamacpp-test-bf16.bin
Data saved to data/llamacpp-test-bf16.txt
Error: llama.cpp logits file not found: data/llamacpp-gemma-3-270m.bin
Please run scripts/run-converted-model.sh first to generate this file.
make: *** [Makefile:62: causal-verify-logits] Error 1
```

With the changes in this commit, the above steps will now work as expected.

When the number of cols is large, split each row across multiple workgroups. There are three phases that communicate partial results through temp buffers:

1. compute max partials
2. take the max of the partials, compute sum(exp(x - max)) partials
3. sum the partials, compute the scaled result

sfallah · 2025-12-13T13:06:46Z

@bluebread
Giving a heads-up: I made a quick (and dirty) merge.
I will continue with it this afternoon.

…rg#17764)

* Squashed commit of the following:

commit b3c6bf4
Author: Abhijit Ramesh <[email protected]>
Date: Mon Dec 1 18:29:00 2025 -0800

ggml webgpu: fix xielu parameter passing (sfallah#11)

The XIELU operation was incorrectly using static_cast to convert float parameters to uint32_t, which converted numeric values instead of preserving IEEE 754 bit patterns. This caused incorrect values to be interpreted by the GPU shader.

* Use reinterpret_cast to preserve float bit patterns when passing through uint32_t params buffer
* Update WGSL shader parameter types from u32 to f32
* Re-enable XIELU support (was disabled due to numerical issues)

Fixes NMSE test failures for XIELU operation on WebGPU backend.

commit 5ca9b5e
Author: neha-ha <[email protected]>
Date: Tue Nov 18 12:17:00 2025 -0800

Refactored pipelines and workgroup calculations (sfallah#10)

* refactored pipelines
* refactored workgroup calculation
* removed commented out block of prior maps
* Clean up ceiling division pattern

---------

Co-authored-by: Neha Abbas <[email protected]>
Co-authored-by: Reese Levine <[email protected]>

Author: James Contini <[email protected]>
Date: Wed Oct 29 23:13:06 2025 -0700

formatted embed wgsl and ggml-webgpu.cpp

commit e1f6bae
Author: James Contini <[email protected]>
Date: Wed Oct 29 23:08:37 2025 -0700

implemented REPL_Template support and removed bug in unary operators kernel

commit 8c70b8f
Author: James Contini <[email protected]>
Date: Wed Oct 15 16:14:20 2025 -0700

responded and dealt with PR comments

commit f9282c6
Author: James Contini <[email protected]>
Date: Sun Oct 12 13:41:41 2025 -0700

removed unnecessary checking if node->src[1] exists for unary operators

commit 4cf28d7
Author: James Contini <[email protected]>
Date: Sun Oct 12 13:32:45 2025 -0700

All operators (including xielu) working

commit 74c6add
Author: James Contini <[email protected]>
Date: Fri Oct 10 13:16:48 2025 -0700

fixed autoconfig

commit 3627499
Author: James Contini <[email protected]>
Date: Fri Oct 10 13:10:46 2025 -0700

removed vestigial files

commit cb08583
Author: James Contini <[email protected]>
Date: Fri Oct 10 12:59:32 2025 -0700

abides by editor-config

commit 5360e28
Author: James Contini <[email protected]>
Date: Fri Oct 10 12:45:57 2025 -0700

rms_norm double declaration bug atoned

commit 7b09baa
Merge: 8a6ec84 74b8fc1
Author: James Contini <[email protected]>
Date: Fri Oct 10 11:50:03 2025 -0700

resolving merge conflicts

commit 8a6ec84
Author: James Contini <[email protected]>
Date: Wed Oct 8 18:06:47 2025 -0700

unary operators pass ggml tests

commit c3ae382
Author: James Contini <[email protected]>
Date: Wed Oct 1 16:22:40 2025 -0700

neg passes backend test

commit aa1c9b2
Author: James Contini <[email protected]>
Date: Tue Sep 30 23:55:27 2025 -0700

neg f16xf32xip builds and runs, haven't actually run a model that uses neg kernel yet though

Co-authored-by: James Contini <[email protected]>
Co-authored-by: Neha Abbas <[email protected]>
Co-authored-by: Abhijit Ramesh <[email protected]>

* Remove extra code and format
* Add ops documentation (finally)
* Update ggml/src/ggml-webgpu/wgsl-shaders/embed_wgsl.py

Co-authored-by: Sigbjørn Skjæret <[email protected]>

---------

Co-authored-by: James Contini <[email protected]>
Co-authored-by: Neha Abbas <[email protected]>
Co-authored-by: Abhijit Ramesh <[email protected]>
Co-authored-by: Sigbjørn Skjæret <[email protected]>

utsumi-fj and others added 30 commits December 11, 2025 17:12

docs: use port 8080 in Docker examples (ggml-org#17903)

e4ae383

batch : fix sequence id ownership (ggml-org#17915)

d9f8f60

* batch : fix sequence id ownage
* cont : reduce allocations

ggml-alloc : fix reuse-parent logic for misaligned sizes (ggml-org#17884)

c6f6e4f

HIP: enable mmf for RDNA3 (ggml-org#17879)

c33a58b

* enable mmf for RDNA3
* disable mmf for some shape
* move some mmvf to mmf
* more mmfv to mmf
* 3 is good in mmvf

---------

Co-authored-by: zhang hui <[email protected]>

cmake: link ws2_32 for MinGW/w64devkit builds in cpp-httplib (ggml-org#17949)

2eaa2c6

common : add minimalist multi-thread progress bar (ggml-org#17602)

b8ee22c

Signed-off-by: Adrien Gallouët <[email protected]>

arg: add -mm and -mmu as short form of --mmproj and --mmproj-url (ggml-org#17958)

54a0fee

* arg: add -mm and -mmu as short form of --mmproj and --mmproj-url
* correct order
* update docs

webui: Fix parsing non-LaTeX occurrencies of $ or $ (ggml-org#17810)

12280ae

* fix: Improve latex protection logic to prevent turning non-latex `\(` into `$`
* chore: update webui build output

mtmd: explicitly forbidden inclusion of private header and libcommon (ggml-org#17946)

1715896

cann : fix ops broken by circular padding guard (ggml-org#17825)

dcb7d17

CUDA: fix overflow in MMA kernel without stream-k (ggml-org#17939)

4822114

docker : include legacy llama-completion binary (ggml-org#17964)

b7f5f46

ci : change the cann version and the container pull method (ggml-org#17953)

a8c7f33

fix error format

Update build.yml

Remove unnecessary zip files

fix update

clip: move model cgraphs into their own files (ggml-org#17965)

e39a2ce

* clip: move model cgraphs into their own files
* more explicit enums
* fix linux build
* fix naming
* missing headers
* nits: add comments for contributors

add llama-completion to completion-bash executables (ggml-org#17976)

2bc94e7

vulkan: Allow non-pow2 n_experts in topk_moe (ggml-org#17872)

07a10c1

common : skip model validation when --completion-bash is requested (ggml-org#17975)

8e4d678

speculative-simple : free batch on exit (ggml-org#17985)

3c6391e

vulkan: support GGML_OP_DIAG (ggml-org#17893)

3229a23

vulkan: support get_rows for i32 (ggml-org#17941)

36255a2

Merge remote-tracking branch 'sfallah/master' into sf/deepseek-ocr-merge_#17965

e0e69fd

# Conflicts: # src/llama-kv-cache.cpp # tools/mtmd/clip.cpp

quick and (potential) dirty merge with ggml-org#17909

f95a6fe

sfallah added 2 commits December 13, 2025 17:13

refactoring, one single builder function and static helpers

f7736f2

added deepseek-ocr test to tests.sh

fb3bb6a

sfallah marked this pull request as ready for review December 13, 2025 16:39

sfallah merged commit 1b38ccf into sf/deepseek-ocr Dec 13, 2025

17 participants
