Comparing changes
base repository: jundot/omlx
base: v0.3.1
head repository: jundot/omlx
compare: v0.3.2
- 10 commits
- 26 files changed
- 1 contributor
Commits on Apr 2, 2026
- 12e668a
Commits on Apr 3, 2026
- 4cd0e2e
- 127b08e — refactor: clean up oQ codebase for upcoming enhanced quantization redesign

  Removes the legacy GPTQ/clip optimization code and the quantize_oq() full-model path; only the streaming quantization path remains. Also fixes a dense-model budget-plan bug: gate_proj/up_proj were excluded from the sensitivity boost because of a dead MLP-asymmetry reference.
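The memory benefit of a streaming path over a full-model pass can be sketched as below. This is an illustrative outline only, under the assumption that "streaming" means quantizing one layer at a time; `load_layer`, `quantize_layer`, and `save_layer` are hypothetical names, not omlx APIs.

```python
def quantize_streaming(layer_names, load_layer, quantize_layer, save_layer):
    """Quantize one layer at a time so peak memory stays at roughly one
    layer's worth of full-precision weights, instead of the whole model
    (which is what a quantize_oq()-style full-model path would hold)."""
    for name in layer_names:
        weights = load_layer(name)         # bring a single layer into memory
        qweights = quantize_layer(weights) # the expensive per-layer step
        save_layer(name, qweights)         # persist and drop the fp weights
        del weights, qweights              # nothing accumulates across layers
```

A caller only needs to supply the three callbacks, so the loop itself never sees more than one layer at a time.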
- 489b4d0
- b627377 — refactor: replace turboquant with mlx-vlm import instead of custom implementation

  The old turboquant had an MSE-only codec with C++ Metal extensions and was 350x slower than SDPA. It now imports from mlx-vlm, which has multi-codec support (MSE, Prod, Polar, Split), fractional bits (e.g. 3.5), and three optimized decode paths via inline mx.fast.metal_kernel. Only BatchTurboQuantKVCache is implemented locally, for omlx's continuous batching scheduler. Re-enables turboquant in the admin UI.
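A fractional bit width like 3.5 is commonly realized by mixing two integer widths across quantization groups. The sketch below shows that idea only; it is an assumption for illustration, not how mlx-vlm necessarily implements fractional bits, and `group_bit_plan` is a hypothetical helper.

```python
def group_bit_plan(n_groups: int, target_bits: float, lo: int = 3, hi: int = 4):
    """Assign each quantization group either lo or hi bits so that the
    average bit width across groups hits target_bits.
    E.g. 3.5 bits over 8 groups -> four 3-bit and four 4-bit groups."""
    assert lo <= target_bits <= hi
    n_hi = round(n_groups * (target_bits - lo) / (hi - lo))
    return [hi] * n_hi + [lo] * (n_groups - n_hi)

plan = group_bit_plan(8, 3.5)
# the plan averages exactly 3.5 bits per weight across the 8 groups
```

In practice the hi-bit budget would be spent on the most sensitive groups rather than the first ones, but the averaging arithmetic is the same.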
- 5086d1c — fix: apply EXIF orientation transpose in image loading

  omlx was missing the ImageOps.exif_transpose() call that mlx-vlm applies when loading images. Phone photos with EXIF rotation tags were therefore passed to the vision model in the wrong orientation, causing inaccurate recognition.
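The fix boils down to one Pillow call. A minimal reproduction, assuming Pillow is installed (the in-memory EXIF tagging is just to fabricate a test image; real phone photos carry the tag in the file):

```python
from PIL import Image, ImageOps

img = Image.new("RGB", (4, 2))   # landscape pixel data...
img.getexif()[0x0112] = 6        # ...tagged with EXIF orientation 6
                                 # ("rotate 90 degrees CW to display")

# Without this call the model would see the raw, sideways pixels.
upright = ImageOps.exif_transpose(img)
print(upright.size)              # -> (2, 4): dimensions swapped to portrait
```

`exif_transpose` returns a copy with the pixels rotated/flipped per the orientation tag (and the tag cleared), so downstream code can treat every image as already upright.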
- 26ecd0b — fix: turboquant immediate quantization and eval hang workaround

  Quantize the KV cache immediately during prefill (not deferred to the first decode token) so peak memory is reduced throughout the entire inference. Use concat-based state growth instead of in-place writes. Eval logits instead of cache states to work around an mx.eval NamedTuple bulk-traversal hang. Disable the prefill_attention Metal kernels for D > 128 (they hang on Qwen3.5-27B with head_dim=256) and fall back to dequantize+SDPA.

  Qwen3.5-27B-4bit, pp32768/tg128:
  - baseline: peak 21.82 GB, TTFT 85.0 s, tg 30.6 tok/s
  - TQ 3-bit: peak 21.11 GB, TTFT 92.8 s, tg 16.3 tok/s
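Why immediate quantization lowers the peak can be shown with a toy counter: if quantization is deferred to the first decode token, every prefill token sits in fp16 until then; if each prefill chunk is quantized as it lands, the fp16 residency is bounded by the chunk size. The numbers below are illustrative, not omlx internals.

```python
def peak_fp16(prompt_len: int, chunk: int, immediate: bool) -> int:
    """Return the peak number of fp16 KV entries held during prefill."""
    fp16_held = peak = 0
    for start in range(0, prompt_len, chunk):
        fp16_held += min(chunk, prompt_len - start)  # chunk lands in fp16
        peak = max(peak, fp16_held)
        if immediate:
            fp16_held = 0            # quantized right after the chunk
    # deferred: everything stays fp16 until the first decode token
    return peak

print(peak_fp16(32768, 2048, immediate=True))   # 2048
print(peak_fp16(32768, 2048, immediate=False))  # 32768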
- c2c8d93 — fix: turboquant batch-quantize decode tokens and hybrid attention

  Batch-quantize every 32 decode tokens instead of quantizing per token, to improve GPU utilization (the rotation matmul runs on 32 tokens rather than 1). Decode attention uses a hybrid approach: TQ Metal kernels for the quantized old tokens plus a standard dot product for the buffered fp16 recent tokens. Also fixes prefill_attention: it is disabled for chunked prefill because mlx-vlm's value kernel unrolls n_repeats*L at compile time, hanging the Metal shader compiler for large L (e.g. 2048).

  Qwen3.5-27B-4bit, pp32768/tg128 (3-bit TQ):
  - before: TTFT 92.8 s, tg 16.3 tok/s, peak 21.11 GB
  - after: TTFT 92.8 s, tg 19.2 tok/s, peak 21.11 GB
  - (baseline without TQ: tg 30.6 tok/s, peak 21.82 GB)
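The buffering scheme can be sketched as a small cache class: recent decode tokens stay exact in an fp16 buffer and are handed to the expensive quantization step 32 at a time, and attention consumes the two halves separately. This is a hypothetical illustration, not the actual BatchTurboQuantKVCache API.

```python
BLOCK = 32  # decode tokens quantized per batch

class BufferedQuantCache:
    """Buffer recent decode tokens and quantize them in blocks of BLOCK,
    so the costly step (e.g. the rotation matmul) runs on a 32-token
    batch instead of once per token."""

    def __init__(self, quantize_block):
        self.quantize_block = quantize_block  # batch quantizer callback
        self.quantized = []                   # compressed "old" tokens
        self.fp16_buffer = []                 # recent, still-exact tokens

    def append(self, token_kv):
        self.fp16_buffer.append(token_kv)
        if len(self.fp16_buffer) == BLOCK:    # batch the expensive step
            self.quantized.extend(self.quantize_block(self.fp16_buffer))
            self.fp16_buffer = []

    def attention_inputs(self):
        # Hybrid attention: quantized kernels cover self.quantized,
        # a standard dot product covers self.fp16_buffer.
        return self.quantized, self.fp16_buffer

cache = BufferedQuantCache(lambda block: [("q", t) for t in block])
for t in range(70):                 # 70 decode tokens
    cache.append(t)
old, recent = cache.attention_inputs()
# two full blocks quantized (64 tokens), 6 still buffered in fp16
```

The trade-off is that up to BLOCK-1 tokens are held uncompressed at any time, which is why the recent-token path still needs the standard fp16 dot product.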
- 98849a4
- 5ce8a4d
You can try running this command locally to see the comparison on your machine:
git diff v0.3.1...v0.3.2