Comparing changes
base repository: jundot/omlx
base: v0.2.20
head repository: jundot/omlx
compare: v0.2.21
- 17 commits
- 41 files changed
- 3 contributors
Commits on Mar 23, 2026
- a9334ae
- fix: update TemplateResponse calls for Starlette 1.0 compatibility (#351)
  Starlette 1.0.0 changed the TemplateResponse signature from TemplateResponse(name, context) to TemplateResponse(request, name, context). Under the new signature, the old positional API passed the context dict as the `name` parameter, so Jinja2's LRUCache received an unhashable dict as a cache key: TypeError: unhashable type: 'dict'. This broke the admin dashboard (login, dashboard, and chat pages) with HTTP 500, while the /v1/ inference API was unaffected.
  - Update 3 TemplateResponse calls in admin/routes.py
  - Pass an explicit empty context dict for consistency
  - Add TestLoginPage and TestDashboardPage test classes
  - Update existing TestChatPageApiKeyInjection assertions
  Review follow-ups:
  - Bump fastapi>=0.100.0 to >=0.108.0 to match the Starlette 1.0 TemplateResponse(request, name, context) signature requirement (Codex P1 review)
  - Use assert_called_once_with in the login_page test to verify all args including the context dict (Gemini review)
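The argument-binding bug described above can be reproduced without Starlette at all. The sketch below is a stand-in, not Starlette's actual class: a function with the new request-first parameter order, called with old-style positional arguments, puts the context dict in the `name` slot, which then fails as a dict key.

```python
# Pure-Python illustration of why the old positional call broke under the
# Starlette 1.0 signature. `new_template_response` is a hypothetical stand-in
# for Jinja2Templates.TemplateResponse, not the real implementation.

def new_template_response(request, name, context=None):
    """Stand-in for the Starlette 1.0 order: (request, name, context)."""
    {name: None}  # Jinja2's LRUCache uses the template name as a dict key
    return (request, name, context or {})

request = object()

# Old-style call TemplateResponse(name, context): the dict lands in `name`.
try:
    new_template_response("login.html", {"request": request})
except TypeError as exc:
    print(exc)  # unhashable type: 'dict'

# Fixed call with the request first works as intended.
response = new_template_response(request, "login.html", {})
```

With the request passed first, the template name is a hashable string again and the cache lookup succeeds.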
- ca890e5
- 293c33a
- 6cc709c
Commits on Mar 24, 2026
- 66b099b
- 3cd6f9e
- fix: resolve download popup menu z-index issue in accuracy benchmark (#370)
  animate-fade-in-up ended with transform: translateY(0), which creates a stacking context per card and traps the popup's z-50 inside it. Changed the final keyframe to transform: none, which does not create a stacking context.
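The fix can be sketched in CSS (selector and keyframe names are assumed, not the project's actual stylesheet): any non-none `transform` on an element creates a stacking context, so a card ending its animation on `translateY(0)` caps the z-index of everything inside it, including a `z-50` popup.

```css
/* Hypothetical sketch of the change: end the entry animation on
   `transform: none` so the card does not keep a stacking context
   after the animation settles. */
@keyframes fade-in-up {
  from { opacity: 0; transform: translateY(0.5rem); }
  /* was: to { opacity: 1; transform: translateY(0); }  <- translateY(0) is
     still a transform, so each card trapped its popup's z-50 inside */
  to   { opacity: 1; transform: none; }
}
```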
- 7bc1402
- 86e14a4
- fix: add generation memory guard and Metal cache cleanup on failure (#372)
  - Defer scheduling new requests when active memory exceeds the soft limit during generation, to prevent Metal allocation failures
  - Clear the Metal buffer cache in fail_all_requests() to reclaim fragmented memory after batch generation errors
  - Rename "Prefill memory guard" to "Memory guard" in the admin UI and move the toggle to the top of the Resource Management section
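The scheduling guard in the first bullet can be sketched as follows. All names here (MemoryGuard, soft_limit_bytes, the injected reader) are illustrative, not oMLX's actual API; on MLX the memory reading would come from something like mx.get_active_memory().

```python
# Hypothetical soft-limit memory guard for a request scheduler: while active
# memory sits above the soft limit, new requests are deferred so in-flight
# generation can finish instead of triggering Metal allocation failures.

class MemoryGuard:
    def __init__(self, soft_limit_bytes, get_active_memory):
        self.soft_limit = soft_limit_bytes
        # Injected callable returning current active memory in bytes,
        # so the guard is testable without a GPU.
        self.get_active_memory = get_active_memory

    def can_schedule(self) -> bool:
        """True when it is safe to admit a new request into the batch."""
        return self.get_active_memory() <= self.soft_limit

# Usage: defer admission while memory is high, admit once it drops.
guard = MemoryGuard(soft_limit_bytes=100, get_active_memory=lambda: 150)
print(guard.can_schedule())  # False
```

On the failure path, the commit additionally clears the Metal buffer cache (mx.clear_cache() in MLX) so fragmented buffers are returned before the next batch.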
- 57dc615
- fix: skip LoRA adapters in model discovery and admin downloads (#356)
  Detect adapter_config.json to filter out LoRA/PEFT adapters that oMLX cannot load. Show a warning badge instead of a download button in the admin UI.
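The detection itself is simple to sketch; the helper names and directory layout below are assumptions, but the signal is the one the commit names: PEFT-style adapters ship an adapter_config.json instead of full model weights.

```python
# Illustrative sketch of adapter filtering during model discovery
# (function names hypothetical, not oMLX's actual code).
from pathlib import Path

def is_lora_adapter(model_dir: Path) -> bool:
    """LoRA/PEFT adapter repos contain adapter_config.json."""
    return (model_dir / "adapter_config.json").exists()

def discover_models(root: Path):
    """Yield loadable model directories, skipping adapter-only repos."""
    for d in sorted(p for p in root.iterdir() if p.is_dir()):
        if not is_lora_adapter(d):
            yield d
```

The admin UI side then keys off the same predicate: adapter repos get a warning badge rather than a download button.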
- 8558d7f
- fix: use monotonic _offset for BatchRotatingKVCache in VLM proxy (#353)
  _IntOffsetCacheProxy returned _idx as the RoPE offset, but BatchRotatingKVCache._idx wraps at max_size (e.g. 1024 -> 0). After the sliding window fills, RoPE positions reset to 0, causing gibberish output on Gemma3 and other sliding-window VLM models. Use _offset (monotonic, never wraps) instead. Only BatchRotatingKVCache has _offset, so BatchKVCache and other cache types are unaffected.
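A toy model (not the real cache classes) makes the failure mode concrete: a ring-buffer write index wraps at max_size, while a monotonic token counter does not, so only the latter is usable as a RoPE position.

```python
# Toy rotating KV cache illustrating why _idx is the wrong RoPE offset:
# the write index wraps at max_size, the token counter is monotonic.

class RotatingCache:
    def __init__(self, max_size: int):
        self.max_size = max_size
        self._idx = 0      # write position in the ring buffer (wraps)
        self._offset = 0   # total tokens seen (monotonic)

    def append_token(self):
        self._idx = (self._idx + 1) % self.max_size
        self._offset += 1

cache = RotatingCache(max_size=1024)
for _ in range(1500):
    cache.append_token()

print(cache._idx, cache._offset)  # prints: 476 1500
```

After 1500 tokens, using _idx would report RoPE position 476 (it has already wrapped once), while _offset still reports the correct position 1500.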
- 5fdfd38
- e150eb0
- fix: use authoritative mx.array offset in VLM cache proxy
  Replace the _idx/_offset shortcuts with direct offset[0].item() extraction. _idx wraps at max_size (continuous generation), and _offset diverges after merge(), which sets it to the buffer size instead of the actual token count (SSD cache restore). The mx.array offset is always correct. Also add Gemma3-12B-QAT to the boundary cache consistency tests.
- 6f3a33d
Commits on Mar 25, 2026
- oq: GPTQ-based enhanced quantization with batched MoE expert processing
  - Implement GPTQ column-wise error compensation for all quantizable weights
  - Batched expert GPTQ processes 256 experts simultaneously (15x faster)
  - Shared Hessian across experts in each MoE layer
  - Sensitivity budget assigns per-tensor bits before GPTQ optimization
  - Fix float32 norm weights to bfloat16 for mlx-lm inference parity
  - Add eos_token_id from generation_config.json
  - Add Step-3.5 MoE support (moe.*_proj pattern)
  - Update admin UI: Enhanced Quantization(+) with GPTQ description
  - Update docs/oQ_Quantization.md for the GPTQ-based pipeline
  - Clean up legacy equalization code
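The GPTQ idea in the first bullet can be sketched as follows. This is a deliberately simplified pure-Python illustration (uniform grid, dense loops, precomputed inverse Hessian), not oMLX's batched implementation: each column is quantized in turn, and its rounding error is folded into the not-yet-quantized columns via the inverse-Hessian row, so later columns compensate for earlier errors.

```python
# Minimal sketch of GPTQ-style column-wise error compensation.
# All names and the uniform grid are illustrative assumptions.

def quantize(x: float, step: float = 0.5) -> float:
    """Round a value to a uniform grid with the given step size."""
    return round(x / step) * step

def gptq_columns(W, Hinv, step: float = 0.5):
    """W: weight matrix as a list of rows; Hinv: inverse Hessian.

    Quantize columns left to right, distributing each column's
    rounding error into the remaining columns (scaled by Hinv).
    """
    n_rows, n_cols = len(W), len(W[0])
    W = [row[:] for row in W]                    # work on a copy
    Q = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        d = Hinv[j][j]
        for i in range(n_rows):
            q = quantize(W[i][j], step)
            Q[i][j] = q
            err = (W[i][j] - q) / d
            # Fold the error into the not-yet-quantized columns.
            for k in range(j + 1, n_cols):
                W[i][k] -= err * Hinv[j][k]
    return Q
```

With an identity Hinv (no cross-column coupling) this degenerates to plain round-to-nearest; a non-diagonal Hinv is where the compensation actually shifts later columns.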
- 228a4d9
- feat: TurboQuant KV cache compression with fused Flash Attention
  Codebook-quantized KV cache that reduces memory ~50-70% during decode with near-lossless quality. Lazy quantization keeps prefill at fp16 speed, then compresses at decode start.
  Core:
  - TurboQuantKVCache + BatchTurboQuantKVCache with full batch lifecycle
  - 2-pass fused Flash Attention Metal kernel (no dequant during decode)
  - Boundary-based quantization (19x faster than argmin)
  - Batch decode_attention via the same fused kernel (B>1 grid dispatch)
  Integration:
  - Attention patch with VLM _IntOffsetCacheProxy unwrap
  - Prefix cache: save quantized blocks to SSD, dequantize to KVCache on restore for merge compatibility; meta_state stores (offset, bits, seed)
  - type_registry: TurboQuantKVCache recognized as sliceable
  - Admin UI: TurboQuant toggle with 3-bit/4-bit selection
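The "boundary-based quantization" bullet can be illustrated with a small sketch (names are illustrative, and the real kernel runs on Metal, not Python): for a sorted codebook, the nearest entry can be found by binary-searching the midpoint boundaries between codes instead of computing an argmin over distances to every codebook entry.

```python
# Sketch of boundary-based nearest-codebook encoding vs. a naive argmin.
import bisect

def make_boundaries(codebook):
    """Midpoints between consecutive entries of a sorted codebook."""
    return [(a + b) / 2 for a, b in zip(codebook, codebook[1:])]

def encode_boundary(x, boundaries):
    """O(log n) binary search over boundaries -> nearest code index."""
    return bisect.bisect_right(boundaries, x)

def encode_argmin(x, codebook):
    """O(n) reference: index of the closest codebook entry."""
    return min(range(len(codebook)), key=lambda i: abs(x - codebook[i]))

codebook = [-1.0, -0.25, 0.5, 2.0]          # sorted, e.g. a 2-bit codebook
boundaries = make_boundaries(codebook)       # [-0.625, 0.125, 1.25]
print([encode_boundary(x, boundaries) for x in [-2.0, -0.5, 0.0, 0.7, 3.0]])
```

Away from exact-midpoint ties, both encoders select the same code; the boundary search just replaces a full scan with a logarithmic lookup, which is where the reported speedup over argmin comes from.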
- 8c2696e
- ea09fc6
- 710ef39
To view the full comparison locally, run:
git diff v0.2.20...v0.2.21