
Comparing changes
base repository: jundot/omlx
base: v0.2.20
head repository: jundot/omlx
compare: v0.2.21
  • 17 commits
  • 41 files changed
  • 3 contributors

Commits on Mar 23, 2026

  1. Commit a9334ae
  2. fix: update TemplateResponse calls for Starlette 1.0 compatibility (#351)
    
    * fix: update TemplateResponse calls for Starlette 1.0 compatibility
    
    Starlette 1.0.0 changed the TemplateResponse signature from
    TemplateResponse(name, context) to TemplateResponse(request, name, context).
    
    The old positional API passed the context dict as the `name` parameter,
    causing Jinja2's LRUCache to receive an unhashable dict as a cache key:
    
        TypeError: unhashable type: 'dict'
    
    This broke the admin dashboard (login, dashboard, chat pages) with
    HTTP 500, while the /v1/ inference API was unaffected.
    
    - Update 3 TemplateResponse calls in admin/routes.py
    - Pass explicit empty context dict for consistency
    - Add TestLoginPage and TestDashboardPage test classes
    - Update existing TestChatPageApiKeyInjection assertions
    
    * address review: bump fastapi floor + improve login_page test assertion
    
    - Bump fastapi>=0.100.0 to >=0.108.0 to match the Starlette 1.0
      TemplateResponse(request, name, context) signature requirement
      (Codex P1 review)
    - Use assert_called_once_with in login_page test to verify all args
      including context dict (Gemini review)
    Regis-RCR authored Mar 23, 2026
    Commit ca890e5
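The failure mode this commit describes can be reproduced in miniature. The sketch below is illustrative, not oMLX's admin code: Jinja2's LRUCache keys compiled templates by template name, so when the old positional call passes the context dict in the `name` slot, hashing the key raises `TypeError`.

```python
# Minimal reproduction of the bug class described above (illustrative,
# not oMLX's code): Jinja2's LRUCache keys compiled templates by name,
# so an unhashable dict passed as `name` raises TypeError.
template_cache = {}

def get_template(name):
    # stand-in for Jinja2's environment.get_template(), which consults
    # an LRUCache keyed by the template name
    if name not in template_cache:          # the dict lookup hashes the key
        template_cache[name] = f"<compiled {name}>"
    return template_cache[name]

get_template("login.html")                  # fine: str is hashable

try:
    # old positional call: the context dict lands in the `name` slot
    get_template({"request": None, "user": "admin"})
except TypeError as exc:
    print(exc)  # unhashable type: 'dict'
```

The fix is to use the request-first signature, `TemplateResponse(request, name, context)`, so the name argument is always a string.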
  3. Commit 293c33a
  4. Commit 6cc709c

Commits on Mar 24, 2026

  1. Commit 66b099b
  2. Commit 3cd6f9e
  3. fix: resolve download popup menu z-index issue in accuracy benchmark (#370)
    
    animate-fade-in-up ended with transform: translateY(0), which creates
    a stacking context per card and traps the popup's z-50 inside it.
    changed to transform: none, which does not create a stacking context.
    jundot committed Mar 24, 2026
    Commit 7bc1402
  4. Commit 86e14a4
  5. fix: add generation memory guard and Metal cache cleanup on failure (#372)
    
    - defer scheduling new requests when active memory exceeds soft limit
      during generation to prevent Metal allocation failures
    - clear Metal buffer cache in fail_all_requests() to reclaim fragmented
      memory after batch generation errors
    - rename "Prefill memory guard" to "Memory guard" in admin UI and move
      toggle to top of Resource Management section
    jundot committed Mar 24, 2026
    Commit 57dc615
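The deferral logic in the first bullet can be sketched as follows. This is a hypothetical sketch, not oMLX's scheduler: the function name, arguments, and the soft-limit value are all illustrative.

```python
# Illustrative sketch of a soft-limit generation memory guard (names and
# threshold are hypothetical, not oMLX's API): new requests are deferred
# while active memory exceeds the soft limit, so running decodes finish
# and free memory before more prefill work is admitted.
SOFT_LIMIT_BYTES = 48 * 1024**3  # example soft cap

def admit_requests(pending, active_memory_bytes, soft_limit=SOFT_LIMIT_BYTES):
    """Return the requests to schedule now; defer everything when over the limit."""
    if active_memory_bytes >= soft_limit:
        return []            # defer scheduling; retry on the next step
    admitted = list(pending)
    pending.clear()
    return admitted
```

Deferring rather than rejecting keeps requests queued until memory pressure subsides, which is what prevents Metal allocation failures mid-generation.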
  6. fix: skip LoRA adapters in model discovery and admin downloads (#356)

    detect adapter_config.json to filter out LoRA/PEFT adapters that oMLX
    cannot load. show warning badge instead of download button in admin UI.
    jundot committed Mar 24, 2026
    Commit 8558d7f
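The detection rule this commit describes is simple to sketch; the helper names below are illustrative, not oMLX's actual functions:

```python
# Hypothetical sketch of the adapter filter described above: a directory
# containing adapter_config.json is a LoRA/PEFT adapter, not a full model,
# so it is excluded from model discovery (function names are illustrative).
from pathlib import Path

def is_lora_adapter(model_dir: Path) -> bool:
    return (model_dir / "adapter_config.json").is_file()

def discover_models(root: Path) -> list:
    return sorted(d for d in root.iterdir()
                  if d.is_dir() and not is_lora_adapter(d))
```

adapter_config.json is the marker file PEFT writes next to adapter weights, which is why its presence reliably distinguishes an adapter from a standalone model.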
  7. fix: use monotonic _offset for BatchRotatingKVCache in VLM proxy (#353)

    _IntOffsetCacheProxy returned _idx for RoPE offset, but
    BatchRotatingKVCache._idx wraps at max_size (e.g. 1024 -> 0).
    after the sliding window fills, RoPE positions reset to 0 causing
    gibberish output on Gemma3 and other sliding-window VLM models.
    
    use _offset (monotonic, never wraps) instead. only
    BatchRotatingKVCache has _offset, so BatchKVCache and other cache
    types are unaffected.
    jundot committed Mar 24, 2026
    Commit 5fdfd38
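The wrap-around described above can be modeled in a few lines. This is a toy model, not oMLX's cache classes: in a rotating KV cache the ring-buffer write index wraps at max_size, while the monotonic offset keeps counting, and only the latter is a valid RoPE position once the sliding window fills.

```python
# Toy model of the bug described above (not oMLX's cache classes): the
# ring-buffer write index (_idx) wraps at max_size, while the monotonic
# token count (_offset) never wraps.
class RotatingCacheSketch:
    def __init__(self, max_size: int):
        self.max_size = max_size
        self._idx = 0      # ring-buffer write position (wraps)
        self._offset = 0   # total tokens seen (never wraps)

    def append_token(self):
        self._idx = (self._idx + 1) % self.max_size
        self._offset += 1

cache = RotatingCacheSketch(max_size=1024)
for _ in range(1025):
    cache.append_token()
print(cache._idx)     # 1    -> wrong RoPE position after the wrap
print(cache._offset)  # 1025 -> correct monotonic position
```

Using the wrapped index as a RoPE offset resets positions to near zero after max_size tokens, which matches the gibberish-after-window-fill symptom on Gemma3.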
  8. Commit e150eb0
  9. fix: use authoritative mx.array offset in VLM cache proxy

    replace _idx/_offset shortcuts with direct offset[0].item() extraction.
    _idx wraps at max_size (continuous generation), _offset diverges after
    merge() which sets it to buffer size instead of actual token count
    (SSD cache restore). the mx.array offset is always correct.
    
    also add Gemma3-12B-QAT to boundary cache consistency tests.
    jundot committed Mar 24, 2026
    Commit 6f3a33d

Commits on Mar 25, 2026

  1. oq: GPTQ-based enhanced quantization with batched MoE expert processing

    - implement GPTQ column-wise error compensation for all quantizable weights
    - batched expert GPTQ processes 256 experts simultaneously (15x faster)
    - shared Hessian across experts in each MoE layer
    - sensitivity budget assigns per-tensor bits before GPTQ optimization
    - fix float32 norm weights to bfloat16 for mlx-lm inference parity
    - add eos_token_id from generation_config.json
    - add Step-3.5 MoE support (moe.*_proj pattern)
    - update admin UI: Enhanced Quantization(+) with GPTQ description
    - update docs/oQ_Quantization.md for GPTQ-based pipeline
    - clean up legacy equalization code
    jundot committed Mar 25, 2026
    Commit 228a4d9
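The column-wise error compensation at the heart of GPTQ can be sketched in heavily simplified form. Everything below is a toy: a flat compensation coefficient stands in for the inverse-Hessian terms, and none of it is oMLX's actual implementation (which, per the commit, batches the sweep across 256 MoE experts with a shared Hessian).

```python
# Toy sketch of GPTQ-style column-wise error compensation. The real
# algorithm derives per-column compensation from the inverse Hessian of
# the layer inputs; here a flat coefficient stands in for those terms.
def quantize(x: float, step: float = 0.5) -> float:
    return round(x / step) * step

def gptq_sweep(row, comp: float = 0.5):
    """Quantize weights left to right, folding each element's
    quantization error into the next, not-yet-quantized element."""
    row = list(row)
    out = []
    for j in range(len(row)):
        q = quantize(row[j])
        out.append(q)
        if j + 1 < len(row):
            row[j + 1] += (row[j] - q) * comp  # push error forward
    return out
```

For example, naive round-to-nearest maps [0.3, 0.3] to [0.5, 0.5] (sum 1.0 vs. the true 0.6), while the compensated sweep yields [0.5, 0.0], keeping the aggregate output much closer to the unquantized weights.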
  2. feat: TurboQuant KV cache compression with fused Flash Attention

    codebook-quantized KV cache that reduces memory ~50-70% during decode
    with near-lossless quality. lazy quantization keeps prefill at fp16
    speed, then compresses at decode start.
    
    core:
    - TurboQuantKVCache + BatchTurboQuantKVCache with full batch lifecycle
    - 2-pass fused Flash Attention Metal kernel (no dequant during decode)
    - boundary-based quantization (19x faster than argmin)
    - batch decode_attention via same fused kernel (B>1 grid dispatch)
    
    integration:
    - attention patch with VLM _IntOffsetCacheProxy unwrap
    - prefix cache: save quantized blocks to SSD, dequant to KVCache on
      restore for merge compatibility. meta_state stores (offset, bits, seed)
    - type_registry: TurboQuantKVCache recognized as sliceable
    - admin UI: turboquant toggle with 3-bit/4-bit selection
    jundot committed Mar 25, 2026
    Commit 8c2696e
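Scalar 4-bit quantization of a KV row with a per-row scale gives a feel for where the ~50-70% memory saving comes from. This is a toy sketch only: per the commit, TurboQuant uses codebooks and dequantizes inside a fused Flash Attention Metal kernel rather than materializing fp16 values.

```python
# Toy 4-bit scalar quantization of one KV row (illustrative only; the
# actual TurboQuant path uses codebooks and consumes quantized values
# inside a fused Flash Attention kernel).
def quantize_row_4bit(row):
    lo, hi = min(row), max(row)
    scale = (hi - lo) / 15 or 1.0       # 16 levels -> 4 bits per value
    codes = [round((x - lo) / scale) for x in row]
    return codes, lo, scale             # 4-bit codes + fp scale/offset

def dequantize_row(codes, lo, scale):
    return [lo + c * scale for c in codes]

row = [0.0, 0.25, 0.5, 1.0]
codes, lo, scale = quantize_row_4bit(row)
approx = dequantize_row(codes, lo, scale)
# each reconstructed value is within half a quantization step of the input
assert all(abs(a - b) <= scale / 2 + 1e-9 for a, b in zip(row, approx))
```

Storing 4-bit codes instead of 16-bit floats shrinks the cached values by roughly 4x before per-row metadata overhead, consistent with the 50-70% end-to-end figure once scales, offsets, and unquantized state are accounted for.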
  3. bump version to 0.2.21

    jundot committed Mar 25, 2026
    Commit ea09fc6
  4. Add files via upload

    jundot authored Mar 25, 2026
    Commit 710ef39