Development Roadmap (2026 H1) #651
Here is the development roadmap for H1 2026. We will pin this roadmap in Issues, and most of our subsequent work will be tracked and updated there. In MLLM's documentation, we will archive each version of the roadmap and provide some outlooks. Contributions and feedback are welcome.
Focus
- pymllm for embodied robots/agents on Jetson Orin/Thor.
- Continued work on mllm's Arm and NPU backends, supporting more models.
- NPU AOT shape-bucketing optimization.
Model coverage
- Gemma 3n (with support for AltUp, Embedding, and SWA; CPU backend)
- Qwen3-VL 2B (pymllm CUDA backend)
- Qwen3.5 0.8B/4B/9B (pymllm CUDA backend; CPU and NPU backends)
NPU Backend
- Shape bucketing
- Sliding-window attention optimization
- The ViT part of Qwen3-VL & Qwen2-VL (with fixed image size)
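The shape-bucketing idea above can be sketched briefly: an AOT-compiled NPU graph has fixed input shapes, so requests are padded up to the nearest pre-compiled "bucket" length. The bucket sizes below are illustrative placeholders, not mllm's actual configuration.

```python
# Minimal sketch of AOT shape bucketing: pre-compile graphs for a small
# set of sequence lengths, then pad each request to the nearest bucket
# so a fixed-shape NPU graph can serve dynamic-length inputs.
import bisect

BUCKETS = [128, 256, 512, 1024, 2048]  # hypothetical pre-compiled shapes

def pick_bucket(seq_len: int) -> int:
    """Return the smallest pre-compiled bucket that fits seq_len."""
    i = bisect.bisect_left(BUCKETS, seq_len)
    if i == len(BUCKETS):
        raise ValueError(f"seq_len {seq_len} exceeds largest bucket")
    return BUCKETS[i]

def pad_to_bucket(tokens: list[int], pad_id: int = 0) -> list[int]:
    """Right-pad a token sequence to its bucket length."""
    bucket = pick_bucket(len(tokens))
    return tokens + [pad_id] * (bucket - len(tokens))
```

The trade-off is wasted compute on padding versus the cost of compiling (and storing) a graph per possible length; a small geometric bucket set keeps both bounded.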
Kernels
- GDN kernel for Qwen3.5(arm backend, cpu)
- marlin kernel for pymllm (mllm-kernel)
- GDN kernel for pymllm(mllm-kernel, for SM80 chips)
Pymllm
- Radix Cache correctness
- Qwen3 MoE and Qwen3.5 Optimization
- Optimizing CPU busy loop
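For context on the Radix Cache item: a radix (prefix) cache lets requests that share a token prefix reuse the already-computed KV entries for that prefix instead of recomputing them. The trie below is a simplified illustration, not pymllm's actual implementation.

```python
# Minimal sketch of a radix/prefix cache over token IDs: insert cached
# sequences, then look up how long a prefix of a new request is already
# cached (and therefore need not be prefilled again).

class RadixNode:
    def __init__(self) -> None:
        self.children: dict[int, "RadixNode"] = {}

class RadixCache:
    def __init__(self) -> None:
        self.root = RadixNode()

    def insert(self, tokens: list[int]) -> None:
        """Record a token sequence as cached."""
        node = self.root
        for t in tokens:
            node = node.children.setdefault(t, RadixNode())

    def match_prefix(self, tokens: list[int]) -> int:
        """Return the length of the longest cached prefix of tokens."""
        node, matched = self.root, 0
        for t in tokens:
            if t not in node.children:
                break
            node = node.children[t]
            matched += 1
        return matched
```

Correctness work here typically means ensuring reused prefixes are byte-identical to what would have been recomputed, and that eviction never frees KV blocks a live request still references.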
Server
- mllm-server: Provides an OpenAI-compatible API, using the mllm library as the inference backend.
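Since mllm-server is planned to be OpenAI-compatible, clients should be able to talk to it with the standard chat-completions request shape. The URL, port, and model name below are placeholders; only the payload layout follows the OpenAI convention.

```python
# Sketch of building a request to an OpenAI-compatible endpoint such as
# the planned mllm-server. Uses only the stdlib; nothing is sent until
# the request is passed to urllib.request.urlopen().
import json
import urllib.request

def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("http://localhost:8080", "qwen3-0.6b", "Hello!")
```

Because the wire format matches OpenAI's, existing SDKs should also work by pointing their `base_url` at the local server.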