Development Roadmap (2026 H1) #651
Here is the development roadmap for H1 2026. We will pin this roadmap in Issues, and most of our subsequent work will be tracked and updated there. In MLLM's documentation, we will archive each version of the roadmap and provide some outlooks. Contributions and feedback are welcome.
Focus
- pymllm for embodied robots/agents on Jetson Orin/Thor.
- Continued work on mllm's Arm and NPU backends, supporting more models.
- NPU AOT shape-bucketing optimization.
Model coverage
- Gemma 3n (with support for AltUp, Embedding, and SWA; CPU backend)
- Qwen3-VL 2B (pymllm CUDA backend)
- Qwen3.5 0.8B/4B/9B (pymllm CUDA backend; CPU and NPU backends)
NPU Backend
- Shape bucketing
- Sliding-window attention optimization
- The ViT part of Qwen3-VL & Qwen2-VL (with fixed image size)
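The shape-bucketing idea above can be sketched briefly: an AOT-compiled NPU graph has fixed input shapes, so requests are padded up to the nearest pre-compiled "bucket" length. The bucket sizes below are illustrative placeholders, not mllm's actual configuration.

```python
# Minimal sketch of AOT shape bucketing: pre-compile graphs for a small
# set of sequence lengths, then pad each request to the nearest bucket
# so a fixed-shape NPU graph can serve dynamic-length inputs.
import bisect

BUCKETS = [128, 256, 512, 1024, 2048]  # hypothetical pre-compiled shapes

def pick_bucket(seq_len: int) -> int:
    """Return the smallest pre-compiled bucket that fits seq_len."""
    i = bisect.bisect_left(BUCKETS, seq_len)
    if i == len(BUCKETS):
        raise ValueError(f"seq_len {seq_len} exceeds largest bucket")
    return BUCKETS[i]

def pad_to_bucket(tokens: list[int], pad_id: int = 0) -> list[int]:
    """Right-pad a token sequence to its bucket length."""
    bucket = pick_bucket(len(tokens))
    return tokens + [pad_id] * (bucket - len(tokens))
```

The trade-off is wasted compute on padding versus the cost of compiling (and storing) a graph per possible length; a small geometric bucket set keeps both bounded.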
Kernels
- GDN kernel for Qwen3.5(arm backend, cpu)
- marlin kernel for pymllm (mllm-kernel)
- GDN kernel for pymllm(mllm-kernel, for SM80 chips)
Pymllm
- Radix Cache correctness
- Qwen3 MoE and Qwen3.5 Optimization
- Optimizing CPU busy loop
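For context on the Radix Cache item: a radix (prefix) cache lets requests that share a token prefix reuse the already-computed KV entries for that prefix instead of recomputing them. The trie below is a simplified illustration, not pymllm's actual implementation.

```python
# Minimal sketch of a radix/prefix cache over token IDs: insert cached
# sequences, then look up how long a prefix of a new request is already
# cached (and therefore need not be prefilled again).

class RadixNode:
    def __init__(self) -> None:
        self.children: dict[int, "RadixNode"] = {}

class RadixCache:
    def __init__(self) -> None:
        self.root = RadixNode()

    def insert(self, tokens: list[int]) -> None:
        """Record a token sequence as cached."""
        node = self.root
        for t in tokens:
            node = node.children.setdefault(t, RadixNode())

    def match_prefix(self, tokens: list[int]) -> int:
        """Return the length of the longest cached prefix of tokens."""
        node, matched = self.root, 0
        for t in tokens:
            if t not in node.children:
                break
            node = node.children[t]
            matched += 1
        return matched
```

Correctness work here typically means ensuring reused prefixes are byte-identical to what would have been recomputed, and that eviction never frees KV blocks a live request still references.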
Server
- mllm-server: Provides an OpenAI-compatible API, using the mllm library as the inference backend.
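Since mllm-server is planned to be OpenAI-compatible, clients should be able to talk to it with the standard chat-completions request shape. The URL, port, and model name below are placeholders; only the payload layout follows the OpenAI convention.

```python
# Sketch of building a request to an OpenAI-compatible endpoint such as
# the planned mllm-server. Uses only the stdlib; nothing is sent until
# the request is passed to urllib.request.urlopen().
import json
import urllib.request

def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("http://localhost:8080", "qwen3-0.6b", "Hello!")
```

Because the wire format matches OpenAI's, existing SDKs should also work by pointing their `base_url` at the local server.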