Top Free-Tier OpenRouter Models for Agentic Coding Tasks
1. Qwen3 Coder (free variant) – A 480B-parameter MoE model (35B active) specialized for agentic code workflows [1]. It is explicitly optimized for function calling, tool use, and long-context reasoning over code repositories [1]. In benchmarks it leads open-source models, topping Codeforces Elo, BFCL, and LiveCodeBench v5, and even beating Moonshot's Kimi K2 and DeepSeek V3 on SWE-Bench [2]. In practice it can generate full functions and classes, debug and refactor code across large contexts, and handle multi-turn planning with tools [3][4] (see the sketch below). Strengths: state-of-the-art code generation and agentic reasoning (comparable to Claude/GPT-4-level) [2], a huge 262K-token context window for long sessions, open weights. Limitations: very large and heavy (impractical to self-host, so realistically used via API), new enough that ecosystem integration may lag, and occasionally unstable on trivial tasks.
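A minimal sketch of exercising the model's tool-use interface through OpenRouter's OpenAI-compatible endpoint, assuming the official OpenAI Python SDK and an OPENROUTER_API_KEY environment variable; the run_tests tool is hypothetical, invented here purely for illustration:

```python
import os
from openai import OpenAI

# OpenRouter exposes an OpenAI-compatible chat completions API.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

# A hypothetical tool the model may choose to call (not a real API).
tools = [{
    "type": "function",
    "function": {
        "name": "run_tests",
        "description": "Run the project's test suite and return the failures.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

resp = client.chat.completions.create(
    model="qwen/qwen3-coder:free",
    messages=[{"role": "user",
               "content": "Fix the failing tests in ./src and explain the change."}],
    tools=tools,
)
# The model either requests a tool call or answers directly.
print(resp.choices[0].message.tool_calls or resp.choices[0].message.content)
```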
2. DeepSeek R1 (free variant) – A 671B-parameter open model (37B active) designed for reasoning and code, available at no cost [5]. It reportedly matches OpenAI's o1 reasoning model in performance [6], which suggests excellent coding ability. It is fully open-source and MIT-licensed [6], making it ideal for experimentation. Strengths: very strong reasoning and coding skills, free to use, long 163K-token context. Limitations: slower inference (it is huge), fewer code-specific fine-tunes than Qwen3-Coder (it is a more general agent/chat model), and moderate community support.
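The same free tier can also be reached with plain HTTP rather than the SDK; a sketch using DeepSeek R1 and the requests library (the prompt contents are illustrative):

```python
import os
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "deepseek/deepseek-r1:free",
        "messages": [{
            "role": "user",
            "content": (
                "Why does this Python snippet leak memory?\n\n"
                "cache = {}\n"
                "def f(x):\n"
                "    cache[x] = [0] * 10_000\n"
                "    return sum(cache[x])"
            ),
        }],
    },
    timeout=120,  # reasoning models can take a while
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```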
3. Z.AI GLM-4.5 Air (free) – A compact Mixture-of-Experts model (106B total parameters, 12B active) purpose-built for agentic use [7]. It supports a "thinking mode" for advanced reasoning and tool use and a "non-thinking mode" for fast replies [7]; a sketch of toggling the two follows below. GLM-4.5 Air inherits GLM-4.5's strong coding and reasoning skills and adds hybrid inference control. Strengths: excellent multi-step reasoning and coding support with built-in tool-use control [7], a very large 131K-token context, free and relatively efficient. Limitations: smaller than the MoE giants (106B total vs. hundreds of billions), so raw code fluency may lag Qwen3; the model is still new and tooling support is evolving.
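A sketch of switching GLM-4.5 Air between the two modes. The reasoning field below follows OpenRouter's unified reasoning-control parameter as I understand it; treat its exact shape as an assumption and verify against the current OpenRouter docs.

```python
import os
from openai import OpenAI

client = OpenAI(base_url="https://openrouter.ai/api/v1",
                api_key=os.environ["OPENROUTER_API_KEY"])

def ask(prompt: str, think: bool) -> str:
    resp = client.chat.completions.create(
        model="z-ai/glm-4.5-air:free",
        messages=[{"role": "user", "content": prompt}],
        # Assumed knob for OpenRouter's reasoning control; confirm in docs.
        extra_body={"reasoning": {"enabled": think}},
    )
    return resp.choices[0].message.content

# Thinking mode for a multi-step planning task, fast mode for a lookup.
print(ask("Plan a refactor of a 2k-line Flask app into blueprints.", think=True))
print(ask("What does `git stash pop` do?", think=False))
```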
4. DeepCoder-14B Preview (free) – A 14B-parameter code-specialized model (96K context) fine-tuned with RL from a DeepSeek-R1-distilled Qwen base [8]. It is explicitly optimized for long-context program synthesis, and notably scores 60.6% on LiveCodeBench v5, among the highest for open models [9]. Strengths: very strong at pure code generation and self-debugging on benchmark problems [10], with a 96K context that suits multi-step coding tasks (see the whole-file repair sketch below). Limitations: at 14B it has less general reasoning and a shorter context than the giants, so it is a specialist rather than a full assistant and may struggle with non-code queries.
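A sketch of leaning on DeepCoder's 96K window for whole-file repair: paste an entire module plus its failing test output into one prompt. The file paths here are illustrative.

```python
import os
from openai import OpenAI

client = OpenAI(base_url="https://openrouter.ai/api/v1",
                api_key=os.environ["OPENROUTER_API_KEY"])

# Illustrative inputs: the full source file and the captured test failures.
source = open("src/parser.py").read()
test_log = open("pytest_failures.txt").read()

resp = client.chat.completions.create(
    model="agentica-org/deepcoder-14b-preview:free",
    messages=[{
        "role": "user",
        "content": (
            "Here is a module and its failing tests. Return the corrected "
            f"module in full.\n\n# parser.py\n{source}\n\n# failures\n{test_log}"
        ),
    }],
)
print(resp.choices[0].message.content)
```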
5. Dolphin 3.0 Mistral-24B (free) – A 24B model from the Dolphin series (32K context) trained as a general-purpose agentic assistant [11]. It is tuned for coding, math, function calling, and multi-turn agentic interactions [11]; an agent-loop sketch follows below. Strengths: a versatile generalist (closer to ChatGPT/Gemini in character) with solid coding ability, open-source and easy to run locally. Limitations: smaller and simpler than the top MoE models; for example, the 8B Dolphin variant lagged Llama 3.1 on MMLU in benchmarks [12], and the 32K context may run out on very long agentic chains. Still, it often outperforms same-size instruction models on reasoning.
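A sketch of a minimal two-turn agent loop with Dolphin 3.0: the model requests a tool call, the caller executes it locally, appends the result, and asks again. The eval_python tool is hypothetical and uses eval purely for demonstration; it is not production-safe.

```python
import json
import os
from openai import OpenAI

client = OpenAI(base_url="https://openrouter.ai/api/v1",
                api_key=os.environ["OPENROUTER_API_KEY"])

MODEL = "cognitivecomputations/dolphin3.0-mistral-24b:free"

# Hypothetical tool: evaluate a short Python expression locally.
tools = [{"type": "function", "function": {
    "name": "eval_python",
    "description": "Evaluate a short Python expression and return the result.",
    "parameters": {"type": "object",
                   "properties": {"expr": {"type": "string"}},
                   "required": ["expr"]}}}]

messages = [{"role": "user", "content": "What is 2**32 - 1? Use the tool."}]
resp = client.chat.completions.create(model=MODEL, messages=messages, tools=tools)
msg = resp.choices[0].message

if msg.tool_calls:
    # Execute the requested call and hand the result back to the model.
    call = msg.tool_calls[0]
    expr = json.loads(call.function.arguments)["expr"]
    result = str(eval(expr))  # demo only: never eval untrusted model output
    messages += [msg, {"role": "tool", "tool_call_id": call.id, "content": result}]
    resp = client.chat.completions.create(model=MODEL, messages=messages, tools=tools)

print(resp.choices[0].message.content)
```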
6. Qwen2.5 72B Instruct (free) – A 72B instruction-tuned model building on Qwen2, with expert fine-tuning for code and math [13]. It offers stronger coding and math knowledge than its predecessors and natively supports inputs of up to 128K tokens, though OpenRouter's free endpoint serves roughly 32K in practice. Strengths: very capable on code and math tasks compared to typical 70B models [13], fully free. Limitations: behind the latest Qwen3/Coder generation in raw code performance, capped near 32K context on OpenRouter, and less optimized for agentic function calling than Qwen3 or GLM.
Each of the above models is freely accessible via OpenRouter's free tier. In practice, Qwen3-Coder leads for pure code generation and agentic debugging (thanks to massive scale and RL tuning) [2], while DeepSeek R1 and GLM-4.5 Air excel at multi-step reasoning with code and tools [6][7]. DeepCoder-14B is a strong code specialist on automated coding benchmarks [10]. Dolphin 3.0 and Qwen2.5-72B offer balanced, easy-to-run options with good but slightly lower coding prowess. (Note that context windows and model sizes vary: Qwen3 and GLM Air support very long contexts, whereas Dolphin and Qwen2.5 top out around 32K.) Because free endpoints are sometimes rate-limited, it can also help to route requests with fallbacks, as sketched below.
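The models list below uses OpenRouter's fallback-routing parameter as I understand it (the request falls through to the next model when the primary is unavailable or rate-limited); confirm the current behavior in OpenRouter's routing documentation.

```python
import os
from openai import OpenAI

client = OpenAI(base_url="https://openrouter.ai/api/v1",
                api_key=os.environ["OPENROUTER_API_KEY"])

resp = client.chat.completions.create(
    model="qwen/qwen3-coder:free",          # preferred model
    extra_body={"models": [                 # assumed fallback order
        "deepseek/deepseek-r1:free",
        "z-ai/glm-4.5-air:free",
    ]},
    messages=[{"role": "user",
               "content": "Write a shell one-liner to find TODOs in a repo."}],
)
# The response reports which model actually served the request.
print(resp.model, resp.choices[0].message.content)
```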
Sources: model specs and performance claims are drawn from OpenRouter's model pages and third-party analysis [1][2][7][10].
[1] OpenRouter – Qwen3 Coder (free). https://openrouter.ai/qwen/qwen3-coder:free
[2][3][4] Cogni Down Under, "Qwen3-Coder: Alibaba's Game-Changing Open-Source Agentic AI Coding Model," Medium, Jul 2025. https://medium.com/@cognidownunder/qwen3-coder-alibabas-game-changing-open-source-agentic-ai-coding-model-3cf34dcc8d7a
[5][6] OpenRouter – DeepSeek R1 (free). https://openrouter.ai/deepseek/deepseek-r1:free
[7] OpenRouter – GLM-4.5 Air (free). https://openrouter.ai/z-ai/glm-4.5-air:free
[8][9][10] OpenRouter – DeepCoder 14B Preview (free). https://openrouter.ai/agentica-org/deepcoder-14b-preview:free
[11] OpenRouter – Dolphin 3.0 Mistral 24B (free). https://openrouter.ai/cognitivecomputations/dolphin3.0-mistral-24b:free
[12] "Dolphin 3.0 Released (Llama 3.1 + 3.2 + Qwen 2.5)," r/LocalLLaMA. https://www.reddit.com/r/LocalLLaMA/comments/1hufsy4/dolphin_30_released_llama_31_32_qwen_25/
[13] OpenRouter – Qwen2.5 72B Instruct (free). https://openrouter.ai/qwen/qwen-2.5-72b-instruct:free