`cleverhack.com`


`AI Coding Landscape`

July 2025 (Updated February 2026)

Note: Since everything is moving so fast, I wanted to create a knowledge framework for AI coding models and the associated agent, IDE, and software tooling ecosystem used for AI-assisted coding and/or vibe coding. This page continues to evolve as a market view of what is being mentioned and is an ongoing work in progress.

`Listing AI coding agents, CLIs, IDEs, app builders, open source versions, devtools, and leaderboards`

`AI Coding Agents/CLI Tools`

OpenAI Codex - Cloud coding agent toolkit

GitHub Copilot - Pair-programming assistant

Claude Code - Anthropic terminal agent, brings Opus 4.5 right to your terminal

Gemini Code Assist - Google AI coding assistant

Jules - Google Asynchronous Coding Agent

Cognition - Devin - An autonomous AI software engineer that can write, run and test code

Amazon Q Developer - AWS code-gen & refactor

Cursor AI - Agent baked into Cursor IDE

Goose - Model + agent API

Amp - Sourcegraph coding agent (CLI / VS Code)

Reflection AI - Asimov - Enterprise code research agent

Conductor - Run a bunch of Claude Codes in parallel

Scout - Calls itself the most curious coding and research agent

Blackbox AI - New Autonomous AI Coding Agent

Forge Code - An AI software engineering agent that runs in your terminal

Factory - Delegate software development tasks to agents called Droids

Replit Agent - Set up and create apps from scratch, works with any framework

JetBrains Junie - Your smart coding agent

Slate - A purpose built agent designed to work with you for long and hard coding tasks

GitHub Copilot CLI - The power of GitHub Copilot coding agent directly to your terminal

Codebuff - Works in your terminal to help you write and deeply understand your code

CTO.new - Completely free AI code agent

Kimi-CLI - A new CLI agent that can help you with your software development tasks and terminal operations

`Open Source AI Coding Agents/CLI Tools`

Aider - Terminal pair-programming

Continue - IDE extensions + CLI

Cline - Autonomous IDE agent

Roo Code - Cline fork, VS Code extension

Kilo Code - AI coding agent for VS Code and JetBrains

Gemini CLI - An open-source AI agent for Google Gemini

OpenAI Codex CLI - Open‑source command‑line agent for OpenAI

OpenHands - Multi-tool coding agent

Qwen Code - A command-line AI workflow tool for Qwen3-Coder

Ruler - Central AI agent rule registry

OpenCode - OSS terminal assistant

Vibe Kanban - Orchestrate multiple agents

Charm - A charming terminal agent, your new coding bestie

Goose - An open source, extensible AI agent that goes beyond code suggestions

DeepCode - Transforms research papers and natural language into production-ready code

Mistral Vibe CLI - Mistral Vibe is a command-line coding assistant powered by Mistral's models

`Desktop IDEs`

Visual Studio Code

IntelliJ IDEA / PyCharm / WebStorm

Xcode

Eclipse, NetBeans

Atom - Atom community fork

Blackbox IDE

`Cloud & AI‑Powered IDEs`

Google Antigravity - Agentic development platform, evolving the IDE into the agent-first era

Cursor - AI-first VS Code fork

Windsurf - Agentic IDE, advanced AI coding assistant for developers and enterprises

Zed - High-performance Rust editor with AI chat

Amp - VS Code Extension

Trae - ByteDance AI IDE

Augment Code - Developer AI platform that helps you understand code, debug issues, and ship faster

Warp - An agentic development environment

Kiro - Helps you do your best work by bringing structure to AI coding with spec-driven development

`AI App Builders`

Bolt - Browser-based AI app builder

Lovable - Chat-to-app builder

Replit - Cloud IDE w/ Ghostwriter

v0.dev - Vercel text-to-UI generator

Mocha - YC-backed no-code app builder

Nectry - Responsible vibe coding for the enterprise

Reflex - From prompt to production, build and deploy Python apps

Superblocks - Build secure internal apps with AI

vybe - Build internal apps 10X faster

Emergent - YC-backed, build ambitious apps with agentic vibe-coding

orchids v2 - YC-backed, the world's first AI Full Stack Engineer

Same - YC-backed, build fullstack web apps by prompting

Aura - Generate beautiful designs in seconds and export to HTML or Figma

21st.dev - Build products that reflect the team's own taste

Base44 - Lets you build fully-functional apps in minutes with just your words

VibeFlow - YC-backed, transform your AI-generated frontend mockups into fully functional applications

Blink.new - The world's first vibe coding platform that builds agentic AI apps

a0 - YC-backed, ship mobile apps to the App Store and Google Play with AI

Anything - Create powerful apps & websites by chatting with AI

Rocket - Think It. Type It. Launch It.

Google Build - Build your ideas with Gemini

Variant - Gives your ideas room to grow...to branch, remix, and become what they're meant to be

sleek.design - Design mobile apps in minutes

`Mobile AI App Builders`

Rork - Builds complete, cross-platform mobile apps using AI and React Native

Vibecode - Create native apps in seconds with AI

bitrig - Build apps for your phone, on your phone

Spielwork - The TikTok for vibecoded mini games!

Gizmo - A new way to make playful, personal software—right from your phone

Hivemind - The fastest & easiest way to chat & code with any AI in one app

Bloom - YC-backed, go from idea to native mobile app on your phone without writing a single line of code

Vibe Code Go - YC-backed, code from your phone, a mobile app for software engineers

`Open Source AI App Builders`

Hugging Face DeepSite - Access the most simple and powerful AI Vibe Code Editor to create your next project

Dyad - A local, open-source AI app builder

Open Lovable - Clone and recreate any website as a modern React app in seconds

bolt.diy - Bolt.new OSS version, AI-powered full-stack web dev for NodeJS based apps, choose the LLM you use for each prompt

app.build - An open-source AI agent that builds full-stack apps

ToolJet - An open-source low-code framework to build and deploy internal tools

Adorable - Another open source Lovable version

Vercel - OSS Vibe Coding Platform

Cloudflare VibeSDK - Run an entire vibe coding platform end-to-end, with just one click

`Other Useful AI DevTools`

Ollama - Chat & build with open models

LM Studio - Run gpt-oss, Qwen, Gemma, DeepSeek on your computer

Open WebUI - Self-hosted AI platform designed to operate entirely offline

SillyTavern - A locally installed UI for text, image, and voice LLMs

Unsloth - An open-source framework for LLM fine-tuning and reinforcement learning

n8n - Flexible AI workflow automation for technical teams

Firecrawl - Turn websites into LLM-ready data

Agents.md - A simple, open format for guiding coding agents, used by over 20k open-source projects

Vercel AI Gateway - A gateway to access hundreds of models with zero markup on tokens (including BYOK)

OpenRouter - A unified API providing access to hundreds of AI models through a single endpoint

Fabric - An open-source modular system for solving specific problems using crowdsourced AI prompts that can be used anywhere

VibeTunnel - Proxies your terminals right into the browser, so you can vibe-code anywhere

Anannas - Single API to access any LLM - Seamlessly connect to multiple models through a single gateway with failproof routing, cost control, and instant usage insights

CodeRabbit - AI code reviews - cut code review time & bugs in half

Giga AI - Giga's context engineering improves quality and understanding — so your AI works right the first time, and you build faster

Gas Town - Multi-agent orchestrator for Claude Code. Track work with convoys; sling to agents

`Coding Benchmarks & Leaderboards`

Kilo Code blog - Benchmarking GPT-5.1 vs Gemini 3.0 vs Opus 4.5 across 3 Coding Tasks - November 2025

SWE-Bench Pro (Commercial Dataset) - A new benchmark designed to provide a rigorous and realistic evaluation of AI agents for software engineering

SWE-Bench Pro (Public Dataset) - Designed to provide a rigorous and realistic evaluation of AI agents for software engineering; developed to address several challenges: data contamination, limited task diversity, oversimplified problems, and unreliable and irreproducible testing

[Deprecated] SWE-bench Verified - SWE-bench evaluates LLM performance on real world software issues collected from GitHub (the "Verified" subset is a specific version of the dataset designed to be more reliable)

SWE-bench - SWE-bench evaluates LLM performance on real world software issues collected from GitHub

SWE-bench Multilingual - 300 curated SWE-bench style tasks from 42 repositories representing 9 programming languages

SWE-rebench - A Continuously Evolving and Decontaminated Benchmark for Software Engineering LLMs

Aider - Aider polyglot coding leaderboard

OpenRouter - Model, Market Share, Use Case Categories, and App Rankings

ARC-AGI-2 - Stress testing the efficiency and capability of state-of-the-art AI reasoning systems

Terminal-Bench 2.0 - A benchmark measuring the capabilities of AI agents in a terminal environment

Terminal-Bench - A benchmark measuring the capabilities of AI agents in a terminal environment

OSWorld - Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

PR Arena - Software engineering agents head to head

Multi-SWE-bench - A Multilingual Benchmark for Issue Resolving

SWE-DEV - Evaluating and Training Autonomous Feature-Driven Software Development

LiveCodeBench Pro - A benchmark composed of problems from Codeforces, ICPC, and IOI that are continuously updated to reduce the likelihood of data contamination

LiveCodeBench - Holistic and Contamination Free Evaluation of Large Language Models for Code

BigCodeArena - A human-in-the-loop platform for evaluating code through execution

Modu Merge Rate Leaderboard - Real-world success rates: Ranking top coding agents by their pull request merge performance on Modu

OpenBench Coding - An open-source framework for standardized, reproducible benchmarking of large language models (LLMs)

Context-Bench - A benchmark for agentic context engineering

Repo Bench - Measuring large context reasoning, file editing precision, and instruction adherence

Vending-Bench 2 - Measuring AI model performance on running a business over long time horizons

τ-bench / τ2-bench - Benchmarking AI agents in collaborative real-world scenarios

Live-SWE-agent - Can Software Engineering Agents Self-Evolve on the Fly?

MCP Atlas - Evaluates how well language models handle real-world tool use through the Model Context Protocol (MCP)

CORE-Bench Hard - The agent is given the codebase of a published scientific paper and must install all libraries and dependencies, run the code, and read through the output and figures to answer questions about the paper

APEX-Agents - The AI Productivity Index for Agents (APEX-Agents) measures whether frontier AI agents can execute long-horizon, cross-application tasks across three jobs in professional services

`Developer Surveys`

The state of AI coding in 2025: Adoption, proficiency, and transformation - The Modern Software Developer, December 2025

AI in Practice Survey 2025 - Theory Ventures, December 2025

`Coding Model Timeline (foundation / open‑weight / frontier)`

Noteworthy releases; some entries may be updated model versions or model families.

`February 2026`

GPT-5.3-Codex-Spark - A research preview of OpenAI's first model designed for real-time, ultra-fast coding. Powered by Cerebras Wafer Scale Engine 3, it delivers more than 1,000 tokens per second with near-instant responsiveness, optimized for interactive work like making targeted logic edits or refining interfaces. While smaller than the full GPT-5.3-Codex, it demonstrates strong agentic performance on SWE-Bench Pro and Terminal-Bench 2.0 (58.4% accuracy) in a fraction of the time. Features a 128k context window and a lightweight working style that prioritizes minimal, high-speed edits to keep developers in a tight interactive loop.

Zhipu AI GLM-5 - A flagship Mixture-of-Experts (MoE) model with 745B total parameters (44B active) designed for "Agentic Engineering." It achieves state-of-the-art performance for open-source models, narrowing the gap with Claude Opus 4.5 in complex system refactoring and deep debugging. Features a 200k token context window and is released under a permissive MIT license. Notably trained independently of US hardware, utilizing Huawei Ascend infrastructure and the MindSpore framework.

MiniMax 2.5 - A peak-performance model optimized specifically for end-to-end developer workflows, including multi-file edits and test-validated repairs. It leads industry leaderboards with an 80.2% score on SWE-Bench and operates 37% faster than comparable frontier models. Supports a 200k context window and a specialized "thinking mode" for complex logic. Designed for high-efficiency agent loops, it offers a significantly lower cost-to-performance ratio for long-running autonomous sessions.

Claude Opus 4.6 - Anthropic's smartest model with improved coding skills including better planning, sustained agentic tasks, operation in larger codebases, and enhanced code review and debugging to catch its own mistakes. First Opus-class model with 1M token context window (beta). Applies capabilities to everyday work tasks including financial analyses, research, and document/spreadsheet/presentation creation. Achieves state-of-the-art performance on Terminal-Bench 2.0 (agentic coding), Humanity's Last Exam (multidisciplinary reasoning), GDPval-AA (knowledge work tasks), and BrowseComp (information retrieval). Maintains industry-leading safety profile with low rates of misaligned behavior

GPT-5.3-Codex - OpenAI's most capable agentic coding model, combining the coding performance of GPT-5.2-Codex with GPT-5.2's reasoning capabilities in a single model that's 25% faster. Handles long-running tasks involving research, tool use, and complex execution. You can steer and interact with it mid-task without losing context. First OpenAI model to help create itself

`January 2026`

SERA-32B - Ai2, the first model in Ai2's Open Coding Agents series, a state-of-the-art open-source coding agent that achieves 49.5% on SWE-bench Verified, matching the performance of frontier open models like Devstral-Small-2 (24B) and larger models like GLM-4.5-Air (110B); trained using Soft Verified Generation (SVG), a simple and efficient method that is 26x cheaper than reinforcement learning and 57x cheaper than previous synthetic data methods to reach equivalent performance with a total cost for data generation and training of approximately $2,000 (40 GPU-days)

Kimi K2.5 - Moonshot AI, Open-Source Visual Agentic Intelligence. Global SOTA on Agentic Benchmarks: HLE full set (50.2%), BrowseComp (74.9%); Open-source SOTA on Vision and Coding: MMMU Pro (78.5%), VideoMMMU (86.6%), SWE-bench Verified (76.8%); Code with Taste: turn chats, images & videos into aesthetic websites with expressive motion; Agent Swarm (Beta): self-directed agents working in parallel, at scale. Up to 100 sub-agents, 1,500 tool calls, 4.5× faster compared with single-agent setup

GLM-4.7-Flash - Z.ai, a local coding and agentic assistant setting a new standard for the 30B class, balancing high performance with efficiency, making it the perfect lightweight deployment option; also recommended for creative writing, translation, long-context tasks, and roleplay

`December 2025`

M2.1 - MiniMax, a new open-source AI model with 10 billion activated parameters (230 billion total) democratizing high-performance agentic capabilities, scoring 74.0 on SWE-bench Verified and 91.5 on VIBE-Web benchmarks. It excels in multi-language programming (Rust, Java, Go, C++, TypeScript, etc.), UI development, and complex real-world office workflows while offering full transparency and accessibility through both HuggingFace weights and API access

GLM-4.7 - Z.ai, optimized for AI coding assistance, this updated model shows major improvements over GLM-4.6 across coding tasks (including 5.8% gain on SWE-bench and 12.9% on multilingual coding), UI/webpage generation, tool usage, and complex reasoning with better performance in chat, creative writing, and role-play scenarios

GPT-5.2-Codex - OpenAI, the most advanced agentic coding model yet for complex, real-world software engineering. An optimized version of GPT‑5.2 for agentic coding in Codex, including improvements on long-horizon work through context compaction, stronger performance on large code changes like refactors and migrations, improved performance in Windows environments, and significantly stronger cybersecurity capabilities

Gemini 3 Flash - Google, delivers high-speed, pro-grade reasoning and outperforms even the Pro model in coding benchmarks, making it an ideal tool for low-latency agentic workflows and complex multimodal tasks like video analysis and real-time data extraction

GPT‑5.2 Thinking - OpenAI, sets a new state of the art of 55.6% on SWE-Bench Pro, a rigorous evaluation of real-world software engineering. This model can more reliably debug production code, implement feature requests, refactor large codebases, and ship fixes end-to-end with less manual intervention

Devstral 2 - Mistral AI, our next-generation coding model family available in two sizes: Devstral 2 (123B) and Devstral Small 2 (24B). Devstral sets the open state-of-the-art for code agents. Devstral 2 ships under a modified MIT license, while Devstral Small 2 uses Apache 2.0. Both are open-source and permissively licensed to accelerate distributed intelligence

rnj-1-instruct - Essential AI, trained from scratch and optimized for code and STEM with capabilities on par with SOTA open-weight models, performs well across a range of programming languages and boasts strong agentic capabilities (e.g., inside agentic frameworks like mini-SWE-agent), while also excelling at tool-calling

`November 2025`

Claude Opus 4.5 - Anthropic, intelligent, efficient, and the best model in the world for coding, agents, and computer use, also meaningfully better at everyday tasks like deep research and working with slides and spreadsheets

GPT-5.1-Codex-Max - OpenAI, an update to our foundational reasoning model, which is trained on agentic tasks across software engineering, math, research, and more, faster, more intelligent, and more token-efficient

Gemini 3 - Google, our most intelligent model that can help bring any idea to life, delivers unparalleled results across every major AI benchmark compared to previous versions, also surpasses 2.5 Pro at coding, mastering both agentic workflows and complex zero-shot tasks

Doubao-Seed-Code - ByteDance Volcengine, achieves breakthroughs in performance, price, and migration cost, and is deeply integrated with the TRAE development environment

GPT-5-Codex-Mini - OpenAI, allows roughly 4x more usage than GPT-5-Codex, at a slight capability tradeoff due to the more compact model

Mercury Coder - Inception Labs, dLLM optimized to accelerate coding workflows, streaming, tool use, and structured output with 128K context window

`October 2025`

Composer - Cursor, 4x faster than similarly intelligent models and built for low-latency agentic coding

SWE-1.5 - Windsurf Cognition, a fast-agent frontier-size model with hundreds of billions of parameters that achieves near-SOTA coding performance, 6x faster than Haiku 4.5 and 13x faster than Sonnet 4.5

CoDA-1.7B - Salesforce AI Research, diffusion-based language model designed for powerful code generation and bidirectional context understanding

KAT-Dev-72B-Exp - Kwaipilot, an open-source 72B-parameter model for software engineering tasks, achieves 74.6% accuracy on SWE-Bench Verified when evaluated strictly with the SWE-agent scaffold

`September 2025`

Code World Model (CWM) - AI at Meta, CWM is an LLM for code generation and reasoning that has been trained to better represent and reason how code and commands affect the state of a program or system

DeepSeek-V3.2-Exp - DeepSeek, experimental sparse-attention upgrade that halves inference cost while retaining strong code-generation and long-context reasoning

GLM-4.6 - Z.ai, features a longer context window, superior coding performance, advanced reasoning, more capable agents, and refined writing versus GLM-4.5

Claude Sonnet 4.5 - Anthropic, the strongest model for building complex agents, the best model at using computers, it shows substantial gains on tests of reasoning and math

Qwen3-Max-Instruct - Alibaba Cloud, the official release further elevates its capabilities — particularly in coding and agent performance

GPT‑5-Codex - OpenAI, a version of GPT‑5 further optimized for agentic coding in Codex and trained with a focus on real-world software engineering work

Kimi K2-Instruct-0905 - Moonshot AI, updated SOTA model with improved agentic and frontend capabilities and increased context length

`August 2025`

GPT-5 - OpenAI, flagship model

GPT-5-mini - OpenAI, fast/cost efficient

GPT-5-nano - OpenAI, faster/cost efficient

Claude Opus 4.1 - Anthropic, a drop-in replacement for Opus 4

Mistral Medium 3.1 - Mistral AI, aka Mistral-Medium-2508 - enterprise-grade model excels in coding tasks

Grok Code Fast 1 - xAI, a speedy and economical reasoning model that excels at agentic coding, efficient code generation, and execution

`July 2025`

Qwen3-Coder - Alibaba Cloud, agentic code model

Qwen3-Coder-Flash - Alibaba Cloud, streamlined non-thinking agentic code model

Kimi K2 - Moonshot AI, 1T-parameter MoE

GLM-4.5 - Z.ai, An open-source LLM designed for intelligent agents

Codestral 25.08 - Mistral AI, code model for high-precision fill-in-the-middle (FIM) completion

Devstral Medium 2507 - Mistral × All Hands AI, high-quality and cost-effective model

Devstral Small 1.1 2507 - Mistral × All Hands AI, agentic model

Grok 4 - xAI, trained with reinforcement learning for native tool use, including code interpreters, making it highly capable for coding and advanced reasoning tasks

`June 2025`

Gemini 2.5 Pro - Google DeepMind, flagship model

Gemini 2.5 Flash - Google DeepMind, fast/cost efficient with thinking capabilities

`May 2025`

Claude Opus 4 - Anthropic, pushes the frontier in coding, agentic search, and creative writing

Claude Sonnet 4 - Anthropic, improves on Claude Sonnet 3.7 across a variety of areas, especially coding

DeepSeek-R1-0528 - DeepSeek, OSS reasoning model

`April 2025`

o3 - OpenAI, preview reasoning model

o4-mini - OpenAI, compact model

GPT-4.1 - OpenAI, flagship model with 1M token context window

Llama 4 Maverick - Meta, code-tuned model

Llama 4 Scout - Meta, open-weight model

Mellum - JetBrains, 4B-parameter OSS model

`March 2025`

DeepSeek-V3-0324 - DeepSeek, improved V3 version

`February 2025`

Gemini 2.0 Flash - Google DeepMind, multimodal for high-volume high-frequency tasks

Claude 3.7 Sonnet - Anthropic, first hybrid reasoning model and state-of-the-art for coding

Grok 3 - xAI, coding capable model

`Menu`

⇒ HOME

⇒ ABOUT

`About The Author`

Joy Larkin is a technologist in Silicon Valley. She likes robots and is excited for Superintelligence. LinkedIn: /in/joylarkin ◦◦◦ Twitter: @joy

`More AI Writing`

The Challenges of Building Agentic AI For Business

The Urgency of Open Source AI