Releases: ollama/ollama
Releases · ollama/ollama
v0.15.0
What's Changed
- Clean up the manifest and modelpath by @pdevine in #13807
- x/imagegen: remove qwen_image and qwen_image_edit models by @jmorganca in #13827
- cmd: handle Enter key pressed during model loading, render multiline better by @ParthSareen in #13839
- x/imagegen: replace memory estimation with actual weight size by @jmorganca in #13848
- cmd: `ollama config` command to help configure integrations to use Ollama by @ParthSareen in #13712
- x/imagegen: add image edit capabilities by @jmorganca in #13846
Full Changelog: v0.14.3...v0.15.0-rc1
v0.14.3
New models
- Z-Image Turbo: a 6-billion-parameter text-to-image model from Alibaba's Tongyi Lab that generates high-quality photorealistic images.
- Flux.2 Klein: Black Forest Labs' fastest image-generation models to date.
- GLM-4.7-Flash: the strongest model in the 30B class, GLM-4.7-Flash offers a new option for lightweight deployment that balances performance and efficiency.
- LFM2.5-1.2B-Thinking: LFM2.5 is a new family of hybrid models designed for on-device deployment.
What's Changed
- Fixed issue where Ollama's macOS app would interrupt system shutdown
- Fixed `ollama create` and `ollama show` commands for experimental models
- The `/api/generate` API can now be used for image generation
- Fixed minor issues in Nemotron-3-Nano tool parsing
- Fixed issue where removing an image generation model would cause it to first load
- Fixed issue where `ollama rm` would only stop the first model in the list if it were running
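The `/api/generate` change above means image generation models can be driven through the same endpoint as text models. Below is a minimal sketch of that request shape, assuming a default local server on port 11434 and using the experimental `x/z-image-turbo` model mentioned later in these notes as the example; the fields in an image-generation response are not described here, so the code only builds and sends the standard payload.

```python
import json
from urllib import request

# Default local Ollama endpoint; adjust if your server runs elsewhere.
OLLAMA_GENERATE = "http://localhost:11434/api/generate"

def build_generate_request(model: str, prompt: str) -> dict:
    # Standard /api/generate payload; per these release notes the same
    # endpoint now also accepts experimental image generation models.
    return {"model": model, "prompt": prompt, "stream": False}

def generate(payload: dict) -> dict:
    # Sends the request; requires a running Ollama server (not called here).
    req = request.Request(
        OLLAMA_GENERATE,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

payload = build_generate_request("x/z-image-turbo", "a lighthouse at dusk")
print(json.dumps(payload, indent=2))
```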
Full Changelog: v0.14.2...v0.14.3
v0.14.2
New models
- TranslateGemma: A new collection of open translation models built on Gemma 3, helping people communicate across 55 languages.
What's Changed
- Shift + Enter (or Ctrl + j) will now enter a newline in Ollama's CLI
- Improve the `/v1/responses` API to better conform to the OpenResponses specification
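For reference, a minimal Responses-style payload looks like the sketch below. The endpoint path follows the `/v1/responses` name in the note above on a default local server; the model name is only an example, and which optional fields Ollama's implementation accepts is an assumption not covered by these notes.

```python
import json

# Assumed local endpoint for the /v1/responses API mentioned above.
RESPONSES_URL = "http://localhost:11434/v1/responses"

def build_responses_request(model: str, user_input: str) -> dict:
    # Minimal Responses-style payload: a model name plus an "input" string,
    # following the OpenAI Responses shape that OpenResponses tracks.
    return {"model": model, "input": user_input}

payload = build_responses_request("gemma3", "Summarize flash attention in one line.")
print(json.dumps(payload))
```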
New Contributors
- @yuhongsun96 made their first contribution in #13135
- @koaning made their first contribution in #13326
Full Changelog: v0.14.1...v0.14.2
v0.14.1
Image generation models (experimental)
Experimental image generation models are available for macOS and Linux (CUDA) in Ollama:
Available models
`ollama run x/z-image-turbo`

Note: `x` is a username on ollama.com where experimental models are uploaded
More models coming soon:
- Qwen-Image-2512
- Qwen-Image-Edit-2511
- GLM-Image
What's Changed
- fix macOS auto-update signature verification failure
New Contributors
- @joshxfi made their first contribution in #13711
- @maternion made their first contribution in #13709
Full Changelog: v0.14.0...v0.14.1
v0.14.0
What's Changed
- `ollama run --experimental` will now open a new Ollama CLI that includes an agent loop and the `bash` tool
- Anthropic API compatibility: support for the `/v1/messages` API
- A new `REQUIRES` command for the `Modelfile` allows declaring which version of Ollama is required for the model
- For older models, Ollama will avoid an integer underflow on low-VRAM systems during memory estimation
- More accurate VRAM measurements for AMD iGPUs
- Ollama's app will now highlight Swift source code
- An error will now return when embeddings return `NaN` or `-Inf`
- Ollama's Linux install bundle files now use `zstd` compression
- New experimental support for image generation models, powered by MLX
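The Anthropic-compatible `/v1/messages` endpoint above follows the Anthropic Messages API shape: a model name, a `max_tokens` limit, and a list of role/content messages. A minimal sketch, assuming a default local server and an example model name; any further fields Ollama supports on this endpoint are not described in these notes.

```python
import json

# Assumed local endpoint for the Anthropic-compatible API noted above.
ANTHROPIC_COMPAT_URL = "http://localhost:11434/v1/messages"

def build_messages_request(model: str, text: str, max_tokens: int = 256) -> dict:
    # Anthropic Messages API shape: model, max_tokens, and a messages list
    # of {"role", "content"} entries.
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": text}],
    }

payload = build_messages_request("qwen3-vl", "Hello!")
print(json.dumps(payload))
```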
New Contributors
- @Vallabh-1504 made their first contribution in #13550
- @majiayu000 made their first contribution in #13596
- @harrykiselev made their first contribution in #13615
Full Changelog: v0.13.5...v0.14.0-rc2
v0.13.5
New Models
- FunctionGemma: a specialized version of Google's Gemma 3 270M model fine-tuned explicitly for function calling.
What's Changed
- `bert` architecture models now run on Ollama's engine
- Added built-in renderer & tool parsing capabilities for DeepSeek-V3.1
- Fixed issue where nested properties in tools may not have been rendered properly
New Contributors
- @familom made their first contribution in #13220
- @nathannewyen made their first contribution in #13469
Full Changelog: v0.13.4...v0.13.5
v0.13.4
New Models
- Nemotron 3 Nano: a new standard for efficient, open, and intelligent agentic models
- Olmo 3 and Olmo 3.1: a series of open language models designed to enable the science of language models. These models are pre-trained on the Dolma 3 dataset and post-trained on the Dolci datasets.
What's Changed
- Enable Flash Attention automatically for models by default
- Fixed handling of long contexts with Gemma 3 models
- Fixed issue that would occur with Gemma 3 QAT models or other models imported with the Gemma 3 architecture
New Contributors
Full Changelog: v0.13.3...v0.13.4-rc0
v0.13.3
New models
- Devstral-Small-2: a 24B model that excels at using tools to explore codebases, editing multiple files, and powering software engineering agents.
- rnj-1: Rnj-1 is a family of 8B parameter open-weight, dense models trained from scratch by Essential AI, optimized for code and STEM with capabilities on par with SOTA open-weight models.
- nomic-embed-text-v2: nomic-embed-text-v2-moe is a multilingual MoE text embedding model that excels at multilingual retrieval.
What's Changed
- Improved truncation logic when using `/api/embed` and `/v1/embeddings`
- Extend Gemma 3 architecture to support the rnj-1 model
- Fix error that would occur when running qwen2.5vl with image input
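The `/api/embed` endpoint mentioned above takes a model name and an `input` that may be a single string or a list of strings, and returns an `embeddings` list with one vector per input. A sketch assuming a default local server, with `nomic-embed-text-v2` from the model list above as the example:

```python
import json
from urllib import request

# Default local Ollama embeddings endpoint.
EMBED_URL = "http://localhost:11434/api/embed"

def build_embed_request(model: str, inputs: list[str]) -> dict:
    # /api/embed accepts a single string or a list of strings as "input".
    return {"model": model, "input": inputs}

def embed(payload: dict) -> list[list[float]]:
    # Requires a running Ollama server; the response carries an
    # "embeddings" list, one vector per input (not called here).
    req = request.Request(
        EMBED_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["embeddings"]

payload = build_embed_request("nomic-embed-text-v2", ["hello world", "bonjour"])
print(json.dumps(payload))
```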
Full Changelog: v0.13.2...v0.13.3
v0.13.2
New models
- Qwen3-Next: The first installment in the Qwen3-Next series with strong performance in terms of both parameter efficiency and inference speed.
What's Changed
- Flash attention is now enabled by default for vision models such as `mistral-3`, `gemma3`, `qwen3-vl`, and more. This improves memory utilization and performance when providing images as input.
- Fixed GPU detection on multi-GPU CUDA machines
- Fixed issue where `deepseek-v3.1` would always think even when thinking was disabled in Ollama's app
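On the API side, thinking is toggled per request with the `think` field on `/api/chat`. A minimal sketch of disabling it, assuming a default local server; the exact interaction between this field and the app-level setting fixed above is not described in these notes.

```python
import json

# Default local Ollama chat endpoint.
CHAT_URL = "http://localhost:11434/api/chat"

def build_chat_request(model: str, text: str, think: bool) -> dict:
    # /api/chat payload; the "think" field toggles thinking for models
    # that support it (False suppresses the reasoning trace).
    return {
        "model": model,
        "messages": [{"role": "user", "content": text}],
        "think": think,
        "stream": False,
    }

payload = build_chat_request("deepseek-v3.1", "What is 2 + 2?", think=False)
print(json.dumps(payload))
```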
New Contributors
- @chengcheng84 made their first contribution in #13265
- @nathan-hook made their first contribution in #13256
Full Changelog: v0.13.1...v0.13.2
v0.13.1
New models
- Ministral-3: The Ministral 3 family is designed for edge deployment, capable of running on a wide range of hardware.
- Mistral-Large-3: A general-purpose multimodal mixture-of-experts model for production-grade tasks and enterprise workloads.
What's Changed
- `nomic-embed-text` will now use Ollama's engine by default
- Tool calling support for `cogito-v2.1`
- Fixed issues with CUDA VRAM discovery
- Fixed link to docs in Ollama's app
- Fixed issue where models would be evicted on CPU-only systems
- Ollama will now better render errors instead of showing `Unmarshal:` errors
- Fixed issue where older CUDA GPUs would fail to be detected
- Added thinking and tool parsing for cogito-v2.1
New Contributors
- @EntropyYue made their first contribution in #13237
- @kokes made their first contribution in #13231
Full Changelog: v0.13.0...v0.13.1