Releases: ollama/ollama
Releases · ollama/ollama
v0.15.0
What's Changed
- Clean up the manifest and modelpath by @pdevine in #13807
- x/imagegen: remove qwen_image and qwen_image_edit models by @jmorganca in #13827
- cmd: handle Enter key pressed during model loading, render multiline better by @ParthSareen in #13839
- x/imagegen: replace memory estimation with actual weight size by @jmorganca in #13848
- cmd: `ollama config` command to help configure integrations to use Ollama by @ParthSareen in #13712
- x/imagegen: add image edit capabilities by @jmorganca in #13846
Full Changelog: v0.14.3...v0.15.0-rc1
v0.14.3
New models
- Z-Image Turbo: a 6-billion-parameter text-to-image model from Alibaba's Tongyi Lab that generates high-quality photorealistic images.
- Flux.2 Klein: Black Forest Labs' fastest image-generation models to date.
- GLM-4.7-Flash: the strongest model in the 30B class, GLM-4.7-Flash offers a new option for lightweight deployment that balances performance and efficiency.
- LFM2.5-1.2B-Thinking: LFM2.5 is a new family of hybrid models designed for on-device deployment.
What's Changed
- Fixed issue where Ollama's macOS app would interrupt system shutdown
- Fixed `ollama create` and `ollama show` commands for experimental models
- The `/api/generate` API can now be used for image generation
- Fixed minor issues in Nemotron-3-Nano tool parsing
- Fixed issue where removing an image generation model would cause it to first load
- Fixed issue where `ollama rm` would only stop the first model in the list if it were running
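The `/api/generate` change above means image generation models can be driven through the same endpoint as text models. Below is a minimal sketch of that request shape, assuming a default local server on port 11434 and using the experimental `x/z-image-turbo` model mentioned later in these notes as the example; the fields in an image-generation response are not described here, so the code only builds and sends the standard payload.

```python
import json
from urllib import request

# Default local Ollama endpoint; adjust if your server runs elsewhere.
OLLAMA_GENERATE = "http://localhost:11434/api/generate"

def build_generate_request(model: str, prompt: str) -> dict:
    # Standard /api/generate payload; per these release notes the same
    # endpoint now also accepts experimental image generation models.
    return {"model": model, "prompt": prompt, "stream": False}

def generate(payload: dict) -> dict:
    # Sends the request; requires a running Ollama server (not called here).
    req = request.Request(
        OLLAMA_GENERATE,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

payload = build_generate_request("x/z-image-turbo", "a lighthouse at dusk")
print(json.dumps(payload, indent=2))
```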
Full Changelog: v0.14.2...v0.14.3
v0.14.2
New models
- TranslateGemma: A new collection of open translation models built on Gemma 3, helping people communicate across 55 languages.
What's Changed
- Shift + Enter (or Ctrl + j) will now enter a newline in Ollama's CLI
- Improve the `/v1/responses` API to better conform to the OpenResponses specification
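For reference, a minimal Responses-style payload looks like the sketch below. The endpoint path follows the `/v1/responses` name in the note above on a default local server; the model name is only an example, and which optional fields Ollama's implementation accepts is an assumption not covered by these notes.

```python
import json

# Assumed local endpoint for the /v1/responses API mentioned above.
RESPONSES_URL = "http://localhost:11434/v1/responses"

def build_responses_request(model: str, user_input: str) -> dict:
    # Minimal Responses-style payload: a model name plus an "input" string,
    # following the OpenAI Responses shape that OpenResponses tracks.
    return {"model": model, "input": user_input}

payload = build_responses_request("gemma3", "Summarize flash attention in one line.")
print(json.dumps(payload))
```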
New Contributors
- @yuhongsun96 made their first contribution in #13135
- @koaning made their first contribution in #13326
Full Changelog: v0.14.1...v0.14.2
v0.14.1
Image generation models (experimental)
Experimental image generation models are available for macOS and Linux (CUDA) in Ollama:
Available models
`ollama run x/z-image-turbo`

Note: `x` is a username on ollama.com where experimental models are uploaded
More models coming soon:
- Qwen-Image-2512
- Qwen-Image-Edit-2511
- GLM-Image
What's Changed
- fix macOS auto-update signature verification failure
New Contributors
- @joshxfi made their first contribution in #13711
- @maternion made their first contribution in #13709
Full Changelog: v0.14.0...v0.14.1
v0.14.0
What's Changed
- `ollama run --experimental` will now open a new Ollama CLI that includes an agent loop and the `bash` tool
- Anthropic API compatibility: support for the `/v1/messages` API
- A new `REQUIRES` command for the `Modelfile` allows declaring which version of Ollama is required for the model
- For older models, Ollama will avoid an integer underflow on low-VRAM systems during memory estimation
- More accurate VRAM measurements for AMD iGPUs
- Ollama's app will now highlight Swift source code
- An error will now return when embeddings return `NaN` or `-Inf`
- Ollama's Linux install bundle files now use `zstd` compression
- New experimental support for image generation models, powered by MLX
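The Anthropic-compatible `/v1/messages` endpoint above follows the Anthropic Messages API shape: a model name, a `max_tokens` limit, and a list of role/content messages. A minimal sketch, assuming a default local server and an example model name; any further fields Ollama supports on this endpoint are not described in these notes.

```python
import json

# Assumed local endpoint for the Anthropic-compatible API noted above.
ANTHROPIC_COMPAT_URL = "http://localhost:11434/v1/messages"

def build_messages_request(model: str, text: str, max_tokens: int = 256) -> dict:
    # Anthropic Messages API shape: model, max_tokens, and a messages list
    # of {"role", "content"} entries.
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": text}],
    }

payload = build_messages_request("qwen3-vl", "Hello!")
print(json.dumps(payload))
```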
New Contributors
- @Vallabh-1504 made their first contribution in #13550
- @majiayu000 made their first contribution in #13596
- @harrykiselev made their first contribution in #13615
Full Changelog: v0.13.5...v0.14.0-rc2
v0.13.5
New Models
- FunctionGemma: a specialized version of Google's Gemma 3 270M model fine-tuned explicitly for function calling.
What's Changed
- `bert` architecture models now run on Ollama's engine
- Added built-in renderer & tool parsing capabilities for DeepSeek-V3.1
- Fixed issue where nested properties in tools may not have been rendered properly
New Contributors
- @familom made their first contribution in #13220
- @nathannewyen made their first contribution in #13469
Full Changelog: v0.13.4...v0.13.5
v0.13.4
New Models
- Nemotron 3 Nano: a new standard for efficient, open, and intelligent agentic models
- Olmo 3 and Olmo 3.1: a series of open language models designed to enable the science of language models. These models are pre-trained on the Dolma 3 dataset and post-trained on the Dolci datasets.
What's Changed
- Enable Flash Attention automatically for models by default
- Fixed handling of long contexts with Gemma 3 models
- Fixed issue that would occur with Gemma 3 QAT models or other models imported with the Gemma 3 architecture
New Contributors
Full Changelog: v0.13.3...v0.13.4-rc0
v0.13.3
New models
- Devstral-Small-2: a 24B model that excels at using tools to explore codebases, editing multiple files, and powering software engineering agents.
- rnj-1: Rnj-1 is a family of 8B parameter open-weight, dense models trained from scratch by Essential AI, optimized for code and STEM with capabilities on par with SOTA open-weight models.
- nomic-embed-text-v2: nomic-embed-text-v2-moe is a multilingual MoE text embedding model that excels at multilingual retrieval.
What's Changed
- Improved truncation logic when using `/api/embed` and `/v1/embeddings`
- Extend Gemma 3 architecture to support the rnj-1 model
- Fix error that would occur when running qwen2.5vl with image input
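The `/api/embed` endpoint mentioned above takes a model name and an `input` that may be a single string or a list of strings, and returns an `embeddings` list with one vector per input. A sketch assuming a default local server, with `nomic-embed-text-v2` from the model list above as the example:

```python
import json
from urllib import request

# Default local Ollama embeddings endpoint.
EMBED_URL = "http://localhost:11434/api/embed"

def build_embed_request(model: str, inputs: list[str]) -> dict:
    # /api/embed accepts a single string or a list of strings as "input".
    return {"model": model, "input": inputs}

def embed(payload: dict) -> list[list[float]]:
    # Requires a running Ollama server; the response carries an
    # "embeddings" list, one vector per input (not called here).
    req = request.Request(
        EMBED_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["embeddings"]

payload = build_embed_request("nomic-embed-text-v2", ["hello world", "bonjour"])
print(json.dumps(payload))
```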
Full Changelog: v0.13.2...v0.13.3
v0.13.2
New models
- Qwen3-Next: The first installment in the Qwen3-Next series with strong performance in terms of both parameter efficiency and inference speed.
What's Changed
- Flash attention is now enabled by default for vision models such as `mistral-3`, `gemma3`, `qwen3-vl`, and more. This improves memory utilization and performance when providing images as input.
- Fixed GPU detection on multi-GPU CUDA machines
- Fixed issue where `deepseek-v3.1` would always think even when thinking was disabled in Ollama's app
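On the API side, thinking is toggled per request with the `think` field on `/api/chat`. A minimal sketch of disabling it, assuming a default local server; the exact interaction between this field and the app-level setting fixed above is not described in these notes.

```python
import json

# Default local Ollama chat endpoint.
CHAT_URL = "http://localhost:11434/api/chat"

def build_chat_request(model: str, text: str, think: bool) -> dict:
    # /api/chat payload; the "think" field toggles thinking for models
    # that support it (False suppresses the reasoning trace).
    return {
        "model": model,
        "messages": [{"role": "user", "content": text}],
        "think": think,
        "stream": False,
    }

payload = build_chat_request("deepseek-v3.1", "What is 2 + 2?", think=False)
print(json.dumps(payload))
```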
New Contributors
- @chengcheng84 made their first contribution in #13265
- @nathan-hook made their first contribution in #13256
Full Changelog: v0.13.1...v0.13.2
v0.13.1
New models
- Ministral-3: The Ministral 3 family is designed for edge deployment, capable of running on a wide range of hardware.
- Mistral-Large-3: A general-purpose multimodal mixture-of-experts model for production-grade tasks and enterprise workloads.
What's Changed
- `nomic-embed-text` will now use Ollama's engine by default
- Tool calling support for `cogito-v2.1`
- Fixed issues with CUDA VRAM discovery
- Fixed link to docs in Ollama's app
- Fixed issue where models would be evicted on CPU-only systems
- Ollama will now better render errors instead of showing `Unmarshal:` errors
- Fixed issue where older CUDA GPUs would fail to be detected
- Added thinking and tool parsing for cogito-v2.1
New Contributors
- @EntropyYue made their first contribution in #13237
- @kokes made their first contribution in #13231
Full Changelog: v0.13.0...v0.13.1