Docker-Powered Spring AI

Spring AI makes it easy to bring large language model (LLM) capabilities into Spring Boot applications. While cloud-based APIs like OpenAI are widely used, many developers want the flexibility and control of running models locally. The Docker Model Runner provides exactly that: a standardized way to launch models in a container and expose them through an OpenAI-compatible API. In this article, we will connect Spring AI to the Docker Model Runner and build a small chat endpoint on top of it.

1. What is Spring AI?

Spring AI is part of the Spring ecosystem, providing abstractions for AI model integration. It supports multiple providers like OpenAI, Azure OpenAI, Hugging Face, and local model servers. It handles prompt management, request formatting, and response parsing.

1.1 What is Docker Model Runner?

The Docker Model Runner is a containerized service for running AI models. It ships with support for popular open-source models such as Mistral, LLaMA, and Phi, and exposes them through a REST interface that mimics the OpenAI API. By using containers, we get consistent environments, easier deployments, and the ability to run models locally or on servers without complex setup.

  • Isolation: Models run in their own container environment.
  • Portability: Works across machines and operating systems.
  • Privacy: Prompts and responses stay within your infrastructure.

2. Prerequisites

Before starting with Spring AI and integrating an OpenAI-compatible model, ensure your development environment meets the following requirements:

  • Java 17 or newer: Spring Boot 3.x requires Java 17 or above. Make sure your JAVA_HOME is set correctly and your IDE or build tool (Maven/Gradle) is using the proper JDK version (see the version checks after this list).
  • Docker Desktop 4.40+ installed: Docker is required if you want to run local AI models or proxy servers in containers. Ensure Docker Desktop is running and properly configured with sufficient CPU and memory resources for model execution.
  • A basic Spring Boot project: You can generate a starter project from Spring Initializr with at least the spring-boot-starter-web dependency. This provides the necessary web framework for creating REST endpoints.
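
A quick way to confirm the toolchain, assuming the standard CLIs are on your PATH, is to check the versions from a terminal:

java -version
docker --version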

3. Running a Model with Docker

Docker Model Runner (DMR) comes bundled by default with Docker Desktop 4.40+ on Apple Silicon (macOS). If it’s inadvertently disabled—or if you prefer using the CLI—you can quickly re-enable it:

docker desktop enable model-runner

By default, DMR is accessible only via the Docker socket or inside containers using the hostname model-runner.docker.internal. If you’d like your host processes—for instance, an OpenAI SDK or local app—to communicate over TCP, enable host-side access with:

docker desktop enable model-runner --tcp 12434

This will expose DMR at http://localhost:12434 on your machine. Once enabled, you can use familiar CLI commands to manage and run models as if you're dealing with regular Docker tasks: pulling, listing, running, and removing models is straightforward and intuitive. For example, a lightweight model that is ideal for machines with limited resources is ai/smollm2:360M-Q4_K_M. The tag structure typically follows:

{model}:{parameters}-{quantization}

Note: After enabling the Docker Model Runner, pull the model using the docker model pull ai/smollm2:360M-Q4_K_M command. You can verify that the model is available locally by running docker model list, as shown in the short session below.
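
For reference, a minimal CLI session might look like the following. The docker model run line is optional and only included as a quick smoke test that sends a one-off prompt from the terminal; the exact output format may vary between Docker Desktop versions.

docker model pull ai/smollm2:360M-Q4_K_M
docker model list
docker model run ai/smollm2:360M-Q4_K_M "Say hello in one sentence."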

4. Code Example

In this section, we will demonstrate how to set up a basic Spring AI project that integrates with OpenAI-compatible models. We will go through adding dependencies, configuring the application, and writing a simple code example to call the model.

4.1 Add Dependencies (pom.xml)

Spring AI provides seamless integration with OpenAI-compatible models via its spring-ai-openai-spring-boot-starter. Additionally, we include the standard Spring Boot web starter for REST endpoints.

<dependency>
	<groupId>org.springframework.ai</groupId>
	<artifactId>spring-ai-openai-spring-boot-starter</artifactId>
	<version>latest_jar_version</version>
</dependency>
<dependency>
	<groupId>org.springframework.boot</groupId>
	<artifactId>spring-boot-starter-web</artifactId>
	<version>latest_jar_version</version>
</dependency>

After adding these dependencies, run mvn clean install to ensure that the libraries are downloaded and available for your project.
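
If you would rather not pin each Spring AI artifact version individually, the project also publishes a BOM (spring-ai-bom) that can be imported in dependencyManagement. The sketch below assumes you replace the placeholder with the Spring AI release you are using; with the BOM imported, the version tags on the individual Spring AI starters can be omitted.

<dependencyManagement>
	<dependencies>
		<dependency>
			<groupId>org.springframework.ai</groupId>
			<artifactId>spring-ai-bom</artifactId>
			<version>latest_bom_version</version>
			<type>pom</type>
			<scope>import</scope>
		</dependency>
	</dependencies>
</dependencyManagement>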

4.2 Add Configuration

To connect Spring AI with your OpenAI-compatible model, you need to provide configuration properties. These include the API endpoint, your API key, and model-specific options such as the chat model to use.

spring:
  ai:
    openai:
      # Base URL of your OpenAI-compatible service
      base-url: http://localhost:12434/v1
      # API key for authentication
      api-key: dummy-key
      chat:
        options:
          # Specify the model to use for chat interactions
          model: ai/smollm2:360M-Q4_K_M

After adding this configuration, Spring Boot will automatically wire the necessary beans to interact with the specified model using the Spring AI framework. Note that Spring AI requires the API key property, but the local Model Runner does not validate it, so any dummy value will do.
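
If your project uses application.properties instead of application.yml, the equivalent configuration (same values, flattened into property keys) looks like this:

spring.ai.openai.base-url=http://localhost:12434/v1
spring.ai.openai.api-key=dummy-key
spring.ai.openai.chat.options.model=ai/smollm2:360M-Q4_K_M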

4.3 Create a Controller Class

// ChatController.java
package jcg.spring.ai.tutorial;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/chat")
public class ChatController {

  private final ChatClient chatClient;

  // ChatClient.Builder is auto-configured by the Spring AI starter
  public ChatController(ChatClient.Builder chatClientBuilder) {
    this.chatClient = chatClientBuilder.build();
  }

  @GetMapping
  public String ask(@RequestParam String prompt) {
    // Forward the prompt to the model and return the generated text
    return chatClient.prompt()
        .user(prompt)
        .call()
        .content();
  }
}

4.3.1 Code Explanation

The ChatController class defines a Spring REST controller that handles chat requests at the /chat endpoint. It receives the auto-configured ChatClient.Builder through its constructor and builds a ChatClient instance for interacting with the OpenAI-compatible model. The ask method maps to a GET request and accepts a prompt parameter from the request. When called, it passes the prompt through the fluent chatClient.prompt().user(prompt).call() chain, retrieves the response from the Docker Model Runner, and returns the generated content as the HTTP response. This setup allows clients to send chat prompts via a simple HTTP GET request and receive AI-generated responses directly.
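
If you prefer to keep model wiring out of controllers, you can instead expose a single ChatClient bean from a small configuration class and inject ChatClient directly. The sketch below is illustrative rather than part of the original example; the class name and the system prompt text are assumptions.

// ChatClientConfig.java (illustrative helper, not part of the original example)
package jcg.spring.ai.tutorial;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ChatClientConfig {

  // Build one ChatClient from the auto-configured builder and share it as a bean,
  // so controllers can inject ChatClient instead of ChatClient.Builder.
  @Bean
  public ChatClient chatClient(ChatClient.Builder builder) {
    return builder
        .defaultSystem("You are a concise assistant running on a local Docker model.")
        .build();
  }
}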

4.4 Code Run and Demo

After setting up your dependencies, configuration, and controller, you are ready to run the Spring Boot application and test the chat integration with the Docker Model Runner. Start the application: In your project root, run the following command to launch the Spring Boot app:

mvn spring-boot:run

This will start the embedded Tomcat server (default on port 8080) and automatically wire the ChatClient to the configured Docker Model Runner. Send a test prompt: You can now test the /chat endpoint using a web browser, curl, or a REST client like Postman. For example, using curl:

curl -G "http://localhost:8080/chat" --data-urlencode "prompt=Hello, how are you?"

The endpoint will forward the prompt to the Docker Model Runner, which processes it with the selected model (ai/smollm2:360M-Q4_K_M) and returns the generated output. The response will be returned as plain text from the model’s output.

Hello! I'm a language model running locally in Docker. How can I assist you today?

5. Conclusion

The Docker Model Runner is a powerful option for experimenting with and deploying LLMs locally. Combined with Spring AI, it enables developers to build intelligent applications that are portable, private, and production-ready. Whether you are prototyping or building enterprise systems, this combination offers a modern, flexible way to harness AI inside the Spring ecosystem.

Yatin Batra

An experienced full-stack engineer well versed in Core Java, Spring/Spring Boot, MVC, Security, AOP, frontend (Angular & React), and cloud technologies (such as AWS, GCP, Jenkins, Docker, K8s).