An Introduction to Using simple-openai in Java

Modern Java applications increasingly use large language models for tasks such as text generation and structured data extraction; however, provider-specific SDKs can make switching vendors difficult. The simple-openai library solves this by offering a unified, type-safe Java client for OpenAI-compatible APIs across multiple providers. In this article, we build small Java apps using simple-openai with Google Gemini and a free API key, following the same patterns that also apply to OpenAI and other compatible platforms.

1. What is simple-openai?

simple-openai is a lightweight, community-driven Java HTTP client built around the OpenAI-compatible API specification. Instead of working with raw endpoints and JSON payloads, developers interact with strongly typed request and response objects that map directly to LLM features such as chat and structured outputs.

Although the API surface follows OpenAI’s standard format, the library is not limited to a single vendor. With the right configuration, the same code can communicate with Google Gemini, Azure OpenAI, Mistral, DeepSeek, Anyscale, and other compatible providers. This design keeps most application code independent of any specific model vendor, making it easier to adapt to pricing changes, regional availability, or new model releases over time.

1.1 Key Capabilities of the Library

The goal of simple-openai is not only to wrap HTTP calls, but also to provide broad coverage of modern LLM features in a consistent Java API. It supports a wide range of functionality typically required in production systems, including text generation, embeddings, multimodal inputs, and structured data extraction.

Beyond basic requests, the library also focuses on developer experience. It supports non-blocking calls using Java’s CompletableFuture, making it suitable for reactive or highly concurrent applications. For scenarios where responses need to be consumed incrementally, such as chat UIs or streaming dashboards, streaming endpoints are also exposed through convenient abstractions.
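The non-blocking style can be illustrated without any network calls. The sketch below uses plain CompletableFuture from the JDK; the fakeCompletion method is a stand-in for a client request such as client.chatCompletions().create(request), which in simple-openai likewise returns a CompletableFuture:

```java
import java.util.concurrent.CompletableFuture;

public class AsyncPatternSketch {

    // Stands in for an asynchronous client call; the real library returns
    // a CompletableFuture of the chat response in the same way.
    static CompletableFuture<String> fakeCompletion(String prompt) {
        return CompletableFuture.supplyAsync(() -> "Answer to: " + prompt);
    }

    public static void main(String[] args) {
        // Chain further work onto the future without blocking the caller...
        CompletableFuture<Integer> length = fakeCompletion("What is a record?")
                .thenApply(String::toUpperCase)
                .thenApply(String::length);

        // ...and only block (join) at the point where the value is needed.
        System.out.println(length.join());
    }
}
```

The same thenApply/thenCompose chaining applies directly to the futures returned by the library, which is what makes it usable from reactive or highly concurrent code.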

Another important feature is the ability to integrate custom business logic into LLM workflows using function calling and structured outputs. Instead of treating the model as a black box that only returns free-form text, applications can guide the model to return predictable JSON objects or trigger application-side functions as part of a conversation.
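Conceptually, function calling means the model replies with a function name plus arguments, and the application dispatches to its own code. The sketch below shows only that application-side dispatch idea in plain Java; the handler names and the single-string argument format are illustrative, not the library's API (in a real integration, simple-openai parses the model's tool-call message for you):

```java
import java.util.Map;
import java.util.function.Function;

public class FunctionDispatchSketch {

    // Application-side functions the model may "call". Names and argument
    // shapes here are hypothetical examples for illustration.
    static final Map<String, Function<String, String>> HANDLERS = Map.of(
            "get_order_status", orderId -> "Order " + orderId + " is shipped",
            "get_stock_level", sku -> "12 units of " + sku + " in stock"
    );

    // Dispatches a model-requested call to the matching application function.
    static String dispatch(String functionName, String argument) {
        Function<String, String> handler = HANDLERS.get(functionName);
        if (handler == null) {
            throw new IllegalArgumentException("Unknown function: " + functionName);
        }
        return handler.apply(argument);
    }

    public static void main(String[] args) {
        // Simulate the model asking for a function call mid-conversation.
        System.out.println(dispatch("get_order_status", "A-1001"));
    }
}
```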

In addition, the library supports multiple OpenAI-compatible providers. While not every provider exposes the same feature set, the shared API model means that switching vendors often requires only configuration changes rather than rewriting integration code.

2. Project Setup and Dependencies

Before interacting with any LLM provider, we need a basic Java project with the required dependencies.

        <dependency>
            <groupId>io.github.sashirestela</groupId>
            <artifactId>simple-openai</artifactId>
            <version>3.22.2</version>
        </dependency>

This configuration adds the simple-openai client. The client handles HTTP communication internally, so no extra networking libraries are required.

3. Configuring the Client for Google Gemini

To keep the examples easy to run, we will connect to Google Gemini using its OpenAI-compatible gateway and a free API key.

Tip
To get a free Google Gemini API key, visit Google AI Studio and sign in with your Google account. Click on “Get API key” in the left sidebar, then click “Create API key” to generate a key in a new or existing Google Cloud project.

Environment Variable

export GEMINI_API_KEY=your_free_gemini_api_key_here

Client Factory

public class OpenAIClientFactory {

    private OpenAIClientFactory() {
        throw new IllegalStateException("Utility class");
    }

    public static SimpleOpenAIGeminiGoogle createGeminiClient() {
        String apiKey = System.getenv("GEMINI_API_KEY");

        return SimpleOpenAIGeminiGoogle.builder()
                .apiKey(apiKey)
                .build();
    }
}

This factory class centralizes the creation of the Gemini client and keeps provider-specific configuration out of the rest of the application. It reads the API key from the GEMINI_API_KEY environment variable to avoid hard-coding sensitive credentials in source code.

The SimpleOpenAIGeminiGoogle.builder() method applies the builder pattern, allowing the client to be configured in a clear and extensible way before it is created. By exposing a single createGeminiClient method, the application can obtain a fully configured client wherever it is needed, while making it easy to switch providers or adjust settings later without changing the calling code.

4. Sending a Chat Prompt

Conversational text generation is the most common entry point into LLM integration. In this example, we send a short prompt to the model and display its response in the console.

public class ChatDemo {

    public static void main(String[] args) throws ExecutionException, InterruptedException {

        SimpleOpenAIGeminiGoogle client = OpenAIClientFactory.createGeminiClient();

        ChatRequest request = ChatRequest.builder()
                .model("gemini-2.5-flash")
                .messages(List.of(
                        ChatMessage.SystemMessage.of("You are a helpful Java tutor."),
                        ChatMessage.UserMessage.of("Explain what a record is in Java.")
                ))
                .temperature(0.7)
                .build();

        var response = client.chatCompletions().create(request);
        var answer = response.join();

        System.out.println("Model response:");
        System.out.println(answer.firstContent());
    }
}

This class defines a simple console application that demonstrates how to send a chat prompt to a Gemini model using the simple-openai client. The first step inside main is obtaining a configured client instance from OpenAIClientFactory.createGeminiClient(). By delegating client creation to a factory, the application keeps authentication details and provider-specific configuration in one place. This also makes it easier to change providers or update settings without modifying the rest of the code that uses the client.

Next, a ChatRequest is built using the builder pattern. The request specifies the model name, in this case gemini-2.5-flash, and provides a list of messages that form the conversation context. The temperature parameter controls how creative or deterministic the response should be; here it is set to 0.7 for moderate randomness. The build() call finalizes the request object.

The request is then sent using client.chatCompletions().create(request), which returns an asynchronous result. Calling join() waits for the model to finish generating a response and retrieves the completed value.

Finally, the code outputs the first piece of generated content from the model using answer.firstContent(). This extracts the main text produced by the assistant from the structured response object, making it easy to display within the application.

Sample Output

Model response:
Okay, let's break down what a Java `record` is.

## What is a Java Record?

A `record` in Java is a special kind of **class** designed specifically for **modeling immutable data**. Its primary purpose is to dramatically reduce the boilerplate code traditionally required for classes that are simply data carriers.

Think of it as a concise way to declare **data-only classes** where the state is transparent and immutable.

.....

The Gemini model generates its response according to the prompt in the chat request. Here, the output explains what a Java record is, focusing on its role as an immutable data carrier.

5. Streaming Chat Responses

While a standard chat completion waits for the full response before returning, streaming allows the application to receive tokens as the model generates them. This is useful for building responsive user interfaces or any experience where partial output should be displayed immediately instead of waiting for the entire answer.

With simple-openai, streaming follows the same request structure as normal chat completion, but the response is consumed as a stream of events rather than a single completed result.

public class ChatStreamingDemo {

    public static void main(String[] args) {

        var client = OpenAIClientFactory.createGeminiClient();

        var request = ChatRequest.builder()
                .model("gemini-2.5-flash")
                .messages(List.of(
                        ChatMessage.SystemMessage.of("You are a helpful Java tutor."),
                        ChatMessage.UserMessage.of("Explain what a record is in Java.")
                ))
                .temperature(0.7)
                .build();

        var response = client.chatCompletions().createStream(request);
        var answer = response.join();

        answer.filter(chatResponse -> !chatResponse.getChoices().isEmpty())
                .map(Chat::firstContent)
                .forEach(System.out::print);

        System.out.println();
    }
}

The call to client.chatCompletions().createStream(request) sends the request using a streaming-capable API. Instead of returning a single completed response, it returns an asynchronous stream of partial chat responses. Calling join() waits until the stream is available and returns an object that can be processed using standard stream-style operations.

The next lines process each chunk of the streamed response. The filter step ignores any events that do not contain generated choices. The map(Chat::firstContent) step extracts the text content from each partial response, and forEach(System.out::print) immediately prints each piece to the console. This causes the model’s answer to appear gradually, giving the effect of real-time text generation.
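The filter/map/forEach pipeline can be exercised without a live model. The sketch below simulates the stream of partial responses with a plain Stream of strings, where an empty string stands in for an event with no generated choices (such as the final usage event); the render helper is an assumption for illustration, not part of the library:

```java
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StreamChunksSketch {

    // Mirrors the streaming pipeline: drop empty events, concatenate content.
    static String render(Stream<String> chunks) {
        return chunks.filter(chunk -> !chunk.isEmpty())
                     .collect(Collectors.joining());
    }

    public static void main(String[] args) {
        // Stands in for the partial responses delivered by createStream(...).join().
        System.out.println(render(Stream.of("A record ", "", "is a ", "data carrier.")));
    }
}
```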

6. Enforcing Structured Output

Free-form text is not always suitable for automation. When applications need predictable results, structured outputs allow the model to return JSON that matches a predefined schema.

Data Record

public record BookRecommendation(String title, String author, int publicationYear, String reason) { }
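
The library derives a JSON schema from the record's components. Conceptually, the generated schema corresponds to something like the following (illustrative only; the exact keywords and ordering the library emits may differ):

```json
{
  "type": "object",
  "properties": {
    "title": { "type": "string" },
    "author": { "type": "string" },
    "publicationYear": { "type": "integer" },
    "reason": { "type": "string" }
  },
  "required": ["title", "author", "publicationYear", "reason"]
}
```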

Structured Output Example

public class StructuredDemo {

    public static void main(String[] args) {

        var client = OpenAIClientFactory.createGeminiClient();

        var request = ChatRequest.builder()
                .model("gemini-2.5-flash")
                .message(ChatMessage.SystemMessage.of("You are a software engineering expert. Recommend one classic software engineering book."))
                .message(ChatMessage.UserMessage.of("Recommend a classic software engineering book."))
                .responseFormat(ResponseFormat.jsonSchema(JsonSchema.builder()
                        .name("BookRecommendation")
                        .schemaClass(BookRecommendation.class)
                        .build()))
                .build();

        var response = client.chatCompletions().createStream(request).join();

        response.filter(chatResponse -> !chatResponse.getChoices().isEmpty())
                .map(Chat::firstContent)
                .forEach(System.out::print);

        System.out.println();
    }
}

This example demonstrates how to request structured output from the model using a JSON schema, ensuring that the response follows a predefined format instead of free-form text. The key part of the request is the responseFormat configuration. By using ResponseFormat.jsonSchema(...) and supplying a schema generated from the BookRecommendation class, the application instructs the model to return JSON that matches the fields of that class. This enables reliable, machine-readable responses that can be safely mapped to Java objects instead of relying on fragile text parsing.

Sample Output

When you run the StructuredDemo application, the console displays a JSON object that matches the structure of the BookRecommendation schema. The final output will look similar to the following:

{"author": "Frederick P. Brooks Jr.", 
 "publicationYear": 1975, 
 "reason": "This book articulates timeless principles about project management, team dynamics, and the inherent difficulties of software development. It introduces concepts like 'Brooks's Law' (adding manpower to a late software project makes it later) and the importance of conceptual integrity. Its insights remain highly relevant for anyone involved in large-scale software projects, making it a foundational text.", 
 "title": "The Mythical Man-Month"}

This output is not free-form text. Instead, it is a structured JSON document that follows the exact fields defined in the BookRecommendation class: title, author, publicationYear, and reason. Because the request included a JSON schema through ResponseFormat.jsonSchema(...), the model was guided to format its response strictly according to that structure.

7. Keeping Conversation State with Interactive Input

For interactive applications, conversation state becomes important because each new user input should build on everything that has already been said. Instead of sending isolated prompts, the application keeps a running history of messages and sends the full conversation to the model on every request. This allows the assistant to respond in a way that is consistent with earlier questions and answers, creating a natural multi-turn chat experience.

Below is an example that adapts the earlier ChatDemo into an interactive program where users can type multiple questions, receive replies, and continue the conversation until they type exit.

public class InteractiveChatDemo {

    public static void main(String[] args) {

        var client = OpenAIClientFactory.createGeminiClient();

        List<ChatMessage> conversation = new ArrayList<>();
        conversation.add(ChatMessage.SystemMessage.of(
                "You are a helpful Java tutor. Answer clearly and briefly."
        ));

        try (Scanner scanner = new Scanner(System.in)) {

            System.out.println("Start chatting with the assistant (type 'exit' to quit).");

            while (true) {
                System.out.print("You: ");
                String input = scanner.nextLine();

                if (input == null || input.isBlank()) {
                    continue;
                }
                if ("exit".equalsIgnoreCase(input.trim())) {
                    break;
                }

                conversation.add(ChatMessage.UserMessage.of(input));

                ChatRequest.ChatRequestBuilder builder
                        = ChatRequest.builder().model("gemini-2.5-flash");

                for (ChatMessage message : conversation) {
                    builder.message(message);
                }

                ChatRequest chatRequest = builder.build();

                var future = client.chatCompletions().create(chatRequest);
                var chat = future.join();

                String reply = chat.firstContent().toString();

                System.out.println("Assistant: " + reply);

                conversation.add(ChatMessage.AssistantMessage.of(reply));
            }
        }
    }
}

This example starts by creating a Gemini client using the same factory method used in earlier examples. A mutable list conversation is used to store the full conversation. The first message added is a system message that defines the assistant’s role as a helpful Java tutor. This message is included in every request, ensuring the model continues to follow the same behaviour throughout the session.

The program then enters a loop that reads user input from the console using a Scanner. Each time the user types a message, the input is validated to ensure it is not empty and checked for the exit command. If the user chooses to continue, the input is added to the conversation history as a user message.

To build the request, a ChatRequestBuilder is created and every message in the conversation list is added to it in order. This is the key step that preserves conversation state, because the model receives not only the latest question but also all previous exchanges, allowing it to generate context-aware responses. The request is sent using client.chatCompletions().create(chatRequest), which returns an asynchronous result.

Finally, the assistant’s reply is added back into the conversation list as an assistant message. This ensures that the next request includes both what the user said and what the assistant answered, enabling true multi-turn conversations. This same pattern can be extended to support streaming responses, logging, or persistent chat sessions stored in databases for longer-lived interactions.
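One practical refinement not shown in the demo: since the full history is resent on every request, a long session grows without bound and can eventually exceed the model's context window. A common strategy is to cap the history while always keeping the system message. The sketch below uses a minimal stand-in record for ChatMessage; trimming by message count is a simplification, as real applications often trim by token count instead:

```java
import java.util.ArrayList;
import java.util.List;

public class HistoryTrimSketch {

    // Minimal stand-in for a chat message: role plus content.
    record Message(String role, String content) { }

    // Keep the system message plus at most maxRecent of the most recent
    // user/assistant messages. Assumes the system message is first.
    static List<Message> trim(List<Message> history, int maxRecent) {
        List<Message> trimmed = new ArrayList<>();
        trimmed.add(history.get(0)); // system message always stays
        int from = Math.max(1, history.size() - maxRecent);
        trimmed.addAll(history.subList(from, history.size()));
        return trimmed;
    }

    public static void main(String[] args) {
        List<Message> history = new ArrayList<>();
        history.add(new Message("system", "You are a helpful Java tutor."));
        for (int i = 1; i <= 5; i++) {
            history.add(new Message("user", "question " + i));
            history.add(new Message("assistant", "answer " + i));
        }

        // Only the system message and the 4 most recent messages survive.
        System.out.println(trim(history, 4).size());
    }
}
```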

Example Output

Start chatting with the assistant (type 'exit' to quit).
You: What is a record in Java?
Assistant: A Java record is a special type of **class** designed to act as a **transparent carrier for immutable data**.

It automatically provides:
*   A compact syntax for declaring data-only classes.
*   An implicit, canonical constructor.
*   Accessor methods (getters) for its components.
*   Implementations of `equals()`, `hashCode()`, and `toString()` based on its components.

Records significantly reduce boilerplate code for data objects that were traditionally written as plain classes with fields, constructors, and overridden methods. They were introduced in Java 16.
You: When should I use a record instead of a class?
Assistant: You should use a record instead of a class primarily when:

1.  **You need a simple immutable data carrier.** Records are perfect for Data Transfer Objects (DTOs), value objects, or tuples where the main purpose is to hold a fixed set of data.
2.  **The data should not change after creation.** Records are inherently immutable. If you need mutable objects, use a regular class.
3.  **You want to eliminate boilerplate code.** Records automatically provide the constructor, accessors, `equals()`, `hashCode()`, and `toString()` methods, drastically reducing the code you have to write for data-centric classes.
4.  **You are representing a "nominal tuple" or "product type."** Meaning, an object whose identity is defined solely by the values of its components.

**In essence, use a record when your class is "just data" and you want that data to be immutable and easily comparable.** Use a regular class when you need mutable state, complex behavior, inheritance, or more control over the object's lifecycle and internal representation.
You: exit

8. Conclusion

In this article, we explored how to use the simple-openai library to integrate large language models into Java applications with minimal overhead. We walked through basic chat completions, real-time streaming responses, structured outputs using JSON schemas, and interactive conversations that preserve context across multiple user inputs.

By keeping provider configuration separate from application logic and relying on consistent request and response models, simple-openai makes it easier to build maintainable, vendor-independent AI features. These patterns provide a solid foundation for developing everything from simple command-line tools to more advanced, conversational and data-driven systems powered by modern LLMs.

9. Download the Source Code

This article explored the use of the simple-openai library in Java applications.

Download
You can download the full source code of this example here: Java simple openai

Omozegie Aziegbe

Omos Aziegbe is a technical writer and web/application developer with a BSc in Computer Science and Software Engineering from the University of Bedfordshire. Specializing in Java enterprise applications with the Jakarta EE framework, Omos also works with HTML5, CSS, and JavaScript for web development. As a freelance web developer, Omos combines technical expertise with research and writing on topics such as software engineering, programming, web application development, computer science, and technology.