Spring AI Transcribing Audio Files Example
Speech-to-text technology has become essential for building transcription services, voice assistants, and accessibility tools. In this article, we will look at how to transcribe audio files with Spring AI and OpenAI's Whisper model.
1. What is OpenAI?
OpenAI provides cutting-edge AI models, including Whisper for speech recognition. Whisper is an automatic speech recognition (ASR) system trained on a large dataset of multilingual and multitask supervised data. It can transcribe audio files into text with impressive accuracy.
2. Setting Up the Spring Boot Project
To build an application that transcribes audio files using OpenAI with Spring AI, the first step is to set up a Spring Boot project with the correct dependencies and configurations. This ensures that your application can communicate securely and efficiently with the OpenAI API while leveraging the Spring ecosystem. In this section, we’ll walk through initializing the project, adding dependencies, and setting up your configuration files.
2.1 Project Dependencies
Start by creating a new Spring Boot project using Spring Initializr. This tool allows you to easily generate a Maven or Gradle project with your selected modules. Include the following dependencies in your project:
- Spring Web – To build RESTful APIs and expose endpoints for audio upload and transcription.
- Spring Boot DevTools – Provides hot reload and runtime enhancements during development.
- Lombok – Reduces boilerplate code by providing annotations such as @Data, @Builder, and more.
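In a Maven build, these Initializr selections correspond roughly to the following entries in pom.xml (versions are managed by the Spring Boot parent, so none need to be specified here):

```xml
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-devtools</artifactId>
        <scope>runtime</scope>
        <optional>true</optional>
    </dependency>
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <optional>true</optional>
    </dependency>
</dependencies>
```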
Once the project is generated and opened in your IDE (such as IntelliJ or VS Code), add the following dependency to your pom.xml file to enable Spring AI and OpenAI integration:
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
    <version>0.8.0-SNAPSHOT</version>
</dependency>
This starter provides pre-built configurations and components that make it easy to communicate with OpenAI’s API using Spring conventions. Note that snapshot and milestone versions of Spring AI are not published to Maven Central, so you will also need to add the Spring snapshot/milestone repositories to your build file.
2.2 Configure application.yml
Next, configure your application to securely access the OpenAI API. Open the src/main/resources/application.yml file and add the following configuration:
spring:
  ai:
    openai:
      api-key: YOUR_OPENAI_API_KEY
      base-url: https://api.openai.com/v1
Replace YOUR_OPENAI_API_KEY with your actual OpenAI key. You can obtain this key from your OpenAI developer dashboard. The base-url is the endpoint for OpenAI’s REST API. This URL is used to make all transcription and model inference requests.
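Hard-coding the key is fine for a quick experiment, but a safer pattern is to read it from an environment variable so it never lands in version control. The variable name OPENAI_API_KEY below is a common convention, not something Spring AI mandates:

```yaml
spring:
  ai:
    openai:
      # Resolved from the OPENAI_API_KEY environment variable at startup
      api-key: ${OPENAI_API_KEY}
      base-url: https://api.openai.com/v1
```

With this in place, export the variable before starting the application and the property is picked up automatically.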
3. Building the Audio Transcriber
This section focuses on creating the core functionality that handles the actual transcription of audio files using the OpenAI Whisper model. We’ll implement a REST controller to accept audio input, configure the necessary beans for OpenAI interaction, and enable multipart handling in the application.
3.1 Create Controller
The controller is responsible for exposing an endpoint that accepts audio files from the client. It uses the low-level OpenAiAudioApi provided by the Spring AI framework to send the file to OpenAI’s transcription API and return the result to the user. The @PostMapping annotation defines an endpoint /api/audio/transcribe that accepts audio data as multipart form data. The controller builds a TranscriptionRequest, specifying the model as whisper-1 and the response format as plain text. (The exact builder and method names of OpenAiAudioApi have changed between Spring AI releases, so check the API of the version you are using.)
package com.example.transcriber.controller;

import lombok.RequiredArgsConstructor;
import org.springframework.ai.openai.api.OpenAiAudioApi;
import org.springframework.ai.openai.api.OpenAiAudioApi.TranscriptionRequest;
import org.springframework.http.MediaType;
import org.springframework.web.bind.annotation.*;
import org.springframework.web.multipart.MultipartFile;

import java.io.IOException;

@RestController
@RequiredArgsConstructor
@RequestMapping("/api/audio")
public class AudioTranscriptionController {

    private final OpenAiAudioApi openAiAudioApi;

    @PostMapping(value = "/transcribe", consumes = MediaType.MULTIPART_FORM_DATA_VALUE)
    public String transcribeAudio(@RequestParam("file") MultipartFile file) throws IOException {
        TranscriptionRequest request = TranscriptionRequest.builder()
                .file(file.getBytes())
                .model("whisper-1")
                .responseFormat(OpenAiAudioApi.TranscriptResponseFormat.TEXT)
                .build();
        // With the TEXT response format, the response body is the plain transcript.
        return openAiAudioApi.createTranscription(request, String.class).getBody();
    }
}
This code ensures that whenever a user uploads an audio file via POST request, it gets sent to OpenAI for transcription and the resulting text is returned in the response.
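Before forwarding an upload to OpenAI, it can be worth rejecting obviously unsupported files early and returning a clear error to the client. A minimal sketch of such a guard follows; the helper class and the extension list are illustrative assumptions, not part of Spring AI (Whisper's actual format support is documented by OpenAI and may differ):

```java
import java.util.Locale;
import java.util.Set;

public class AudioFileValidator {

    // Extensions commonly associated with Whisper-compatible audio; adjust as needed.
    private static final Set<String> SUPPORTED_EXTENSIONS =
            Set.of("mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm");

    // Returns true if the filename has an extension in the supported set.
    public static boolean isSupportedAudioFile(String filename) {
        if (filename == null) {
            return false;
        }
        int dot = filename.lastIndexOf('.');
        if (dot < 0 || dot == filename.length() - 1) {
            return false;
        }
        String extension = filename.substring(dot + 1).toLowerCase(Locale.ROOT);
        return SUPPORTED_EXTENSIONS.contains(extension);
    }
}
```

In the controller, a failed check could translate into a 400 response before any call to OpenAI is made.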
3.2 Enable OpenAiAudioApi as a Bean
To allow Spring to inject and manage the OpenAiAudioApi, we need to configure it as a bean. This configuration class creates the bean from the OpenAI settings defined in application.yml, most importantly the API key.
package com.example.transcriber.config;

import org.springframework.ai.openai.api.OpenAiAudioApi;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class OpenAIConfig {

    @Bean
    public OpenAiAudioApi openAiAudioApi(@Value("${spring.ai.openai.api-key}") String apiKey) {
        return new OpenAiAudioApi(apiKey);
    }
}
Spring registers this bean in the application context and injects it wherever it is needed, such as in the controller class defined earlier.
3.3 Enable Multipart Support
Spring Boot supports file uploads out of the box, but you can customize the maximum file size allowed for upload by adding the following configuration to your `application.yml`. This step is especially useful if users will be uploading large audio files.
spring:
  servlet:
    multipart:
      max-file-size: 10MB
      max-request-size: 10MB
This configuration increases the upload limit to 10MB, both for individual files and total request size. You can adjust these values based on your expected file size requirements.
4. Run and Test the Audio Transcriber
4.1 Start the Spring Boot App
Make sure all configurations are correctly set up in your application.yml, and the OpenAI API key is available either directly or via environment variable. Then, navigate to the root directory of your Spring Boot project and run the following command to start the application:
./mvnw spring-boot:run
This will compile the project, start the embedded Tomcat server, and deploy your application to http://localhost:8080 by default. Watch the console logs to confirm that the application has started successfully and no errors have occurred during the bean initialization or context setup.
4.2 Test with Postman or curl
Once the application is running, you can test the transcription API using either Postman or curl. The endpoint /api/audio/transcribe accepts a POST request with multipart/form-data content that includes the audio file. Here is how to test using curl from the terminal (there is no need to set the Content-Type header by hand: with -F, curl generates the multipart header, including the boundary, automatically):
curl -X POST http://localhost:8080/api/audio/transcribe \
  -F "file=@/path/to/audio.mp3"
4.3 Sample Output
If everything is working correctly, OpenAI will return the transcribed text in plain text format. A successful output will look like the following:
Hello, this is a sample audio file being transcribed by OpenAI Whisper model using Spring AI.
This confirms that the audio file was received, processed by OpenAI, and the response was successfully returned by your Spring Boot application. You can now extend the application to save, process, or analyze this transcription further based on your business needs.
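As a small example of the “save” direction, the returned text can be persisted with java.nio in a couple of lines. The TranscriptStore class below is a hypothetical helper, not part of the tutorial’s code:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class TranscriptStore {

    // Writes the transcript to the given path, creating or overwriting the file,
    // and returns the path it was written to.
    public static Path save(String transcript, Path target) throws IOException {
        return Files.writeString(target, transcript);
    }
}
```

The controller could call such a helper after receiving the response from OpenAI, before returning the text to the client.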
5. Conclusion
By combining the power of Spring Boot and OpenAI’s Whisper model, you can quickly build a robust speech-to-text transcription API. Spring AI simplifies the integration with OpenAI APIs and allows developers to focus more on business logic than plumbing code.