Improve SpeechSession lifecycle and observability by rudrankriyam · Pull Request #19 · rryam/AuralKit

rudrankriyam · 2025-10-25T00:31:57Z

Summary

Ensure AVAudioSession is activated/deactivated explicitly so other apps regain audio focus as soon as transcription ends.
Add AVAudioSession interruption observers that pause, optionally resume, or tear down the pipeline to avoid stuck sessions during calls or Siri.
Convert status and audio-input streams into broadcast-style observables (plus smaller extensions) so multiple UI and logic consumers receive every update consistently.
Restart the demo app's VAD indicator task whenever voice activation settings change to keep its speech-detection badge accurate.
Documentation and structural tweaks (Agents guide, SpeechSession file splits, TODO tracker) that keep the codebase maintainable while capturing outstanding work.
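
The explicit activate/deactivate pairing in the first bullet can be illustrated with a minimal sketch. This is not the PR's code: `AudioSessionActivating`, `withActiveAudioSession`, and `RecordingSession` are hypothetical names, and the protocol stands in for `AVAudioSession` so the snippet runs without AVFoundation. On iOS, a real conformance would presumably forward to `AVAudioSession.setActive(_:options:)` and pass `.notifyOthersOnDeactivation` when deactivating.

```swift
// Hypothetical abstraction over the audio session; not part of AuralKit.
protocol AudioSessionActivating {
    func setSessionActive(_ active: Bool) throws
}

/// Activates the session, runs `body`, and always deactivates afterwards,
/// even when `body` throws, so other apps can regain audio focus promptly.
func withActiveAudioSession<T>(
    _ session: AudioSessionActivating,
    _ body: () throws -> T
) throws -> T {
    try session.setSessionActive(true)
    // `defer` runs on every exit path, including thrown errors.
    defer { try? session.setSessionActive(false) }
    return try body()
}

// Test double that records every activation change in order.
final class RecordingSession: AudioSessionActivating {
    private(set) var calls: [Bool] = []

    func setSessionActive(_ active: Bool) throws {
        calls.append(active)
    }
}
```

Pairing activation with a `defer` keeps the deactivation on the same code path as the activation, which is one way to avoid leaving the session active after an error.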

gemini-code-assist · 2025-10-25T00:32:20Z

Summary of Changes

Hello @rudrankriyam, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the robustness and observability of the SpeechSession lifecycle. It introduces explicit control over AVAudioSession activation and deactivation, along with comprehensive handling for audio session interruptions, preventing stuck sessions. Furthermore, key status and audio input streams have been refactored into broadcast observables, allowing for more flexible and consistent consumption of updates across the application. These changes, coupled with structural improvements and updated documentation, contribute to a more stable and maintainable speech transcription framework.

Highlights

Audio Session Management: Explicitly activate and deactivate AVAudioSession to improve audio focus handling for other applications, ensuring resources are released promptly.
Interruption Handling: Implemented observers for AVAudioSession interruptions, allowing the speech pipeline to pause, optionally resume, or tear down gracefully during system events like phone calls or Siri interactions, preventing stuck sessions.
Broadcast Observables: Converted SpeechSession's status and audio-input streams into broadcast-style AsyncStreams, enabling multiple UI and logic consumers to receive consistent updates without needing to manage individual continuations.
Demo App VAD Indicator Reliability: Ensured the demo application's Voice Activity Detection (VAD) indicator task restarts when VAD settings change, maintaining the accuracy of its speech-detection badge.
Structural Refinements: SpeechSession was refactored from an actor to a @MainActor class, and its related types (Status, StreamingMode, VoiceActivationConfiguration) and observable properties (statusStream, audioInputConfigurationStream, modelDownloadProgress, isVoiceActivationEnabled) were moved to dedicated extensions for better organization and clarity.
Documentation & TODOs: Updated Agents.md and README.md for clarity and added a TODO.md to track completed work items, improving codebase maintainability.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This is an excellent pull request that significantly improves the lifecycle management and observability of SpeechSession. The explicit activation/deactivation of AVAudioSession and the addition of robust interruption handling are crucial for a well-behaved audio application. The transition to broadcast-style async streams for status and audio input updates is a great architectural improvement, allowing multiple consumers to observe the session's state concurrently. The associated refactoring, including splitting the SpeechSession implementation into logical extensions and files, enhances code organization and maintainability. The inclusion of new tests for the broadcast stream behavior and updates to documentation and the demo app make this a comprehensive and high-quality contribution.

gemini-code-assist · 2025-10-25T00:33:44Z

Aural/TranscriptionView.swift

+                guard enableVAD else {
+                    await MainActor.run {
+                        isSpeechDetected = true
+                    }
+                    return
+                }
+
+                guard let stream = session.speechDetectorResultsStream else {
+                    await MainActor.run {
+                        isSpeechDetected = true
+                    }
+                    return
+                }


The two guard statements here have identical else blocks. They can be combined into a single guard statement to reduce code duplication and improve conciseness.

Suggested change

-                guard enableVAD else {
-                    await MainActor.run {
-                        isSpeechDetected = true
-                    }
-                    return
-                }
-
-                guard let stream = session.speechDetectorResultsStream else {
-                    await MainActor.run {
-                        isSpeechDetected = true
-                    }
-                    return
-                }
+                guard enableVAD, let stream = session.speechDetectorResultsStream else {
+                    await MainActor.run {
+                        isSpeechDetected = true
+                    }
+                    return
+                }

gemini-code-assist · 2025-10-25T00:33:44Z

Tests/AuralKitTests/AuralKitTests.swift

+        #expect(firstA != nil)
+        #expect(firstB != nil)
+        #expect(firstA! == nil)
+        #expect(firstB! == nil)


These assertions are functionally correct, but they can be expressed more idiomatically and readably. The type of firstA is AudioInputInfo?? (or Optional<Optional<AudioInputInfo>>), which can be confusing. Using #expect(value == .some(nil)) makes the intent clearer by explicitly checking for a non-nil outer optional containing a nil value.

Suggested change

-        #expect(firstA != nil)
-        #expect(firstB != nil)
-        #expect(firstA! == nil)
-        #expect(firstB! == nil)
+        #expect(firstA == .some(nil))
+        #expect(firstB == .some(nil))
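
To show how the outer and inner optionals differ, here is a small standalone sketch. `AudioInputInfo` below is a stand-in struct (the real type's fields are not shown in this thread), and `describe` is a hypothetical helper. A value read from an `AsyncStream<AudioInputInfo?>` iterator has this double-optional shape: an outer `nil` means no element was produced, while `.some(nil)` means the stream delivered a `nil` element.

```swift
// Stand-in for the library type; the real AudioInputInfo is not shown here.
struct AudioInputInfo: Equatable {
    var portName: String
}

// Hypothetical helper that names the three possible shapes of a double optional.
func describe(_ value: AudioInputInfo??) -> String {
    switch value {
    case .none:
        return "no element"        // outer optional is nil
    case .some(.none):
        return "element is nil"    // outer present, inner nil
    case .some(.some(let info)):
        return "element is \(info.portName)"
    }
}

// A nil literal assigned to a double optional sets the OUTER optional to nil.
let finished: AudioInputInfo?? = nil
// An explicit `.some(nil)` wraps a nil inner value in a present outer optional.
let emptyElement: AudioInputInfo?? = .some(nil)
// A plain value is promoted through both optional layers.
let realElement: AudioInputInfo?? = AudioInputInfo(portName: "Built-in Mic")
```

Comparing against `.some(nil)` therefore covers both of the original checks at once: the outer value exists and the inner value is `nil`.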

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

chatgpt-codex-connector · 2025-10-25T00:35:50Z

Sources/AuralKit/SpeechSession+Audio.swift

+    private func handleInterruptionEnded(options: AVAudioSession.InterruptionOptions) async {
+        guard shouldResumeAfterInterruption else { return }
+        shouldResumeAfterInterruption = false
+
+        guard options.contains(.shouldResume) else {
+            if Self.shouldLog(.notice) {
+                Self.logger.notice("Audio session interruption ended without resume option; cleaning up session")
+            }
+            prepareForStop()
+            await cleanup(cancelRecognizer: true)
+            await finishStream(error: nil)
+            return
+        }
+
+        do {
+            try await setupAudioSession()
+            try startAudioStreaming()
+            setStatus(.transcribing)
+            if Self.shouldLog(.notice) {
+                Self.logger.notice("Audio session resumed after interruption")


Resuming after phone interruptions ignores manual stop

The new interruption handler resumes audio streaming whenever shouldResumeAfterInterruption is still true, without checking whether the session was explicitly stopped while the interruption was in progress. shouldResumeAfterInterruption is only reset inside handleInterruptionEnded, so if the user taps “Stop” during a call interruption the flag stays true. When the interruption ends, the code at handleInterruptionEnded reactivates the AVAudioSession, restarts the audio engine, and sets the status back to .transcribing even though the recognizer and stream were already torn down by stopTranscribing(). This leaves the session in an inconsistent state and restarts microphone capture unexpectedly. Consider clearing shouldResumeAfterInterruption during stopTranscribing()/cleanup or guarding the resume path with the current status/task state.
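
The first remedy suggested above, clearing the flag when the session is stopped, can be sketched with a simplified state model. This is an illustration rather than the PR's implementation: `SessionModel`, its `Status` cases, and the method bodies are hypothetical stand-ins that only mirror the names used in the comment, and no audio APIs are involved.

```swift
// Simplified, hypothetical model of the resume logic described in the review.
final class SessionModel {
    enum Status { case idle, transcribing, paused }

    private(set) var status: Status = .idle
    private(set) var shouldResumeAfterInterruption = false

    func startTranscribing() {
        status = .transcribing
    }

    // Interruption began (for example, an incoming call): pause and remember to resume.
    func handleInterruptionBegan() {
        guard status == .transcribing else { return }
        shouldResumeAfterInterruption = true
        status = .paused
    }

    // Manual stop: clearing the flag here prevents a later automatic resume.
    func stopTranscribing() {
        shouldResumeAfterInterruption = false
        status = .idle
    }

    // Interruption ended: resume only if the request was not cancelled meanwhile.
    func handleInterruptionEnded(shouldResume: Bool) {
        guard shouldResumeAfterInterruption else { return }
        shouldResumeAfterInterruption = false
        status = shouldResume ? .transcribing : .idle
    }
}
```

In this model, a stop issued during the interruption leaves the status at `.idle` once the interruption ends, while an uninterrupted pause still resumes when `shouldResume` is true.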

rudrankriyam · 2025-10-25T01:12:20Z

/gemini review

gemini-code-assist

Code Review

This pull request significantly improves the SpeechSession lifecycle management and observability. The introduction of explicit AVAudioSession activation/deactivation, handling of audio interruptions, and conversion of streams to broadcast-style observables are excellent enhancements. The code is well-structured, with good separation of concerns into different file extensions, and the inclusion of new tests for the broadcast streams is appreciated. My main feedback concerns a small improvement to the new audioInputConfigurationStream to ensure it provides an initial value to new subscribers, making its behavior more predictable and consistent with the statusStream.

gemini-code-assist · 2025-10-25T01:14:30Z

Sources/AuralKit/SpeechSession.swift

 #if os(iOS) || os(macOS)
-    var audioInputConfigurationContinuation: AsyncStream<AudioInputInfo?>.Continuation?
+    var audioInputContinuations: [UUID: AsyncStream<AudioInputInfo?>.Continuation] = [:]
 #endif


To ensure new subscribers to audioInputConfigurationStream receive the current state, it's helpful to cache the last known AudioInputInfo. Please add a property to store this. This is part of a set of changes to provide an initial value to stream consumers.

Suggested change

     var audioInputContinuations: [UUID: AsyncStream<AudioInputInfo?>.Continuation] = [:]
+    var lastAudioInputInfo: AudioInputInfo?

gemini-code-assist · 2025-10-25T01:14:30Z

Sources/AuralKit/SpeechSession+Observables.swift

+    public var audioInputConfigurationStream: AsyncStream<AudioInputInfo?> {
+        AsyncStream { [weak self] continuation in
+            let id = UUID()
+            Task { @MainActor [weak self] in
+                guard let self else { return }
+                self.audioInputContinuations[id] = continuation
+            }


The audioInputConfigurationStream currently doesn't provide an initial value to new subscribers, which is inconsistent with statusStream and can lead to surprising behavior for consumers. By yielding the newly cached lastAudioInputInfo, you can ensure the stream behaves more like a CurrentValueSubject from Combine, providing the most recent value upon subscription.

     public var audioInputConfigurationStream: AsyncStream<AudioInputInfo?> {
         AsyncStream { [weak self] continuation in
             let id = UUID()
             Task { @MainActor [weak self] in
                 guard let self else { return }
+                continuation.yield(self.lastAudioInputInfo)
                 self.audioInputContinuations[id] = continuation
             }
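
The replay-on-subscribe behavior described in this comment can be sketched in isolation. The following is a minimal illustration under stated assumptions, not AuralKit code: `ReplayBroadcaster` is a hypothetical type, and it guards its state with an `NSLock` rather than registering subscribers inside a `@MainActor` task as the PR does. It relies on `AsyncStream.makeStream`, which requires Swift 5.9 or later.

```swift
import Foundation

/// Hypothetical helper, not AuralKit API: fans values out to many AsyncStreams
/// and replays the most recent value to every new subscriber.
final class ReplayBroadcaster<Value: Sendable>: @unchecked Sendable {
    private let lock = NSLock()
    private var continuations: [UUID: AsyncStream<Value>.Continuation] = [:]
    private var lastValue: Value

    init(initial: Value) {
        lastValue = initial
    }

    /// Caches the value, then forwards it to every registered subscriber.
    func send(_ value: Value) {
        lock.lock()
        defer { lock.unlock() }
        lastValue = value
        for continuation in continuations.values {
            continuation.yield(value)
        }
    }

    /// Returns a stream whose first element is the cached value.
    func stream() -> AsyncStream<Value> {
        let id = UUID()
        let (stream, continuation) = AsyncStream<Value>.makeStream()
        // Drop the subscriber once its stream is cancelled or finished.
        continuation.onTermination = { [weak self] _ in
            self?.remove(id)
        }
        lock.lock()
        defer { lock.unlock() }
        continuation.yield(lastValue)   // replay current state before live updates
        continuations[id] = continuation
        return stream
    }

    private func remove(_ id: UUID) {
        lock.lock()
        defer { lock.unlock() }
        continuations[id] = nil
    }
}
```

Because the cached value is yielded and the continuation is registered under the same lock, a subscriber in this sketch sees the current state first and then every later `send`, with no window in which an update could be missed.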

rudrankriyam added 6 commits October 25, 2025 05:38

Add TODO list and deactivate audio session on cleanup

6630a4f

Handle AVAudioSession interruptions

377e5d1

Broadcast status and audio input streams

af13e7c

Restart VAD indicator task when toggling voice activation

509f2f1

Add tests for broadcast streams

2eb016a

Refactor Agents.md and SpeechSession to improve clarity and structure

839ed52

gemini-code-assist bot reviewed Oct 25, 2025

chatgpt-codex-connector bot reviewed Oct 25, 2025

rudrankriyam added 3 commits October 25, 2025 06:14

Refactor audio session interruption handling and make setupAudioSession public

ef10883

Update TranscriptionView logic and improve AudioInputInfo

c0879c9

Add timeout handling and awaitResult helper for async tests

1012440

gemini-code-assist bot reviewed Oct 25, 2025

Introduce Task.yield() in audio input stream tests for improved async handling

9bdcfb3

rudrankriyam merged commit c3dad58 into main Oct 25, 2025
1 check passed

rudrankriyam deleted the feature/todo-audio-session branch October 25, 2025 01:21

rudrankriyam merged 10 commits into main from feature/todo-audio-session
