Improve SpeechSession lifecycle and observability#19
Conversation
Summary of ChangesHello @rudrankriyam, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the robustness and observability of the Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This is an excellent pull request that significantly improves the lifecycle management and observability of SpeechSession. The explicit activation/deactivation of AVAudioSession and the addition of robust interruption handling are crucial for a well-behaved audio application. The transition to broadcast-style async streams for status and audio input updates is a great architectural improvement, allowing multiple consumers to observe the session's state concurrently. The associated refactoring, including splitting the SpeechSession implementation into logical extensions and files, enhances code organization and maintainability. The inclusion of new tests for the broadcast stream behavior and updates to documentation and the demo app make this a comprehensive and high-quality contribution.
Aural/TranscriptionView.swift
Outdated
| guard enableVAD else { | ||
| await MainActor.run { | ||
| isSpeechDetected = true | ||
| } | ||
| return | ||
| } | ||
|
|
||
| guard let stream = session.speechDetectorResultsStream else { | ||
| await MainActor.run { | ||
| isSpeechDetected = true | ||
| } | ||
| return | ||
| } |
There was a problem hiding this comment.
The two guard statements here have identical else blocks. They can be combined into a single guard statement to reduce code duplication and improve conciseness.
| guard enableVAD else { | |
| await MainActor.run { | |
| isSpeechDetected = true | |
| } | |
| return | |
| } | |
| guard let stream = session.speechDetectorResultsStream else { | |
| await MainActor.run { | |
| isSpeechDetected = true | |
| } | |
| return | |
| } | |
| guard enableVAD, let stream = session.speechDetectorResultsStream else { | |
| await MainActor.run { | |
| isSpeechDetected = true | |
| } | |
| return | |
| } |
| #expect(firstA != nil) | ||
| #expect(firstB != nil) | ||
| #expect(firstA! == nil) | ||
| #expect(firstB! == nil) |
There was a problem hiding this comment.
These assertions are functionally correct, but they can be expressed more idiomatically and readably. The type of firstA is AudioInputInfo?? (or Optional<Optional<AudioInputInfo>>), which can be confusing. Using #expect(value == .some(nil)) makes the intent clearer by explicitly checking for a non-nil outer optional containing a nil value.
| #expect(firstA != nil) | |
| #expect(firstB != nil) | |
| #expect(firstA! == nil) | |
| #expect(firstB! == nil) | |
| #expect(firstA == .some(nil)) | |
| #expect(firstB == .some(nil)) |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| private func handleInterruptionEnded(options: AVAudioSession.InterruptionOptions) async { | ||
| guard shouldResumeAfterInterruption else { return } | ||
| shouldResumeAfterInterruption = false | ||
|
|
||
| guard options.contains(.shouldResume) else { | ||
| if Self.shouldLog(.notice) { | ||
| Self.logger.notice("Audio session interruption ended without resume option; cleaning up session") | ||
| } | ||
| prepareForStop() | ||
| await cleanup(cancelRecognizer: true) | ||
| await finishStream(error: nil) | ||
| return | ||
| } | ||
|
|
||
| do { | ||
| try await setupAudioSession() | ||
| try startAudioStreaming() | ||
| setStatus(.transcribing) | ||
| if Self.shouldLog(.notice) { | ||
| Self.logger.notice("Audio session resumed after interruption") |
There was a problem hiding this comment.
Resuming after phone interruptions ignores manual stop
The new interruption handler resumes audio streaming whenever shouldResumeAfterInterruption is still true, without checking whether the session was explicitly stopped while the interruption was in progress. shouldResumeAfterInterruption is only reset inside handleInterruptionEnded, so if the user taps “Stop” during a call interruption the flag stays true. When the interruption ends, the code at handleInterruptionEnded reactivates the AVAudioSession, restarts the audio engine, and sets the status back to .transcribing even though the recognizer and stream were already torn down by stopTranscribing(). This leaves the session in an inconsistent state and restarts microphone capture unexpectedly. Consider clearing shouldResumeAfterInterruption during stopTranscribing()/cleanup or guarding the resume path with the current status/task state.
Useful? React with 👍 / 👎.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request significantly improves the SpeechSession lifecycle management and observability. The introduction of explicit AVAudioSession activation/deactivation, handling of audio interruptions, and conversion of streams to broadcast-style observables are excellent enhancements. The code is well-structured, with good separation of concerns into different file extensions, and the inclusion of new tests for the broadcast streams is appreciated. My main feedback concerns a small improvement to the new audioInputConfigurationStream to ensure it provides an initial value to new subscribers, making its behavior more predictable and consistent with the statusStream.
| #if os(iOS) || os(macOS) | ||
| var audioInputConfigurationContinuation: AsyncStream<AudioInputInfo?>.Continuation? | ||
| var audioInputContinuations: [UUID: AsyncStream<AudioInputInfo?>.Continuation] = [:] | ||
| #endif |
There was a problem hiding this comment.
To ensure new subscribers to audioInputConfigurationStream receive the current state, it's helpful to cache the last known AudioInputInfo. Please add a property to store this. This is part of a set of changes to provide an initial value to stream consumers.
| #if os(iOS) || os(macOS) | |
| var audioInputConfigurationContinuation: AsyncStream<AudioInputInfo?>.Continuation? | |
| var audioInputContinuations: [UUID: AsyncStream<AudioInputInfo?>.Continuation] = [:] | |
| #endif | |
| var audioInputContinuations: [UUID: AsyncStream<AudioInputInfo?>.Continuation] = [:] | |
| var lastAudioInputInfo: AudioInputInfo? | |
| public var audioInputConfigurationStream: AsyncStream<AudioInputInfo?> { | ||
| AsyncStream { [weak self] continuation in | ||
| let id = UUID() | ||
| Task { @MainActor [weak self] in | ||
| guard let self else { return } | ||
| self.audioInputContinuations[id] = continuation | ||
| } |
There was a problem hiding this comment.
The audioInputConfigurationStream currently doesn't provide an initial value to new subscribers, which is inconsistent with statusStream and can lead to surprising behavior for consumers. By yielding the newly cached lastAudioInputInfo, you can ensure the stream behaves more like a CurrentValueSubject from Combine, providing the most recent value upon subscription.
public var audioInputConfigurationStream: AsyncStream<AudioInputInfo?> {
AsyncStream { [weak self] continuation in
let id = UUID()
Task { @MainActor [weak self] in
guard let self else { return }
continuation.yield(self.lastAudioInputInfo)
self.audioInputContinuations[id] = continuation
}
Summary