
feat(voice): add talk mode latency tracing #32974

Closed

ngutman wants to merge 2 commits into main from feat/talk-mode-latency-tracing

Conversation

@ngutman
Contributor

@ngutman ngutman commented Mar 3, 2026

Summary

  • add per-turn latency tracing in talk mode from final transcript through chat send, stream, and speech output stages
  • record first-request/first-chunk/playback timing anchors for ElevenLabs streaming paths
  • add an onStart callback to TalkSystemSpeechSynthesizer.speak(...) via AVSpeechSynthesizerDelegate.didStart
  • add incremental speech buffer tests for soft-boundary splitting and short-chunk buffering behavior
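The first-chunk anchor on the ElevenLabs path can be recorded by wrapping the byte stream before it reaches the player. A minimal sketch of that idea, assuming the stream type from the bullets above (the PR's actual `instrumentTTSStreamFirstChunk` signature may differ):

```swift
import Foundation

// Wrap a TTS byte stream so the first received chunk records a timing
// anchor exactly once, then forward every chunk downstream unchanged.
func instrumentFirstChunk(
    _ upstream: AsyncThrowingStream<Data, Error>,
    onFirstChunk: @escaping (Date) -> Void
) -> AsyncThrowingStream<Data, Error> {
    AsyncThrowingStream { continuation in
        let task = Task {
            var sawFirst = false
            do {
                for try await chunk in upstream {
                    if !sawFirst {
                        sawFirst = true
                        onFirstChunk(Date()) // first-chunk anchor
                    }
                    continuation.yield(chunk) // pass data through untouched
                }
                continuation.finish()
            } catch {
                continuation.finish(throwing: error)
            }
        }
        continuation.onTermination = { _ in task.cancel() }
    }
}
```

Because the wrapper only observes the stream, it can be inserted between the HTTP client and the audio player without changing playback behavior.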

Testing

  • swift test --package-path apps/ios --filter TalkModeIncrementalSpeechBufferTests (fails in this repo layout: Could not find Package.swift for apps/ios)


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: eb90e004a9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@greptile-apps
Contributor

greptile-apps bot commented Mar 3, 2026

Greptile Summary

This PR adds per-turn latency tracing to talk mode, threading timing anchors through the full pipeline from final transcript to speech output (ElevenLabs streaming and system TTS). It also introduces soft-boundary splitting in IncrementalSpeechBuffer so speech can start sooner on long stream chunks.

Key changes:

  • TalkTurnLatencyTrace struct captures timestamps for each stage (transcript → process start → chat send/ack → assistant stream → first TTS request/chunk → playback start → system speech start).
  • instrumentTTSStreamFirstChunk wraps AsyncThrowingStream<Data, Error> to record the first-chunk timing anchor before forwarding data downstream.
  • isSoftBoundary triggers a segment split at any whitespace once the buffer reaches 72 characters, enabling earlier speech start on long chunks.
  • TalkSystemSpeechSynthesizer.speak(text:language:onStart:) gains an onStart callback wired to AVSpeechSynthesizerDelegate.didStart.
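The soft-boundary rule above can be sketched as a small pure function (a hedged reconstruction from the review summary; the PR's `isSoftBoundary` implementation and exact splitting rules may differ):

```swift
// Once the pending buffer reaches 72 characters, split at the next
// whitespace so speech can start before a sentence boundary arrives.
let softBoundaryThreshold = 72

func softSplit(_ buffer: String) -> (segment: String, remainder: String)? {
    guard buffer.count >= softBoundaryThreshold else { return nil }
    let threshold = buffer.index(buffer.startIndex, offsetBy: softBoundaryThreshold)
    // Split at the first whitespace at or after the threshold.
    guard let split = buffer[threshold...].firstIndex(where: { $0.isWhitespace }) else {
        return nil // no whitespace yet; keep buffering
    }
    let segment = String(buffer[..<split])
    let remainder = String(buffer[buffer.index(after: split)...])
    return (segment, remainder)
}
```

Short chunks stay buffered (`nil` result), which matches the short-chunk buffering behavior the new tests cover.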

Issues found:

  • The didStart delegate in TalkSystemSpeechSynthesizer does not validate that the firing utterance is the currently-active one. When speech is interrupted and restarted rapidly, a queued Task { @MainActor in } for the old utterance's didStart event can consume the new onStart callback before the new speech actually starts, causing firstSystemSpeechStartAt to be recorded prematurely or not at all.

Confidence Score: 2/5

  • Safe to merge for user-facing behavior, but latency data from the system TTS path may be inaccurate due to the didStart callback race condition.
  • One real functional correctness issue exists: the didStart handler in TalkSystemSpeechSynthesizer lacks an utterance-identity guard, which can cause the onStart callback to fire for the wrong speech instance when speech is interrupted and restarted quickly. This does not affect playback, only latency measurement accuracy for the system TTS fallback path. The ElevenLabs and incremental-speech tracing paths are correct.
  • apps/shared/OpenClawKit/Sources/OpenClawKit/TalkSystemSpeechSynthesizer.swift — the new speechSynthesizer(_:didStart:) delegate method needs an utterance identity check before firing the callback.
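The identity guard the review asks for might look like the following (names reconstructed from the review text, not the PR's actual code):

```swift
import AVFoundation

// Sketch: track the currently-active utterance and ignore didStart
// events fired for a superseded one, so a rapid interrupt-and-restart
// cannot consume the new turn's onStart callback.
final class TalkSystemSpeechSynthesizer: NSObject, AVSpeechSynthesizerDelegate {
    private let synthesizer = AVSpeechSynthesizer()
    private var activeUtterance: AVSpeechUtterance?
    private var onStart: (() -> Void)?

    func speak(text: String, language: String, onStart: @escaping () -> Void) {
        let utterance = AVSpeechUtterance(string: text)
        utterance.voice = AVSpeechSynthesisVoice(language: language)
        self.activeUtterance = utterance
        self.onStart = onStart
        self.synthesizer.delegate = self
        self.synthesizer.speak(utterance)
    }

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                           didStart utterance: AVSpeechUtterance) {
        // Utterance-identity check: only the active utterance may fire
        // the callback; stale didStart events are dropped.
        guard utterance === self.activeUtterance else { return }
        self.onStart?()
        self.onStart = nil
    }
}
```

Reference identity (`===`) is enough here because each `speak` call allocates a fresh `AVSpeechUtterance`.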

Last reviewed commit: eb90e00

Contributor

@greptile-apps greptile-apps bot left a comment


3 files reviewed, 1 comment


@openclaw-barnacle openclaw-barnacle bot added the app: ios, size: M, and maintainer (Maintainer-authored PR) labels Mar 3, 2026

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1d3f1fcaf4


}

private func ensureLatencyTrace(transcriptChars: Int) {
    guard self.activeLatencyTrace == nil else { return }

P2: Reset latency traces when starting a new transcript turn

ensureLatencyTrace bails out if activeLatencyTrace is already set, but this commit introduced paths where a trace can be started and never finished (for example push-to-talk offline/cancel exits that do not call finishLatencyTrace). In that case, the next turn can reuse stale timestamps/anchors from a previous turn, which merges separate interactions into one trace and corrupts per-turn latency metrics.
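One way to address this, sketched under assumed names reconstructed from the comment (not the PR's actual declarations): close out any unfinished trace before seeding a new one, instead of bailing out.

```swift
import Foundation

struct TalkTurnLatencyTrace {
    let startedAt = Date()
    let transcriptChars: Int
}

final class TalkModeController {
    private(set) var activeLatencyTrace: TalkTurnLatencyTrace?

    func ensureLatencyTrace(transcriptChars: Int) {
        if activeLatencyTrace != nil {
            // A prior turn exited without finishing its trace (e.g. an
            // offline or cancel path); close it so its stale anchors
            // cannot bleed into the new turn's measurements.
            finishLatencyTrace(outcome: "abandoned", fields: [])
        }
        activeLatencyTrace = TalkTurnLatencyTrace(transcriptChars: transcriptChars)
    }

    func finishLatencyTrace(outcome: String, fields: [String]) {
        // Real code would emit the trace here; the sketch just clears it.
        activeLatencyTrace = nil
    }
}
```

Seeding a fresh trace per turn also gives the telemetry an explicit "abandoned" outcome for turns that never reach playback.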


} else {
    await self.playAssistant(text: assistantText)
}
self.finishLatencyTrace(outcome: "spoken", fields: ["runId=\(runId)"])

P2: Record playback failures as non-spoken latency outcomes

This unconditionally finalizes tracing as outcome=spoken, but playAssistant handles TTS/system-speech errors internally and can end with statusText = "Speak failed: ..." without any spoken output. When that happens, latency telemetry marks failed speech turns as successful, which skews success-rate and latency analysis for talk mode.
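A hedged sketch of the suggested fix; `playAssistant` and `finishLatencyTrace` are the review's names, and the `Bool` return plus the stubbed body are assumptions for illustration:

```swift
// Have playback report whether speech actually started, and record
// the real outcome instead of an unconditional "spoken".
final class TalkTurnFinisher {
    private(set) var lastOutcome: String?

    func playAssistant(text: String) async -> Bool {
        // Real code would stream TTS here and return false when the
        // speak attempt fails; this stub only models the contract.
        return !text.isEmpty
    }

    func finishLatencyTrace(outcome: String, fields: [String]) {
        lastOutcome = outcome
    }

    func speakTurn(assistantText: String, runId: String) async {
        let spoke = await playAssistant(text: assistantText)
        finishLatencyTrace(
            outcome: spoke ? "spoken" : "speak_failed",
            fields: ["runId=\(runId)"])
    }
}
```

With a distinct `speak_failed` outcome, failed speech turns no longer inflate the success-rate metrics the comment describes.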


@mbelinky
Contributor

mbelinky commented Mar 3, 2026

Thanks for the solid work here, @ngutman.

Closing this original PR in favor of smaller, reviewable slices that preserve the reliability/UX improvements.

This keeps risk isolated and makes it easier to validate each behavior independently.

@mbelinky mbelinky closed this Mar 3, 2026

Labels

app: ios, maintainer (Maintainer-authored PR), size: M


2 participants