-
Notifications
You must be signed in to change notification settings - Fork 1.1k
fix: improve avatar voice_end timing and deepgram reliability #1882
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Avatar extensions (anam, generic, heygen): - Replace 500ms debounce timer with tts_audio_end event handling - Add send_voice_end() method triggered by tts_audio_end (reason=1) - Remove dead code from heygen (unused queues, is_speaking timeout) - Fix graph routing: tts_audio_end now routes to avatar (was only thymia) Deepgram WebSocket ASR: - Add silence sender for EOT detection when mic muted - Add auto-reconnect with exponential backoff - Add keep-alive silence to prevent WebSocket timeout Playground: - Add URL param support: ?graph=name pre-selects graph on load
Code Review: PR #1882 - Improve avatar voice_end timing and deepgram reliabilitySummaryThis PR improves the voice assistant system by replacing debounce timers with event-driven ✅ Strengths1. Event-Driven Architecture
2. Code Cleanup
3. Reliability Improvements
|
Pull Request ReviewSummaryThis PR makes meaningful improvements to avatar timing and ASR reliability. The changes are well-structured and address real issues with debounce-based timing and WebSocket stability. Overall, this is a solid enhancement with a few areas for consideration. Code Quality & Best PracticesStrengths✅ Event-driven design: Replacing the 500ms debounce timer with ✅ Dead code removal: Removing unused queues and state tracking from heygen extension improves maintainability (extension.py removed ✅ Consistent patterns: The ✅ Clear documentation: Inline comments explain the rationale for changes (e.g., "NOTE: voice_end is now triggered by tts_audio_end event") Areas for Improvement1. Exponential Backoff Implementation (deepgram_ws_asr_python/extension.py:295-332) # Current implementation in _reconnect_monitor
current_delay = self.reconnect_delay # Reset on success
# Exponential backoff
current_delay = min(current_delay * 2, self.max_reconnect_delay)
Recommendation: Store # In __init__
self.current_reconnect_delay = self.reconnect_delay
# In _reconnect_monitor
if self.is_connected():
self.current_reconnect_delay = self.reconnect_delay # Reset
else:
await asyncio.sleep(self.current_reconnect_delay)
self.current_reconnect_delay = min(
self.current_reconnect_delay * 2,
self.max_reconnect_delay
)2. Silent Continue After Max Silence Duration (extension.py:287-293) if silence_elapsed > self.silence_max_duration:
self.ten_env.log_info(...)
silence_start_time = 0.0
continue # ⚠️ Goes to top of while loop, sleeps, checks againThe Recommendation: Add a state flag or ensure the gap threshold check prevents re-entry. 3. Error Handling in send_voice_end() (anam.py:686) else:
self.ten_env.log_error("Failed to send voice_end message")
return successNo exception handling around the Recommendation: Add try-catch block: try:
success = await self._send_message(end_message)
except Exception as e:
self.ten_env.log_error(f"Exception sending voice_end: {e}")
return FalsePotential BugsCritical🔴 Race condition in silence sender (extension.py:242-309) The While Python's GIL provides some protection for simple assignments, the compound condition checks could lead to race conditions: # In _silence_sender (line 269-274)
if (self.last_audio_frame_time > 0
and (now - self.last_audio_frame_time) > self.silence_gap_threshold
and (self.last_interim_text or self.accumulated_segments)):Impact: Medium - Unlikely to cause crashes but could cause timing inconsistencies Recommendation: Use Medium🟡 Missing cleanup check in stop_connection (extension.py:755-774) The task cancellation uses Recommendation: Add timeout to task cancellation: if self.silence_sender_task:
self.silence_sender_task.cancel()
try:
await asyncio.wait_for(self.silence_sender_task, timeout=2.0)
except (asyncio.CancelledError, asyncio.TimeoutError):
passPerformance ConsiderationsPositive Changes✅ Eliminates unnecessary timeouts: Removing the 500ms debounce delay reduces latency in avatar responses ✅ Removes dead queues: The heygen cleanup removes unused Concernswhile True:
await asyncio.sleep(0.01) # 10ms intervalsThe silence sender runs at 100Hz continuously. While 10ms sleeps are lightweight, this adds constant background activity. Impact: Low - Modern async runtimes handle this well, but consider increasing to 50ms (20Hz) if 10ms granularity isn't strictly required for audio gap detection. The useEffect that checks URL params doesn't have proper dependencies and runs on every Recommendation: Add Security ConcernsLow Risk🟢 URL parameter injection (authInitializer/index.tsx:60) const graphParam = urlParams.get("graph");
if (graphParam) {
const graph = graphList.find((g) => g.name === graphParam);The code safely validates that ✅ No direct security issues identified. The changes don't introduce authentication bypass, XSS, or injection vulnerabilities. Test CoverageMissing Test ScenariosThe PR description doesn't mention test coverage. These areas would benefit from tests:
Recommendation: Add integration tests to Specific File Commentsproperty.json & rebuild_property.py✅ Correct routing fix: Adding avatar to heygen_avatar_python/extension.py✅ Excellent cleanup: Removing 74 lines of unused code ( deepgram_ws_asr_python/extension.py
playground authInitializer✅ Minor feature: URL param support is a nice UX improvement for deep linking. Implementation is straightforward. Recommendations PriorityHigh Priority
Medium Priority
Low Priority
Overall AssessmentRating: Approve with Minor Changes This PR delivers meaningful improvements:
The issues identified are mostly minor and can be addressed in follow-up PRs, though fixing the exponential backoff would be ideal before merge. Estimated Risk: Low - Changes are well-isolated to specific extensions and don't affect core framework. |
Pull Request ReviewThis PR improves avatar voice timing synchronization and Deepgram WebSocket reliability. Overall, the changes are well-structured and address real-world production issues. Below are my findings: ✅ Positive Aspects1. Improved Avatar Timing Architecture
2. Deepgram Reliability Improvements
3. Code Quality
🔍 Potential Issues1. Race Condition in Avatar Extensions (Medium Priority)Location: if self.recorder and self.recorder.ws_connected():
await self.recorder.send_voice_end()
else:
ten_env.log_warn(
"[ANAM_TTS_END] Recorder not ready, cannot send voice_end"
)Issue: There's a TOCTOU (time-of-check-time-of-use) race condition. The recorder could disconnect between the check and the send. Recommendation: Wrap the send in a try-except block: if self.recorder:
try:
await self.recorder.send_voice_end()
except Exception as e:
ten_env.log_warn(f"Failed to send voice_end: {e}")2. Memory Leak Risk in Deepgram (Low Priority)Location: Issue: The Recommendation: Use a control flag: self._silence_sender_running = True
async def _silence_sender(self):
while self._silence_sender_running:
# ... existing code3. Potential Deadlock in HeyGen (Low Priority)Location: Issue: The removed Recommendation: Verify that the queue-based approach provides sufficient thread safety, or add a comment explaining why synchronization is unnecessary. 4. Hardcoded Silence Frame Size (Low Priority)Location: # 10ms of silence at 16kHz mono (16-bit = 2 bytes per sample)
# 16000 samples/sec * 0.01 sec * 2 bytes = 320 bytes
silence_frame = bytes(320)Issue: This is hardcoded for 16kHz but Recommendation: Calculate dynamically: silence_duration_ms = 10
bytes_per_sample = 2
samples = (self.config.sample_rate * silence_duration_ms) // 1000
silence_frame = bytes(samples * bytes_per_sample)🔒 Security ConcernsNo major security issues identified. The changes don't introduce new attack surfaces. However:
⚡ Performance Considerations1. Silence Sender Overhead (Minor)Location: The silence sender wakes up every 10ms. While this is acceptable, consider:
2. Queue Growth in Avatar ExtensionsLocation: The Recommendation: Set a maxsize: self.input_audio_queue = asyncio.Queue(maxsize=100)3. Async Task CreationLocation: Multiple files Tasks are created with 🧪 Test CoverageConcern: This PR doesn't include tests for the new functionality:
📋 Code Style & Best PracticesFollows TEN Framework Conventions ✅
Minor Suggestions:
🎯 Graph Configuration ChangesLocation: The routing changes to send 🎨 Playground URL Parameter FeatureLocation: Review:
Minor suggestion: Consider URL decoding: const graphParam = decodeURIComponent(urlParams.get("graph") || "");📊 Summary
✅ RecommendationAPPROVE with suggestions The changes solve real production issues and improve system reliability. The identified issues are minor and can be addressed in follow-up PRs or before merge. The architecture improvements (event-driven voice_end, auto-reconnect) are solid enhancements. Before Merge:
Post-Merge:
Great work on improving the reliability of the voice assistant! 🎉 |
Avatar extensions (anam, generic, heygen):
Deepgram WebSocket ASR:
Playground: