-
Notifications
You must be signed in to change notification settings - Fork 1.1k
fix: correct audio start_ms handling #1901
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Code Review for PR #1901: Fix/sarvam audio start_msSummaryThis PR fixes the calculation of ✅ Positive Aspects
🔍 Code Quality Observationsextension.py:349-360Good practices:
Minor suggestion: # Calculate start_ms: transcript corresponds to audio that was sent
# Assumption: Sarvam returns transcripts in-order for streaming audio
# Start is total audio sent minus the duration of this transcript
start_ms = max(0, total_audio_sent_ms - duration_ms)🤔 Potential Issues & Questions1. Edge Case: Reconnection TimingIn self.sent_user_audio_duration_ms_before_last_reset += (
self.audio_timeline.get_total_user_audio_duration()
)
self.audio_timeline.reset()Question: What happens if a transcript arrives during or immediately after reconnection? The timing calculation should still be correct, but it would be good to verify this behavior is tested. 2. Precision and Audio Duration Accuracyduration_ms = int(audio_duration * 1000) if audio_duration > 0 else 0Observation: Converting float seconds to integer milliseconds may lose sub-millisecond precision. This is likely acceptable for ASR timestamps, but if Sarvam returns very precise durations, consider using 3. Calculation Accuracy ConcernThe calculation
Question: Does Sarvam's API documentation guarantee that 4. Test CoverageLooking at "metrics": {
"audio_duration": 2.0,
"processing_latency": 0.1,
},Missing: The test doesn't validate that self.stop_test_if_checking_failed(
ten_env_tester,
data_dict["start_ms"] >= 0,
f"start_ms should be non-negative: {data_dict['start_ms']}"
)🔒 Security Considerations✅ No security concerns identified. The changes involve timing calculations only and don't introduce new attack surfaces. ⚡ Performance Considerations✅ No performance impact. The changes add minimal arithmetic operations that are negligible compared to audio processing and network I/O. 🧪 Test Coverage Recommendations
📝 Suggested Improvements1. Add logging for timing calculations (debugging aid):# Calculate start_ms: transcript corresponds to audio that was sent
# Start is total audio sent minus the duration of this transcript
start_ms = max(0, total_audio_sent_ms - duration_ms)
if self.config.dump: # Or use a debug log level
self.ten_env.log_debug(
f"Timing: total_sent={total_audio_sent_ms}ms, "
f"duration={duration_ms}ms, start={start_ms}ms",
category=LOG_CATEGORY_VENDOR,
)2. Consider defensive programming:# Validate that start_ms is reasonable
if start_ms > total_audio_sent_ms:
self.ten_env.log_warn(
f"Suspicious timing: start_ms ({start_ms}) > total_sent ({total_audio_sent_ms})",
category=LOG_CATEGORY_VENDOR,
)
start_ms = 0 # Fallback to safe value✨ Overall AssessmentRating: ✅ Approve with Minor Suggestions This is a solid fix that resolves the TODO and implements the missing functionality correctly. The implementation is clean, well-documented, and follows the existing code patterns in the extension. Recommendation:
Great work on:
Review performed by Claude Code - TEN Framework Code Review |
No description provided.