fix: correct audio start_ms handling #1901

nitin4real · 2025-12-19T09:30:39Z

No description provided.

claude · 2025-12-19T09:33:23Z

Code Review for PR #1901: Fix/sarvam audio start_ms

Summary

This PR fixes the calculation of start_ms for ASR transcripts in the Sarvam ASR extension by implementing proper timing based on total audio sent and transcript duration.

✅ Positive Aspects

Solves a Real Problem: The PR addresses the TODO comment and implements the missing start_ms calculation logic
Correct Implementation: The calculation start_ms = max(0, total_audio_sent_ms - duration_ms) is logically sound
Good Documentation: The inline comments clearly explain the calculation logic
Version Bump: Properly incremented the version from 0.1.0 to 0.1.1 in manifest.json
Uses Existing Infrastructure: Leverages the audio_timeline tracking that's already in place

🔍 Code Quality Observations

extension.py:349-360

Good practices:

Clear variable naming (total_audio_sent_ms, duration_ms)
Use of max(0, ...) to prevent negative values
Multi-line expression formatting improves readability

Minor suggestion:
The calculation assumes that the transcript duration directly corresponds to the most recently sent audio. This works for streaming ASR where transcripts are returned in order, but consider documenting this assumption:

# Calculate start_ms: transcript corresponds to audio that was sent
# Assumption: Sarvam returns transcripts in-order for streaming audio
# Start is total audio sent minus the duration of this transcript
start_ms = max(0, total_audio_sent_ms - duration_ms)

🤔 Potential Issues & Questions

1. Edge Case: Reconnection Timing

In extension.py:194-197, when reconnecting:

self.sent_user_audio_duration_ms_before_last_reset += (
    self.audio_timeline.get_total_user_audio_duration()
)
self.audio_timeline.reset()

Question: What happens if a transcript arrives during or immediately after reconnection? The timing calculation should still be correct, but it would be good to verify this behavior is tested.

2. Precision and Audio Duration Accuracy

duration_ms = int(audio_duration * 1000) if audio_duration > 0 else 0

Observation: Converting float seconds to integer milliseconds may lose sub-millisecond precision. This is likely acceptable for ASR timestamps, but if Sarvam returns very precise durations, consider using round() instead of int() to minimize accumulated error.

3. Calculation Accuracy Concern

The calculation total_audio_sent_ms - duration_ms assumes that:

The transcript's audio_duration matches exactly the amount of audio consumed
There's no significant latency or buffering in the Sarvam service

Question: Does Sarvam's API documentation guarantee that metrics.audio_duration represents the exact duration of audio that was transcribed? If there's VAD (Voice Activity Detection) or silence trimming, the duration might not match the sent audio perfectly.

4. Test Coverage

Looking at tests/test_asr_result.py:147-150:

"metrics": {
    "audio_duration": 2.0,
    "processing_latency": 0.1,
},

Missing: The test doesn't validate that start_ms is calculated correctly. Consider adding assertions like:

self.stop_test_if_checking_failed(
    ten_env_tester,
    data_dict["start_ms"] >= 0,
    f"start_ms should be non-negative: {data_dict['start_ms']}"
)

🔒 Security Considerations

✅ No security concerns identified. The changes involve timing calculations only and don't introduce new attack surfaces.

⚡ Performance Considerations

✅ No performance impact. The changes add minimal arithmetic operations that are negligible compared to audio processing and network I/O.

🧪 Test Coverage Recommendations

Add explicit start_ms validation in test_asr_result.py
Test reconnection scenarios to ensure timing remains accurate after reconnection
Test multiple consecutive transcripts to verify cumulative timing accuracy
Test edge case where audio_duration is 0 or missing

📝 Suggested Improvements

1. Add logging for timing calculations (debugging aid):

# Calculate start_ms: transcript corresponds to audio that was sent
# Start is total audio sent minus the duration of this transcript
start_ms = max(0, total_audio_sent_ms - duration_ms)

if self.config.dump:  # Or use a debug log level
    self.ten_env.log_debug(
        f"Timing: total_sent={total_audio_sent_ms}ms, "
        f"duration={duration_ms}ms, start={start_ms}ms",
        category=LOG_CATEGORY_VENDOR,
    )

2. Consider defensive programming:

# Validate that start_ms is reasonable
if start_ms > total_audio_sent_ms:
    self.ten_env.log_warn(
        f"Suspicious timing: start_ms ({start_ms}) > total_sent ({total_audio_sent_ms})",
        category=LOG_CATEGORY_VENDOR,
    )
    start_ms = 0  # Fallback to safe value

✨ Overall Assessment

Rating: ✅ Approve with Minor Suggestions

This is a solid fix that resolves the TODO and implements the missing functionality correctly. The implementation is clean, well-documented, and follows the existing code patterns in the extension.

Recommendation:

✅ Safe to merge after considering the test coverage suggestions
The minor suggestions above are optional improvements for robustness, not blockers

Great work on:

Clear, understandable code
Proper version management
Good inline documentation

Review performed by Claude Code - TEN Framework Code Review

nitin4real added 2 commits December 19, 2025 11:54

fix: calculate start time for ASR transcript

35ee647

fix: version update for sarvam asr

70e243d

nitin4real requested review from halajohn and plutoless as code owners December 19, 2025 09:30

nitin4real changed the title ~~Fix/sarvam audio start_ms~~ fix: correct audio start_ms handling Dec 19, 2025

plutoless approved these changes Dec 19, 2025

View reviewed changes

plutoless merged commit cd90ec2 into main Dec 19, 2025
34 of 35 checks passed

plutoless deleted the fix/sarvam branch December 19, 2025 10:13

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: correct audio start_ms handling #1901

fix: correct audio start_ms handling #1901

Uh oh!

nitin4real commented Dec 19, 2025

Uh oh!

claude bot commented Dec 19, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

fix: correct audio start_ms handling #1901

fix: correct audio start_ms handling #1901

Uh oh!

Conversation

nitin4real commented Dec 19, 2025

Uh oh!

claude bot commented Dec 19, 2025

Code Review for PR #1901: Fix/sarvam audio start_ms

Summary

✅ Positive Aspects

🔍 Code Quality Observations

extension.py:349-360

🤔 Potential Issues & Questions

1. Edge Case: Reconnection Timing

2. Precision and Audio Duration Accuracy

3. Calculation Accuracy Concern

4. Test Coverage

🔒 Security Considerations

⚡ Performance Considerations

🧪 Test Coverage Recommendations

📝 Suggested Improvements

1. Add logging for timing calculations (debugging aid):

2. Consider defensive programming:

✨ Overall Assessment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants