-
Notifications
You must be signed in to change notification settings - Fork 1.1k
feat: feat: enhance Bytedance ASR configuration and processing #1855
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: feat: enhance Bytedance ASR configuration and processing #1855
Conversation
f6033ee to
1745a1a
Compare
Pull Request Review: Enhanced Bytedance ASR Configuration and ProcessingSummaryThis PR enhances the Bytedance ASR extension with dynamic silence duration calculation, improved metadata extraction, and better result handling. The changes are focused on four files with 54 additions and 7 deletions. Code Quality & Best Practices✅ Strengths
|
1745a1a to
77821d1
Compare
PR Review: Enhance Bytedance ASR Configuration and ProcessingThank you for this enhancement! I've reviewed the changes and have some feedback organized by category. ✅ Positive Aspects
🐛 Potential Bugs & IssuesCritical: Incorrect
|
77821d1 to
5c6d636
Compare
Pull Request Review: Bytedance ASR Configuration and Processing EnhancementsThis PR introduces significant improvements to the Bytedance ASR extension, including utterance grouping, metadata extraction, and a configuration rename. Overall, the changes are well-structured and improve the extension's functionality. Here's my detailed review: ✅ Strengths1. Well-Designed Utterance Grouping LogicThe new utterance processing logic (
2. Comprehensive Test CoverageThe new test file
3. Metadata ExtractionThe
4. Code OrganizationGood refactoring with helper methods:
🔍 Issues & Recommendations1. Critical: Potential Division by Zero (High Priority)Location: self.audio_timeline.add_user_audio(
int(len(buf) / (self.input_audio_sample_rate() / 1000 * 2))
)Issue: If Recommendation: sample_rate = self.input_audio_sample_rate()
if sample_rate <= 0:
self.ten_env.log_error(f"Invalid sample rate: {sample_rate}")
return False
bytes_per_ms = (sample_rate / 1000 * 2)
duration_ms = int(len(buf) / bytes_per_ms) if bytes_per_ms > 0 else 0
self.audio_timeline.add_user_audio(duration_ms)2. Configuration Naming Clarity (Medium Priority)Location: The rename from Current: mute_pkg_duration_ms: int = (
800 # Mute package duration in milliseconds for finalization
)Suggestion: Consider 3. Type Safety in Utterance Grouping (Medium Priority)Location: for is_definite, group in itertools.groupby(
result.utterances, key=lambda utt: utt.definite
):Issue: If Recommendation: # Add validation before grouping
valid_utterances = [
utt for utt in result.utterances
if utt is not None and hasattr(utt, 'definite')
]
if not valid_utterances:
# Handle fallback case
...
for is_definite, group in itertools.groupby(
valid_utterances, key=lambda utt: utt.definite
):4. Missing Edge Case Handling (Low Priority)Location: concatenated_text = " ".join(
utt.text for utt in valid_utterances
)Issue: If Recommendation: concatenated_text = " ".join(
utt.text for utt in valid_utterances
if utt.text is not None
).strip()5. Typo in Comment (Low Priority)Location: # Invalid message format - umight be retryableShould be: 6. Test Metadata Validation Could Be Stricter (Low Priority)Location: The test sets expected metadata to tester.expected_results = [
{"text": "你好", "final": True, "metadata": {"speech_rate": 1.2, "volume": 0.8}},
{"text": "世界", "final": True, "metadata": {"speech_rate": 1.1}},
# ... etc
]Then add a check in 🔒 Security Considerations✅ No security issues identified. The changes:
⚡ Performance Considerations✅ Generally good performance:
📊 Test Coverage✅ Excellent test coverage for the new feature:
📝 Code QualityPositive:
Suggestions:
🎯 Recommendations SummaryMust Fix (Before Merge):
Should Fix:
Nice to Have:
✅ VerdictApprove with minor changes. This is a solid enhancement that significantly improves the ASR processing logic. The critical issue (#1) should be addressed before merge, but it's a straightforward fix. The test coverage is excellent and demonstrates good software engineering practices. Version Bump: The version bump from Great work! The utterance grouping logic is particularly well-designed and will improve the user experience significantly. |
ai_agents/agents/ten_packages/extension/bytedance_llm_based_asr/extension.py
Outdated
Show resolved
Hide resolved
ai_agents/agents/ten_packages/extension/bytedance_llm_based_asr/extension.py
Show resolved
Hide resolved
ai_agents/agents/ten_packages/extension/bytedance_llm_based_asr/extension.py
Outdated
Show resolved
Hide resolved
ai_agents/agents/ten_packages/extension/bytedance_llm_based_asr/extension.py
Outdated
Show resolved
Hide resolved
Code Review: PR #1855 - Enhance Bytedance ASR Configuration and ProcessingSummaryThis PR refactors the Bytedance ASR extension to improve utterance processing logic by sending each utterance individually rather than grouping them. The changes include renaming configuration parameters, extracting metadata from utterances, and adding comprehensive test coverage. Positive Aspects
Critical Issues1. Removal of Unique ID Generation (extension.py:411)The ASRResult no longer includes an id field. If downstream components rely on unique IDs to track individual ASR results, this could break existing functionality. Please verify if the id field is required by the ASRResult dataclass. 2. Import Statement Cleanup (extension.py:7)The uuid import is no longer used after removing UUID generation. Please remove this unused import. Medium Priority Issues3. Configuration Parameter Rename (config.py:56-58)silence_duration_ms to mute_pkg_duration_ms is a breaking change for existing configurations. Consider:
4. Type Safety in _extract_metadata (extension.py:362-388)The method catches TypeError and ValueError but only logs a warning. Consider validating metadata value types before adding them to the dictionary. 5. Empty Utterances Handling (extension.py:454-468)When result.utterances is empty, the code uses result.start_ms directly. Add a comment explaining that result.start_ms and utterance.start_time are compatible timestamp formats. Security Considerations
Performance ConsiderationsMultiple ASR Results: The new logic sends each utterance individually, which could increase message volume significantly. For a response with 6 utterances, you'll now send 6 separate ASR results instead of potentially fewer grouped results. Document this behavior change and ensure downstream components can handle the increased message rate. Test Coverage AssessmentStrengths:
Gaps:
Minor Suggestions
Final VerdictApprove with Changes Requested The core logic improvements are solid, but please address:
Overall, this is a well-structured refactor that improves code quality and testability. Risk: Medium (breaking config change and behavioral change in message sending) |
bb4c122 to
49c15cb
Compare
| MESSAGE_TYPE_SERVER_ERROR_RESPONSE = 0b1111 | ||
|
|
||
| # Message type specific flags | ||
| MESSAGE_TYPE_SPECIFIC_FLAGS_NO_SEQUENCE = 0b0000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why delete these?
| current_timestamp = int(asyncio.get_event_loop().time() * 1000) | ||
|
|
||
| for utterance in result.utterances: | ||
| # Skip utterances with invalid timestamps |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why skip? any issue if just keep it as is?
| await self.send_asr_finalize_end() | ||
|
|
||
| # Reset two-pass delay metrics tracking for next recognition | ||
| self.two_pass_delay_tracker.reset() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it should be reset by definite true (two pass, hard vad), not finalize_end, right?
5f46a0f to
fa93fa7
Compare
Code Review: Bytedance ASR Configuration and Processing EnhancementI've completed a comprehensive review of this PR. Overall, this is a well-structured refactoring that significantly improves the Bytedance ASR extension. Strengths1. Excellent Architecture Improvements
2. Strong Testing
3. Better Error Handling
Issues FoundCRITICAL: Resource Leak in on_deinit (extension.py:244-276)The vendor_result_dumper is opened in on_init without a context manager. This could lead to file descriptor leaks if on_deinit is never called or if exceptions occur during on_init. Recommendation: Wrap file I/O in a Dumper class (like audio_dumper) or use async with for proper resource management. BUG: Missing finalize() MethodThe code calls self.client.finalize() at extension.py:356 but this method is not defined in the visible parts of VolcengineASRClient. This needs verification to avoid AttributeError at runtime. SECURITY: Potential Path Traversal (log_id_dumper_manager.py:90-94)The log_id from ASR service response is used directly in filenames without sanitization. If a malicious service returns log_id with path traversal chars, files could be written outside dump_path. Recommendation: Sanitize log_id before using in filenames (e.g., use regex to keep only alphanumeric, dash, underscore). Code Quality ObservationsGood Practices:
Minor Issues:
Security Review
Test Coverage AssessmentWell-Covered:
Missing Coverage:
Summary & RecommendationsMust Fix Before Merge:
Should Fix: Nice to Have: Overall AssessmentThis is a high-quality refactoring that:
Recommendation: Approve with minor fixes (resource leak and finalize verification) Great work on maintaining code quality and following TEN Framework conventions! Reviewed by: Claude Code |
| # Reconnection parameters | ||
| self.max_retries: int = 5 | ||
| self.base_delay: float = 0.3 | ||
| self.max_retry_delay: float = 5.0 # Maximum delay between retries |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove the limitation, allow to retry always
| init_dump_file_path = os.path.join( | ||
| self.config.dump_path, "out_init.pcm" | ||
| ) | ||
| init_dump_file_path = os.path.join(self.config.dump_path, "out_init.pcm") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is out_init?
Code Review: Enhance Bytedance ASR Configuration and ProcessingSummaryThis PR significantly refactors the Bytedance LLM-based ASR extension with improved configuration management, enhanced metrics tracking, and better debugging capabilities. The changes include configuration simplification, new audio dumping features, and comprehensive test coverage. ✅ Strengths1. Configuration Refactoring (config.py)
2. New LogIdDumperManager (log_id_dumper_manager.py)
3. Enhanced Metrics Tracking (extension.py)
4. Improved Reconnection Logic
5. Test Coverage
|
Pull Request Review: Bytedance ASR Configuration and Processing EnhancementsSummaryThis PR significantly refactors the Bytedance ASR extension with 1086 additions and 205 deletions across 10 files. The changes focus on configuration management, metadata extraction, metrics tracking, and audio dumping capabilities. Code Quality & Best Practices✅ Strengths
|
Code Review: Bytedance ASR Configuration and Processing EnhancementsSummaryThis PR refactors the Bytedance ASR extension with significant improvements to configuration management, audio dumping, metrics tracking, and test coverage. Overall, the changes are well-structured and follow the repository's conventions. ✅ Strengths1. Configuration Architecture
2. Audio Dumping Feature
3. Metrics & Monitoring
4. Reconnection Logic
5. Test Coverage
|
fe344d8 to
dc427fd
Compare
Code Review: PR #1855SummaryThis PR refactors the Bytedance ASR extension with improvements to configuration management, utterance grouping, metrics tracking, and audio dumping. Changes: 10 files, 1304 additions, 206 deletions. Critical Issues (Must Fix)1. Path Traversal Security Risk (log_id_dumper_manager.py:101-103)
2. Resource Leak in on_deinit (extension.py:217-248)
3. File Handle Leak (extension.py:162-170)
4. Race Condition in Reconnection (extension.py:432-473)
Important Issues5. Unbounded Retry Loop
6. Unused Import
7. Missing Test Coverage
Code Quality
Strengths
AssessmentQuality: Good | Test Coverage: Good (missing edges) | Security: Needs attention | Performance: Likely acceptable Recommendation: Request changes for critical security and resource management issues before merging. |
|
pls also checkout the claude code's review to make sure all critial issues have been fixed / clarified. @hanhandi-git |
Pull Request Review: Bytedance ASR EnhancementThis PR introduces significant improvements to the Bytedance ASR extension with 1319 additions and 211 deletions across 10 files. The changes include configuration refactoring, utterance grouping, metadata extraction, metrics tracking, and enhanced audio dumping. 🎯 SummaryThe PR successfully enhances the Bytedance ASR extension with several valuable features:
✅ Strengths1. Configuration Architecture (
|
7eb3f1e to
50f9d08
Compare
Code Review - PR #1855: Bytedance ASR EnhancementThank you for this comprehensive enhancement! I have reviewed the code and identified several areas for improvement. Strengths
High Priority IssuesBug 1: Resource Leak (extension.py:170) Bug 2: Security - Path Traversal (log_id_dumper_manager.py:101-103) Bug 3: Config Validation Missing Medium Priority IssuesCode Quality:
Test Coverage Gaps:
Performance:
Low PriorityDocumentation:
Code Structure:
ConclusionSolid enhancement overall! Address the 3 high-priority issues (resource leak, path traversal security, config validation) before merge. Medium priority items would improve maintainability significantly. Great work on the functionality additions! |
Code Review: Bytedance ASR EnhancementOverviewThis PR enhances the Bytedance ASR extension with improved configuration, utterance grouping, two-pass delay metrics, and debugging capabilities. Well-structured with good test coverage. Strengths
Critical Issues1. Resource Leak - extension.py:170-172File opened without proper error handling. Add try-except. 2. Unprotected Dict Access - config.py:158-161If config.params is None, will crash. Add null check. 3. Incomplete Error Handling - extension.py:392finalize() only logs errors, doesn't notify framework. High Priority4. Race Condition - log_id_dumper_manager.py:89-130push_bytes() during rename can lose data. Add asyncio.Lock. 5. Buffer Strategy Change - extension.py:396-397Changed from Discard to Keep(10MB) without explanation. Document or revert. 6. Commented Code - extension.py:310-317Remove or add TODO explaining why commented. Medium Priority
Summary
Before Merge
Overall: Well-designed enhancement. After fixing critical issues, ready to merge. Great work! |
3b716a6 to
aecc481
Compare
PR Review: Enhanced Bytedance ASR Configuration and ProcessingThank you for this comprehensive enhancement to the SummaryChanges: 1,332 additions, 219 deletions across 10 files Critical Issues1. Missing
|
aecc481 to
b5d46f9
Compare
Pull Request Review: Enhanced Bytedance ASR Configuration and ProcessingOverviewThis is a significant enhancement to the Bytedance ASR extension, adding ~1332 lines across 10 files. The PR introduces utterance grouping, metadata extraction, improved reconnection logic, and enhanced debugging capabilities. Overall code quality is good, with comprehensive test coverage for the new features. ✅ Strengths1. Excellent Test Coverage
2. Well-Structured Configuration Refactoring
3. Clean Resource Management
4. Good Separation of Concerns
🔍 Code Quality IssuesHigh Priority1. File Handle Leak Risk (extension.py:170-172)self.vendor_result_dumper = open(
vendor_result_dump_path, "a", encoding="utf-8"
)Issue: Regular Recommendation: # Option 1: Use context manager pattern with a wrapper
# Option 2: Ensure close in all error paths
try:
self.vendor_result_dumper = open(vendor_result_dump_path, "a", encoding="utf-8")
except Exception as e:
# Clean up any already-opened resources
raise2. Type Annotation Inconsistency (extension.py:111-112)self.vendor_result_dumper: Any = None # File handle for asr_vendor_result.jsonlIssue: Using Recommendation: from typing import TextIO
self.vendor_result_dumper: TextIO | None = None3. Commented Dead Code (extension.py:310-317)# if self.last_finalize_timestamp > 0:
# # Allow some time for final result to come back
# current_time = int(asyncio.get_event_loop().time() * 1000)
# if (
# current_time - self.last_finalize_timestamp
# < FINALIZE_GRACE_PERIOD_MS
# ):
# return True # Still consider connected during finalize grace periodIssue: Large block of commented code. Either remove it or document why it's disabled. Recommendation: Remove or add a comment explaining why this logic was disabled and whether it might return. 4. Potential Race Condition (log_id_dumper_manager.py:90-97)if self.dumper:
try:
await self.dumper.stop()
self.ten_env.log_info("Stopped dumper for renaming")
except Exception as e:
self.ten_env.log_error(f"Error stopping dumper: {e}")
finally:
self.dumper = NoneIssue: Between stopping the dumper and renaming the file, audio data could be pushed via Recommendation: Add a lock or flag to prevent Medium Priority5. Magic Numbers (extension.py:117-118)self.min_retry_delay: float = 0.5
self.max_retry_delay: float = 4.0Issue: Hardcoded retry values. These were previously configurable. Recommendation: Either make these configurable again or document why these specific values were chosen. 6. Inconsistent Error Handling (extension.py:656-657)Some errors are logged and swallowed, while others propagate. The pattern isn't always clear. Recommendation: Document error handling strategy in docstrings or module-level comments. 7. Buffer Strategy Changed Without Documentation (extension.py:396-397)def buffer_strategy(self) -> ASRBufferConfig:
return ASRBufferConfigModeKeep(byte_limit=1024 * 1024 * 10)Issue: Changed from Impact: Could affect memory usage and buffering behavior significantly. Low Priority8. Long Methods
9. Unused Importimport itertools # Used only once for groupbyNot an issue, but worth noting the import is used appropriately. 🔒 Security Considerations✅ Good Security Practices
|
Pull Request Review: Bytedance ASR Configuration and Processing EnhancementSummaryThis PR refactors the Bytedance ASR extension with significant improvements to configuration management, connection handling, and debugging capabilities. The changes include +1384/-228 lines across 11 files. ✅ Strengths1. Configuration Simplification
2. Robust Connection Management
3. Enhanced Debugging Capabilities
4. Better Error Handling
|
… audio file size in ASR tests
7fd9f84 to
72db0a4
Compare
PR Review: feat: enhance Bytedance ASR configuration and processingSummaryThis PR refactors the Bytedance ASR extension with significant architectural improvements, including configuration centralization, utterance grouping, enhanced dumping capabilities, and improved reconnection logic. Overall, the changes are well-structured and add valuable features. However, there are several areas that need attention. Critical Issues1. Potential Data Loss in LogIdDumperManager (log_id_dumper_manager.py:132-142)During file rename operation, push_bytes may silently fail, causing audio data loss. The method logs a warning but continues, which could result in incomplete dumps. Recommendation: Implement a buffering mechanism to queue bytes during rename operations, or add proper synchronization. 2. Race Condition in update_log_id (log_id_dumper_manager.py:60-130)The update_log_id method stops the dumper, renames the file, then restarts it. During this window, push_bytes calls will fail with no synchronization mechanism to prevent concurrent access. Recommendation: Add a lock/semaphore around dumper operations or buffer incoming data during the rename window. 3. Commented Out Connection Check (extension.py:269-277)Important finalize grace period logic is commented out. Either remove it entirely or uncomment with proper justification and testing. Major Concerns4. Type Annotation Inconsistency (extension.py:131)Using Any for vendor_result_dumper defeats type safety. Should be TextIO | None. 5. Resource Leak Risk (extension.py:239-242)File opened in on_init but only closed in on_deinit. If on_deinit is never called or fails early, the file handle leaks. Recommendation: Use context managers or ensure proper cleanup in try/finally blocks. 6. Incomplete Error Handling in on_init (extension.py:151-186)Files are opened but not cleaned up if subsequent initialization steps fail. 7. Deep Copy Performance (extension.py:461)deepcopy is expensive and called per utterance. Consider shallow copy if deep copy isn't necessary, or document why it's required. 8. Magic Numbers in Retry Logic (extension.py:113-114)Retry delays should be configurable or at least defined as module-level constants in const.py. Minor Issues
Security Considerations15. Sensitive Data in Logs (log_id_dumper_manager.py:56-58)File paths may contain sensitive information. Ensure log levels are appropriate for production. 16. File System Operations (log_id_dumper_manager.py:105)No validation that log_id is safe for use in filenames. Sanitize or validate it's alphanumeric to prevent path traversal attacks. Performance Considerations
Test CoverageStrengths:
Gaps:
Recommendations SummaryHigh Priority:
Medium Priority:
Low Priority:
ConclusionThis is a solid enhancement with valuable features. The architecture improvements are well-thought-out, and test coverage for the main feature is excellent. However, file handling and synchronization issues need to be addressed before merging. Overall Assessment: Approve with changes requested The core logic is sound, but critical issues around file handling should be fixed. Once addressed, this will be a great addition to the codebase. |

No description provided.