Skip to content

Conversation

@anexpn
Copy link
Contributor

@anexpn anexpn commented Dec 29, 2025

No description provided.

@claude
Copy link

claude bot commented Dec 29, 2025

Pull Request Review: Soniox Rotate Dump

Thanks for this PR! I've reviewed the changes and overall the implementation looks good. Here are my findings:

✅ What's Working Well

  1. Feature Design: The rotate dump functionality is well-designed with the callback pattern, allowing atomic rotation during finalization.

  2. Configuration: New config options (dump_rotate_on_finalize, enable_keepalive) are properly added to both the config model and the update() method.

  3. Code Organization: Creating a separate Dumper class improves modularity and follows the repository's patterns.

  4. Backward Compatibility: Changes are additive and won't break existing functionality.


🐛 Potential Bugs & Issues

Critical: Reconnection Behavior Change (websocket.py:177-179)

if self._attempt_count >= self.max_attempts:
    await asyncio.sleep(self.max_delay)
    return

Issue: This changes the behavior from stopping reconnection attempts after max_attempts to infinitely retrying. The old code would stop and raise an exception, but now it continues indefinitely.

Impact: If the Soniox service is down or there's a persistent network issue, the client will retry forever instead of failing gracefully.

Recommendation: Either:

  • Keep the original behavior (raise exception after max attempts)
  • Make infinite retry opt-in with a new config parameter
  • Document this behavior change in commit messages

Bug: Dumper not initialized properly (extension.py:247)

if self.config.dump_rotate_on_finalize and self.audio_dumper:

Issue: In _real_finalize(), you check if self.audio_dumper exists, but if dump=False and dump_rotate_on_finalize=True, the config validation doesn't catch this misconfiguration.

Recommendation: Add validation in config:

# In SonioxASRConfig or on_init
if self.config.dump_rotate_on_finalize and not self.config.dump:
    raise ValueError("dump_rotate_on_finalize requires dump=True")

Potential Race Condition (dumper.py:31-42)

async def rotate(self):
    if self._file:
        await self._file.close()
    # ... file is None here but might be accessed by push_bytes()
    self._file = await aiofiles.open(self.dump_file_path, mode="wb")

Issue: Between closing the old file and opening the new one, if push_bytes() is called concurrently, it will raise an exception.

Impact: Race condition if audio continues streaming during rotation.

Recommendation: Add a lock:

class Dumper:
    def __init__(self, dump_file_path: str):
        # ...
        self._lock = asyncio.Lock()
    
    async def rotate(self):
        async with self._lock:
            # existing code
    
    async def push_bytes(self, data: bytes):
        async with self._lock:
            # existing code

Resource Leak Risk (dumper.py:33-34)

if self._file:
    await self._file.close()

Issue: If opening the new file fails after closing the old one, self._file stays None but you've lost the file handle. Also, no error handling if rotation fails.

Recommendation: Add error handling:

async def rotate(self):
    """Close current file and open new one with timestamp suffix."""
    old_file = self._file
    
    try:
        # Generate new filename
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        base, ext = os.path.splitext(self.dump_file_base_path)
        self.dump_file_path = f"{base}_{timestamp}{ext}"
        
        # Open new file first
        os.makedirs(os.path.dirname(self.dump_file_path), exist_ok=True)
        new_file = await aiofiles.open(self.dump_file_path, mode="wb")
        
        # Only close old file after new one opens successfully
        if old_file:
            await old_file.close()
        
        self._file = new_file
    except Exception as e:
        # Keep old file handle if rotation fails
        self._file = old_file
        raise RuntimeError(f"Failed to rotate dump file: {e}") from e

Missing Null Check (extension.py:255-256)

if self.audio_dumper:
    await self.audio_dumper.rotate()

Issue: This check is inside the callback but self.audio_dumper could theoretically become None between the outer check and callback execution (though unlikely).

Impact: Low risk but defensive programming suggests keeping the check.

Status: Actually fine as-is, but consider the config validation issue above.


⚡ Performance Considerations

  1. File I/O in Critical Path: The rotation happens during finalization, which is in the audio processing path. This is acceptable since it's async, but be aware it adds latency to finalization.

  2. Timestamp Precision: Using seconds (%Y%m%d_%H%M%S) could cause filename collisions if multiple rotations happen within the same second. Consider adding milliseconds:

    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S_%f")[:17]  # includes milliseconds
  3. Keepalive Efficiency: The keepalive loop checks every 1 second with asyncio.sleep(1.0). This is fine, but could be optimized to sleep for keepalive_interval - time_since_last_audio for more precise timing.


🔒 Security Considerations

  1. Path Traversal: dump_path is user-configurable. If an attacker controls the config, they could potentially write files anywhere. Consider validating the path:

    import os.path
    # Ensure dump_path doesn't contain ../ or absolute paths outside allowed dirs
  2. Sensitive Data in Logs: The timestamp logging in extension.py:252 is fine, but ensure dump file paths (which may contain sensitive directory names) are not logged unnecessarily.


📋 Code Quality & Best Practices

  1. Type Annotations: Good use of type hints throughout. Minor suggestion for dumper.py:16:

    self._file: aiofiles.threadpool.binary.AsyncBufferedIOBase | None = None

    Consider using a type alias for readability:

    from typing import TypeAlias
    AsyncBinaryFile: TypeAlias = aiofiles.threadpool.binary.AsyncBufferedIOBase | None
  2. Callback Naming: The nested rotate_callback function in extension.py:249 could be named more descriptively, like finalize_and_rotate_callback.

  3. Error Messages: dumper.py:46-49 has a good error message. Consider logging the error before raising it.

  4. QueueItem Documentation: The new QueueItem class is well-documented. Good addition.

  5. Logging Category: extension.py:252 uses LOG_CATEGORY_KEY_POINT which is appropriate for finalization events.


🧪 Test Coverage

Missing Test Cases:

  1. Test for dump_rotate_on_finalize functionality
  2. Test for file rotation with concurrent audio streaming
  3. Test for enable_keepalive feature
  4. Test for invalid config (e.g., dump_rotate_on_finalize=True with dump=False)
  5. Test for rotation filename collision (multiple rotations per second)

Recommendation: Add tests in tests/test_finalize.py for rotation functionality.


📝 Additional Recommendations

  1. Manifest Version: You bumped to 0.3.6, which is correct for a minor feature addition. ✅

  2. Documentation: Consider adding comments explaining the rotation behavior, especially:

    • When rotation happens (only on finalize, not on file size/time)
    • What the timestamp format means
    • The purpose of the callback pattern
  3. Configuration Example: Update the extension's property.json example to show the new options:

    {
      "dump": true,
      "dump_path": "./audio_dumps",
      "dump_rotate_on_finalize": true,
      "enable_keepalive": true
    }
  4. Task Tracking: Consider storing the websocket task (extension.py:64) to handle cleanup better, which you've done. Good! But ensure it's cancelled on shutdown if needed.


Summary

Overall Assessment: Good feature implementation with solid design patterns. The main concerns are:

  1. High Priority: Fix the infinite reconnection behavior change (breaking change)
  2. Medium Priority: Add config validation for dump_rotate_on_finalize
  3. Medium Priority: Add concurrency protection (lock) to Dumper
  4. Low Priority: Improve error handling in rotation

Recommendation: Address the critical reconnection issue before merging. The other issues can be fixed in follow-up PRs if time is constrained, but ideally should be addressed now.

Great work on the implementation! The callback pattern for rotation is elegant and the code is generally clean and well-structured. 🚀

@anexpn anexpn changed the title Feat soniox rotate dump feat: add dump_rotate_on_finalize and enable_keepalive configs for soniox Dec 29, 2025
@claude
Copy link

claude bot commented Dec 29, 2025

PR Review: Add dump_rotate_on_finalize and enable_keepalive configs for soniox

Summary

This PR adds two new features to the Soniox ASR extension:

  1. Dump file rotation on finalize - Allows rotating audio dump files when finalization occurs
  2. Keepalive support - Enables WebSocket keepalive messages to maintain connection stability

Code Quality

Strengths:

  • Clean separation of concerns with the new Dumper class
  • Good use of async/await patterns throughout
  • Proper error handling with try-except blocks and logging
  • Thread-safe file operations using asyncio.Lock
  • Configuration properly integrated into existing config system

Areas for Improvement:

1. dumper.py:60-65 - Missing file flush
The push_bytes method does not flush data to disk. For real-time monitoring, consider adding periodic flushes.

2. dumper.py:35-58 - CRITICAL Race condition in rotate()
While rotate() uses a lock, push_bytes() does not acquire the lock before writing. This could cause data to be written to a closed file during rotation. Add async with self._lock in push_bytes().

3. websocket.py:177-179 - Removed max reconnect limit
The change means the client will retry indefinitely. Consider logging when max attempts is reached or add configurable behavior.

4. extension.py:64 - Task management
The ws_task is created but never explicitly cancelled in stop_connection(). Should add: if self.ws_task: self.ws_task.cancel()

Potential Bugs

1. CRITICAL - dumper.py:25 - Directory creation edge case
If dump_file_path is just a filename, os.path.dirname() returns empty string. Check if dir_path is not empty before creating directories.

2. extension.py:260 - Redundant null check
Redundant check for self.audio_dumper in rotate_callback (already checked in line 252).

Performance Considerations

1. dumper.py:60-65 - No buffering configuration
Consider adding explicit buffering for high-frequency audio data: buffering=64*1024

2. websocket.py:162-163 - Keepalive queue behavior
Keepalive messages go through the send queue, which could cause delays if backed up.

Security Concerns

1. No path traversal validation
dump_file_path is not validated. Malicious config could write files outside intended directories. Add path normalization and validation.

2. dumper.py:44 - Predictable filenames
Timestamp format is predictable. Ensure proper file permissions if dumps contain sensitive data.

Test Coverage

Missing test coverage for:

  • New Dumper class rotation functionality
  • New dump_rotate_on_finalize config option
  • New enable_keepalive config option
  • QueueItem with callback functionality
  • Edge cases: rotation failures, concurrent writes during rotation

Documentation

Missing:

  • Docstrings for new Dumper class methods
  • Documentation for dump_rotate_on_finalize use cases
  • Documentation for enable_keepalive behavior
  • Comment explaining max_attempts change

Recommendations

  1. Fix the critical race condition in dumper.py by adding lock to push_bytes()
  2. Clean up the ws_task properly in stop_connection()
  3. Add test coverage for new features
  4. Document the behavior changes
  5. Add path validation for security

Overall Assessment

Verdict: Good feature addition with solid implementation, but needs fixes before merge.

Blocking issues:

  • Race condition in Dumper (push_bytes without lock)
  • Task lifecycle management for ws_task

Non-blocking issues:

  • Missing test coverage
  • Documentation gaps
  • Minor security hardening

The code follows existing patterns well. After addressing the race condition and task management issues, this should be good to merge.


Review generated with Claude Code

@claude
Copy link

claude bot commented Dec 29, 2025

PR Review: Add dump_rotate_on_finalize and enable_keepalive configs for soniox

Summary

This PR adds two new configuration options to the Soniox ASR extension:

  1. dump_rotate_on_finalize: Rotates audio dump files when finalization occurs
  2. enable_keepalive: Enables keepalive messages for the websocket connection

The implementation includes creating a custom Dumper class with rotation capability and modifying the websocket client to support keepalive and callback-based message sending.


✅ Code Quality & Best Practices

Strengths:

  • Clean separation of concerns with new dumper.py module
  • Good use of async/await patterns throughout
  • Proper use of Pydantic for config management
  • Follows existing codebase patterns for extension development
  • Clear documentation via docstrings and comments

Suggestions:

  1. Type annotations in dumper.py:15-16 - Consider using Optional from typing for clarity:

    self._file: Optional[aiofiles.threadpool.binary.AsyncBufferedIOBase] = None
  2. Lock usage in dumper.py - The _lock is used in stop() and rotate() but not in push_bytes(). This could lead to race conditions if bytes are pushed during rotation. Consider:

    async def push_bytes(self, data: bytes):
        async with self._lock:
            if not self._file:
                raise RuntimeError(...)
            await self._file.write(data)
  3. Config update method - In config.py:36-42, the special_params list duplicates field names. Consider generating this from the model fields:

    special_params = [f for f in self.model_fields.keys() if f != "params"]

🐛 Potential Bugs & Issues

  1. Critical: Race condition in dumper rotation (dumper.py:59)

    • push_bytes() is not protected by the lock while rotate() is
    • If audio data is being written during rotation, it could write to a closed file handle
    • Fix: Add lock protection to push_bytes() as shown above
  2. Memory leak potential in websocket.py:177-179

    • When max_attempts is reached, the code now just sleeps but doesn't stop the reconnection loop
    • The original code set self.state = self.State.STOPPED, which was removed
    • Impact: Infinite reconnection attempts consuming resources
    • Recommendation: Restore stopping behavior or document why infinite retries are desired:
    if self._attempt_count >= self.max_attempts:
        self.ten_env.log_warn("Max reconnection attempts reached")
        # Consider: self.state = self.State.STOPPED
        await asyncio.sleep(self.max_delay)
        return
  3. Error handling in extension.py:207-210

    • push_bytes() exception is caught and logged as warning, but dumping continues
    • If dumper fails to start or rotate, subsequent writes will raise RuntimeError
    • Consider adding a flag to disable dumper after failures:
    self.audio_dumper_enabled = True
    
    # In push_bytes exception handler:
    self.audio_dumper_enabled = False
    self.ten_env.log_error("Disabling audio dumper due to error")
  4. Unused task reference in extension.py:64

    • self.ws_task is stored but never awaited or cancelled
    • Recommendation: Either await it during cleanup or remove the reference

🔒 Security Considerations

Good:

  • No sensitive data in the new dumper module
  • File paths are validated with os.makedirs(exist_ok=True)
  • Uses secrets.token_hex() for unique filenames

Concerns:

  1. Path traversal risk (dumper.py:22-24)

    • dump_file_path from config is not validated
    • Malicious config could write files outside intended directory
    • Recommendation: Add path validation:
    dump_file_path = os.path.abspath(dump_file_path)
    base_dir = os.path.abspath(self.config.dump_path)
    if not dump_file_path.startswith(base_dir):
        raise ValueError("Invalid dump_file_path")
  2. Resource exhaustion

    • No limit on number of rotated dump files
    • Could fill disk with finalize_mode=IGNORE and frequent finalizations
    • Recommendation: Consider adding max file count or cleanup policy

⚡ Performance Considerations

Good:

  • Async file I/O with aiofiles
  • Minimal overhead in hot path (push_bytes is simple)
  • Keepalive only enabled when configured

Suggestions:

  1. File buffering - Consider adding write buffering to reduce I/O:

    self._file = await aiofiles.open(self.dump_file_path, mode="wb", buffering=8192)
  2. Rotation overhead - The rotate operation happens synchronously in the finalize callback, which blocks sending the finalize message. Consider:

    • Pre-creating the new file before closing the old one (already done ✓)
    • Moving rotation to a background task if it becomes a bottleneck
  3. Keepalive sleep interval (websocket.py:168) - Currently sleeps for 1 second between checks. This is reasonable, but consider making it configurable for tuning.


🧪 Test Coverage

Missing:

  • No tests for the new Dumper class
  • No tests for rotation logic
  • No tests for keepalive functionality
  • No tests for the QueueItem callback mechanism

Recommendations:
Add tests in tests/ directory:

  • test_dumper_rotation.py - Test rotation creates new files correctly
  • test_dumper_concurrent.py - Test concurrent push_bytes during rotation
  • test_keepalive.py - Test keepalive messages are sent correctly
  • test_finalize_callback.py - Test rotation callback is invoked

📋 Additional Observations

  1. Config default values (config.py:20,24)

    • Both new configs default to False, requiring explicit opt-in
    • Good conservative default ✓
  2. Backward compatibility

    • Changes are fully backward compatible
    • Existing configs continue to work ✓
  3. Logging - Good use of logging categories (LOG_CATEGORY_KEY_POINT, warnings for errors)

  4. Manifest version bump - Correctly incremented from 0.3.5 to 0.3.6 ✓


🎯 Summary & Recommendation

Overall Assessment: This is a solid feature addition with good architecture. The main concerns are:

  1. ⚠️ Critical: Race condition in dumper (add lock to push_bytes)
  2. ⚠️ Important: Infinite reconnection behavior change needs clarification
  3. 🔒 Security: Path traversal validation needed
  4. 🧪 Testing: Missing test coverage for new features

Recommendation:Approve with changes requested

The core functionality is well-implemented, but the race condition should be fixed before merging. The reconnection behavior change should be clarified (intentional vs bug). Consider adding at least basic tests for the rotation feature.

Great work on the clean separation of concerns and following the repository's patterns! 👍

@claude
Copy link

claude bot commented Dec 29, 2025

Pull Request Review: Add dump_rotate_on_finalize and enable_keepalive configs for soniox

Summary

This PR adds two new configuration options to the Soniox ASR extension:

  1. dump_rotate_on_finalize - Enables rotating audio dump files on finalization
  2. enable_keepalive - Enables websocket keepalive messages

The implementation includes a new custom Dumper class with file rotation capabilities.


✅ Code Quality & Best Practices

Strengths:

  • Good use of type hints throughout (e.g., asyncio.Task[None] | None, AsyncBufferedIOBase | None)
  • Proper async/await patterns with lock protection for concurrent operations
  • Clean separation of concerns with the new Dumper class
  • Follows existing codebase patterns for error handling and logging

Suggestions:

  1. Missing docstring in dumper.py:62-66 - The push_bytes method would benefit from a docstring explaining its purpose and exception behavior

  2. Type annotation inconsistency - In dumper.py:18, consider using AsyncBufferedIOBase | None instead of the long form for consistency with modern Python style.


🐛 Potential Bugs & Issues

Critical Issues:

  1. Race condition in dumper.py:62-67 - push_bytes checks self._file without lock protection, but rotate() and stop() modify it. This could cause writing to a closed file during rotation or using a stale file handle. Fix: Wrap the entire method in the lock.

  2. Incomplete ws_task lifecycle management in extension.py:150 - The task is created but never awaited or cancelled. This could lead to task exceptions not being propagated and resource leaks on extension shutdown. Fix: Add proper cancellation in stop_connection().

Minor Issues:

  1. websocket.py:284 - The callback is awaited before sending the message, but the callback logs "Sending finalize...". The log message timing is misleading.

  2. dumper.py:44 - Using secrets.token_hex(3) produces 6 hex characters. Consider documenting why 3 bytes is sufficient for uniqueness in this context.


⚡ Performance Considerations

Good practices:

  • Lock-protected file operations prevent contention
  • Async I/O with aiofiles is appropriate for this use case

Concerns:

  1. dumper.py:49-50 - makedirs is called synchronously inside an async function. Consider using asyncio.to_thread for true async behavior.

  2. extension.py:213-218 - Dumper errors are logged but do not halt audio processing. If the dumper repeatedly fails, you will accumulate many warning logs. Consider adding a failure counter or disabling the dumper after N consecutive failures.

  3. websocket.py:183-185 - When max attempts are reached, the code now sleeps for max_delay and continues retrying indefinitely. This changes the previous behavior where it would stop and emit an exception. This could mask connection issues.


🔒 Security Concerns

Good practices:

  • Using secrets.token_hex() for random filenames (cryptographically secure)
  • No sensitive data logged in error messages

Recommendations:

  1. Path traversal protection - dump_path comes from configuration but is not validated. A malicious config could specify ../../../etc/. Add validation to ensure the resolved path is within allowed directories.

  2. File permission concerns - The dumper creates files with default permissions. Consider explicitly setting secure permissions (e.g., 0o600 for owner read/write only).


🧪 Test Coverage

Missing test scenarios:

  1. No tests for new dumper rotation feature - Critical functionality should have tests for rotation success, rotation during active writes, and failure recovery.

  2. No tests for keepalive feature - Should verify keepalive messages are sent after interval, not sent when audio flows, and stops on connection close.

  3. Edge cases to test: Dumper rotation idempotency, concurrent finalize calls with rotation, and websocket reconnection with keepalive.

Recommendation: Add a new test file test_dump_rotate.py following the pattern of test_finalize.py.


📝 Additional Notes

  1. Breaking change in websocket.py:183-185 - The removal of the STOPPED state transition changes retry behavior. Was this intentional?

  2. config.py - Good practice adding new fields to the special_params list.

  3. Version bump - Correctly bumped version from 0.3.5 to 0.3.6 in manifest.json.

  4. Consider adding configuration validation for dump_rotate_on_finalize - it only makes sense when dump=True.


🎯 Summary

Overall Assessment: The PR adds useful features with generally good code quality. However, there are critical concurrency issues that should be addressed before merging.

Priority Issues:

  1. 🔴 CRITICAL: Fix race condition in Dumper.push_bytes() (add lock protection)
  2. 🔴 CRITICAL: Add proper lifecycle management for ws_task
  3. 🟡 HIGH: Add test coverage for rotation and keepalive features
  4. 🟡 HIGH: Validate dump_path to prevent path traversal

Estimated Risk: Medium - The features are opt-in via configuration, but the concurrency issues could cause crashes or data corruption when enabled.

Great work on the implementation! Please address the critical issues and consider the suggestions for a more robust solution. Happy to re-review once updated. 🚀

@claude
Copy link

claude bot commented Dec 29, 2025

PR Review: Add dump_rotate_on_finalize and enable_keepalive configs for soniox

Summary

This PR adds two new configuration options to the Soniox ASR extension:

  1. dump_rotate_on_finalize - Enables audio dump file rotation when finalization occurs
  2. enable_keepalive - Enables WebSocket keepalive messages to maintain connection during silence

The changes also introduce a custom Dumper class to replace the base class from ten_ai_base.dumper.


Code Quality and Best Practices

Strengths:

  • Good separation of concerns with the new Dumper class in a separate module
  • Proper use of async/await patterns throughout
  • Consistent error handling with try/except blocks and logging
  • Type hints are well-used (modern Python union syntax with |)
  • Good documentation in docstrings

Areas for improvement:

  1. dumper.py:53 - The filename generation could have collision issues. While secrets.token_hex(3) provides randomness, if rotate() is called multiple times within the same millisecond, collisions are possible. Consider using a counter or UUID for guaranteed uniqueness.

  2. dumper.py:33-34 - _ensure_dir() is called synchronously within async context. For consistency with async patterns, consider using aiofiles.os or wrapping in asyncio.to_thread() for I/O operations.

  3. websocket.py:290 - Magic number for audio frame duration (10ms). The comment on line 110 mentions this is an assumption. Consider making this a named constant: AUDIO_FRAME_DURATION_MS = 10.


Potential Bugs and Issues

Critical:

  1. websocket.py:187-189 - Infinite retry behavior change: The old code stopped after max_attempts. The new code continues indefinitely, just sleeping for max_delay. This is a significant behavior change that could cause resource leaks if connections never succeed, difficulty in detecting permanent failures, and no way for calling code to know connection has failed permanently. Recommendation: Add a configuration option for this behavior or document it clearly. Consider adding a maximum total retry time limit.

  2. dumper.py:45-69 - File handle leak on rotation failure: If old_file.close() raises an exception, new_file is left open but unreferenced. The exception path should ensure new_file is closed.

Medium:

  1. extension.py:263 - Callback function captures self but doesn't validate state. The callback is created but may be called after the extension is stopped. Add additional validation or ensure callbacks can't be invoked after cleanup.

  2. websocket.py:310 - QueueItem always requires callback. The dataclass doesn't have a default for before_send_callback. Consider making it Optional[Callable] or providing a no-op default.


Performance Considerations

Good:

  • Async I/O with aiofiles prevents blocking the event loop
  • Lock usage in Dumper prevents race conditions
  • Queue-based message sending is efficient

Considerations:

  1. dumper.py:71-77 - Lock held during file write. Every audio chunk write holds the lock, blocking rotation. For high-throughput audio streams, consider buffering writes, using a separate write queue, or releasing the lock before the actual write operation.

  2. websocket.py:166-178 - Keepalive loop wakes up every second. This is reasonable, but could be optimized to wake up closer to when the next keepalive is actually needed.


Security Concerns

Good:

  • secrets.token_hex() used for random filename component (cryptographically secure)
  • No SQL injection or command injection risks
  • API keys are already handled elsewhere with encryption

Considerations:

  1. dumper.py:34 - Path traversal potential. If dump_path comes from user configuration, it should be validated to prevent directory traversal attacks. Consider validating the path is within an allowed directory, resolving symlinks and checking the real path, and rejecting paths with .. components.

  2. dumper.py:37 - File path construction. While _current_file_name is controlled internally, ensure base dump_file_name passed to init is validated (no path separators).


Test Coverage

Missing: This PR adds significant new functionality but NO new tests are included:

  • No tests for the new Dumper class (rotation, error handling, concurrent access)
  • No tests for dump_rotate_on_finalize configuration
  • No tests for enable_keepalive functionality
  • No tests for the infinite retry behavior
  • No tests for the QueueItem callback mechanism

Recommendations:

  1. Add unit tests for dumper.py
  2. Add integration tests for extension.py
  3. Add tests for websocket.py

Summary

Must Fix (Blocking):

  • File handle leak in dumper.py:45-69 rotation error path
  • Consider implications of infinite retry behavior or make it configurable

Should Fix (Important):

  • Add comprehensive test coverage
  • Path traversal validation for dump paths
  • Make QueueItem.before_send_callback optional or document it's always required
  • Extract magic number (10ms) to constant

Nice to Have:

  • Consider async-safe directory creation
  • Optimize keepalive sleep interval
  • Improve filename collision resistance
  • Add documentation for new features

The code is well-structured and follows good async patterns, but needs test coverage and a few bug fixes before merging.

@anexpn anexpn force-pushed the feat-soniox_rotate_dump branch from c5c912f to df27b4d Compare December 29, 2025 10:47
@claude
Copy link

claude bot commented Dec 29, 2025

Code Review: Soniox Dump Rotation & Keepalive Features

Overview

This PR introduces two valuable features to the Soniox ASR extension:

  • Dump file rotation on finalization for better audio debugging workflows
  • WebSocket keepalive support for more stable long-running connections

The implementation follows good architectural patterns with a dedicated Dumper class and clean callback integration. Overall quality is solid, but there are several important issues to address.


Critical Issues

1. Race Condition in Dumper (dumper.py:71-77)

Severity: HIGH

async def push_bytes(self, data: bytes):
    async with self._lock:  # ✅ Good - but this was added after initial review
        if not self._file:
            raise RuntimeError(...)
        _ = await self._file.write(data)

Actually GOOD - The lock is properly used in the latest code. However, be aware that start() doesn't use the lock, which could theoretically race with stop() if called concurrently. Consider adding lock protection to start() as well.

2. Websocket Reconnection Behavior Changed (websocket.py:187-189)

Severity: HIGH

# Old behavior: Stop after max attempts
if self._attempt_count >= self.max_attempts:
    await asyncio.sleep(self.max_delay)
    return  # Continues loop - infinite retries!

Issue: The commit "feat: no max reconnect attempt limit" removes the hard failure after max_attempts. This fundamentally changes the extension's behavior from "fail after N attempts" to "retry indefinitely with backoff".

Impact:

  • No way to detect permanent connection failures
  • Could mask configuration issues (wrong URL, invalid credentials)
  • Resource consumption continues indefinitely

Recommendation: If infinite retry is intentional:

  1. Document this in the commit message and config
  2. Consider renaming max_attempts to backoff_attempts to clarify it only controls backoff timing
  3. Add logging: self.ten_env.log_warn("Max reconnection attempts reached, continuing with max delay...")
  4. Consider adding a separate max_total_time configuration

3. Task Lifecycle Not Managed (extension.py:64, 150)

Severity: MEDIUM

self.ws_task = asyncio.create_task(ws.connect())  # Created but never cancelled

Issue: The websocket task is stored but never explicitly cancelled during shutdown. While Python will cancel tasks on event loop closure, this can lead to noisy logs about pending tasks.

Recommendation:

async def stop_connection(self):
    if self.websocket:
        await self.websocket.stop()
    if self.ws_task and not self.ws_task.done():
        self.ws_task.cancel()
        try:
            await self.ws_task
        except asyncio.CancelledError:
            pass

Security Concerns

Path Traversal Risk (dumper.py:31-37)

Severity: MEDIUM

The dump_dir_path is user-configurable and not validated. A malicious or misconfigured property.json could write files outside intended directories:

# Config could contain: "dump_path": "../../../../etc"

Recommendation:

def __init__(self, dump_dir_path: str, dump_file_name: str):
    # Resolve and validate path
    self.dump_dir_path = os.path.abspath(dump_dir_path)
    # Optional: validate it's within allowed base path
    # if not self.dump_dir_path.startswith(ALLOWED_BASE_PATH):
    #     raise ValueError("dump_path outside allowed directory")

Code Quality Issues

1. Incomplete Error Recovery (extension.py:261-268)

Severity: LOW

try:
    await self.audio_dumper.rotate()
except Exception as e:
    self.ten_env.log_warn(f"Failed to rotate audio dumper, error: {str(e)}")
    # What happens next? Dumper is in unknown state

The rotation failure is logged but there's no recovery strategy. If rotation fails:

  • Old file might be closed (depending on where the exception occurred)
  • New file might not be opened
  • Subsequent writes will fail

Recommendation: The current rotate() implementation handles this well by keeping old_file reference and only swapping on success. Document this behavior and consider adding a metric/counter for rotation failures.

2. Directory Creation Timing (dumper.py:33, 56)

Minor: Code Smell

def _ensure_dir(self):
    os.makedirs(self.dump_dir_path, exist_ok=True)  # Sync I/O in async context

While makedirs is typically fast, for consistency with async patterns, consider:

def _ensure_dir(self):
    # makedirs is fast enough for local filesystems, but document the assumption
    os.makedirs(self.dump_dir_path, exist_ok=True)
    
# Or if dealing with slow/network filesystems:
async def _ensure_dir(self):
    await asyncio.to_thread(os.makedirs, self.dump_dir_path, exist_ok=True)

3. Magic Number in Websocket (websocket.py:290)

Minor: Maintainability

self._total_sent_audio_ts_ms += 10  # Assumes 10ms audio frames

Recommendation:

ASSUMED_AUDIO_FRAME_DURATION_MS = 10  # Document this assumption
# ...
self._total_sent_audio_ts_ms += ASSUMED_AUDIO_FRAME_DURATION_MS

4. Filename Collision Risk (dumper.py:52)

Severity: LOW

f"{base}_{int(time.time() * 1000)}_{secrets.token_hex(3)}{ext}"

Good use of secrets for randomness! The timestamp + 6 random hex chars makes collisions extremely unlikely. Consider documenting the collision probability or making the token size configurable if needed.


Performance Considerations

1. Lock Contention on Audio Path (dumper.py:71-77)

Severity: LOW

Every audio chunk write acquires the lock, which could contend with rotation. Given that:

  • Writes are fast (just buffer copies with aiofiles)
  • Rotation is rare (only on finalize)
  • Lock granularity is appropriate

This is acceptable, but monitor if audio latency becomes an issue.

2. Keepalive Check Interval (websocket.py:178)

Minor Optimization

await asyncio.sleep(1.0)  # Check every second

You could optimize by sleeping for the remaining time:

sleep_duration = max(0.1, self.keepalive_interval - time_since_last_audio)
await asyncio.sleep(sleep_duration)

Though the current implementation is simpler and 1-second granularity is fine for keepalive purposes.


Missing Test Coverage

The following scenarios lack test coverage:

  1. ✅ Dump rotation creates new file with timestamp
  2. ✅ Rotation during active audio streaming
  3. ✅ Rotation failure recovery
  4. ✅ Keepalive messages sent after interval
  5. ✅ Keepalive not sent when audio is flowing
  6. ✅ Callback invocation before finalize message
  7. ✅ Config validation for incompatible options

Recommendation: Add tests in tests/test_dump_rotate.py following the existing test patterns. At minimum, test the happy path for rotation.


Configuration Validation

Missing Validation (config.py)

# No validation for:
if config.dump_rotate_on_finalize and not config.dump:
    # This configuration doesn't make sense
    raise ValueError("dump_rotate_on_finalize requires dump=True")

Add this check in on_init() after config validation.


Positive Observations

  1. Clean Architecture - The Dumper class is well-designed with single responsibility
  2. Thread Safety - Proper use of asyncio.Lock for concurrent operations
  3. Error Handling - Good use of try/except with contextual logging
  4. Backward Compatibility - All changes are opt-in via configuration
  5. Type Hints - Excellent use of modern Python type annotations
  6. Config Management - Proper integration into existing Pydantic config system
  7. Rotation Strategy - Open new file before closing old one prevents data loss
  8. Version Bump - Correctly incremented to 0.3.6

Summary & Recommendations

Must Fix Before Merge:

  1. ⚠️ Clarify infinite retry behavior - Document or reconsider the websocket reconnection changes
  2. ⚠️ Add config validation - Validate dump_rotate_on_finalize requires dump=True
  3. ⚠️ Cleanup ws_task - Properly cancel websocket task on shutdown

Should Fix:

  1. 🔒 Path validation - Add security check for dump_path
  2. 🧪 Add tests - At least basic rotation functionality

Nice to Have:

  1. 📖 Documentation - Add comments explaining rotation behavior
  2. 🔧 Named constants - Replace magic numbers
  3. 🎯 Lock in start() - Complete thread safety coverage

Overall Assessment

Good feature addition with solid engineering. The core functionality is well-implemented, but the websocket reconnection behavior change needs clarification, and basic safeguards (config validation, task cleanup) should be added.

Recommendation: ✅ Approve with minor changes requested

Great work on the clean implementation and following the repository's patterns! The callback-based rotation is an elegant solution. 🚀

@claude
Copy link

claude bot commented Dec 30, 2025

PR Review: Add dump_rotate_on_finalize and enable_keepalive configs for soniox

Summary

This PR adds two new configuration options to the Soniox ASR extension:

  1. dump_rotate_on_finalize: Rotates audio dump files on finalization
  2. enable_keepalive: Enables WebSocket keepalive messages

Overall, this is a well-structured PR with good test refactoring and solid implementation.


Code Quality & Best Practices

Strengths:

  • Clean abstraction: The new Dumper class (dumper.py) is well-designed with proper async/await patterns and locking mechanisms
  • Test improvements: Excellent refactoring of test fixtures - moving from mock.py to conftest.py and creating reusable test helpers
  • Configuration pattern: Follows existing patterns for config management
  • Error handling: Good use of try-except blocks with appropriate logging at warn level
  • Logging categories: Proper use of LOG_CATEGORY_KEY_POINT and LOG_CATEGORY_VENDOR

Suggestions:

  1. dumper.py:77 - Consider removing unused return value assignment
  2. websocket.py:287 - Add type hints to Callable in QueueItem for clarity
  3. Ensure consistent import organization across test files

Potential Bugs & Issues

Critical:

  1. websocket.py:187-189 - Unlimited reconnection attempts: When max_attempts is reached, the code only sleeps but continues retrying indefinitely. This could lead to resource exhaustion. Consider adding a configurable max retry limit, implementing a circuit breaker pattern, or at minimum, log a warning when exceeding max_attempts.

  2. dumper.py:68 - File handle leak on rotation failure: If new_file fails to open partway through, there is potential for inconsistent state. Consider more defensive handling.

Minor:

  1. extension.py:148 - Task reference stored but never explicitly cancelled in stop_connection()
  2. websocket.py:162 - Unused ws parameter in keepalive loop

Performance Considerations

Good:

  • Async file I/O using aiofiles is excellent
  • Lock granularity in Dumper is appropriately scoped
  • Queue-based send pattern prevents blocking

Concerns:

  1. dumper.py:53 - Consider time.monotonic() for guaranteed ordering or counter-based scheme
  2. websocket.py:290 - Fragile assumption about 10ms audio frames
  3. websocket.py:178 - Could reduce keepalive check from 1s to 5s for better performance

Security Concerns

Recommendations:

  1. dumper.py:34 - Add path validation to prevent directory traversal and ensure write permissions
  2. dumper.py:58 - Consider sanitizing dump_file_name to prevent path injection

Test Coverage

Excellent:

  • Test fixture refactoring is a significant improvement
  • Test helpers make tests more maintainable
  • Callback testing properly verifies before_send_callback mechanism

Missing Coverage:

  1. No tests for Dumper class - rotate() error handling, concurrent push_bytes(), start() idempotency, file system errors
  2. No tests for keepalive - message sending, timer reset, connection close behavior
  3. No integration test for dump rotation with dump_rotate_on_finalize: true

Recommendations Priority

High Priority:

  1. Fix unlimited reconnection attempt issue (websocket.py:187)
  2. Add cancellation of ws_task in stop_connection (extension.py:148)
  3. Add unit tests for Dumper class
  4. Add path validation to dumper (security)

Medium Priority:
5. Clean up unused return value assignment (dumper.py:77)
6. Add type hints to QueueItem callback
7. Calculate actual audio duration instead of assuming 10ms
8. Add keepalive integration tests

Low Priority:
9. Remove unused ws parameter from _keepalive_loop
10. Consider squashing commits
11. Reduce keepalive check frequency


Summary

This is a high-quality PR that adds useful functionality with good code structure. The test refactoring is particularly commendable. Main concerns are around the unlimited retry behavior and missing test coverage for the new Dumper class. With the high-priority issues addressed, this will be a solid addition to the codebase.

Overall Assessment: Approve with recommended changes

Great work!

@anexpn anexpn force-pushed the feat-soniox_rotate_dump branch from 699fffc to 0e61bbd Compare December 30, 2025 02:43
@claude
Copy link

claude bot commented Dec 30, 2025

PR Review: Add dump_rotate_on_finalize and enable_keepalive configs for soniox

Summary

This PR adds two new configuration options to the Soniox ASR extension:

  1. dump_rotate_on_finalize: Rotates audio dump files when finalization occurs
  2. enable_keepalive: Enables WebSocket keepalive messages to prevent connection timeouts

The implementation includes a new custom Dumper class replacing the base ten_ai_base dumper, refactored test fixtures, and support for unlimited reconnection attempts.

Strengths

  1. Well-structured Dumper class (dumper.py:14-77)

    • Thread-safe with asyncio.Lock for concurrent operations
    • Atomic file rotation with proper error handling
    • Creates new file before closing old one to prevent data loss
  2. Good error handling

    • Dumper operations wrapped in try-except blocks
    • Errors logged as warnings rather than failing the main flow
    • Graceful degradation when dump operations fail
  3. Test improvements

    • Refactored fixtures into conftest.py for better reusability
    • Factory pattern with create_fake_websocket_mocks and inject_websocket_mocks
    • Tests updated to use relative imports
  4. Keepalive implementation

    • Proper async task management with cleanup
    • Configurable interval (15s default)
    • Only sends keepalive when no audio for specified duration

Critical Issues

  1. Race condition in Dumper.rotate() (dumper.py:45-69)

    • If old_file.close() raises an exception, the except block sets self._file = old_file, but that handle may be in an invalid state
    • Recommendation: Consider whether keeping a potentially corrupted file handle is safer than setting self._file = None on rotation failure
  2. Unbounded reconnection attempts (websocket.py:186-189)

    • The commit message says no max reconnect attempt limit but this causes infinite reconnection attempts even for fatal errors (401 Unauthorized, 402 Payment Required)
    • The websocket will keep retrying every 10 seconds indefinitely
    • Recommendation: Add a mechanism to stop reconnecting for fatal errors (400-series responses), or make the infinite reconnection behavior more explicit with circuit breaker patterns

Medium Priority Issues

  1. File descriptor leak on start() double call (dumper.py:22-31)

    • If start() is called while a file is already open, the method returns early without logging
    • Recommendation: Consider logging a warning when start() is called with an already-open file
  2. Incomplete file writes on stop (dumper.py:39-43)

    • No explicit flush before close
    • Recommendation: Consider adding await self._file.flush() before close for safety

Test Coverage Gaps

  • No explicit test for dump_rotate_on_finalize feature
  • No test for keepalive functionality
  • No test for rotation failure scenarios
  • No test for concurrent dumper access (though lock protects it)

Recommendations Summary

Must fix:

  1. Document or fix the infinite reconnection behavior for fatal errors
  2. Consider rotation failure handling strategy (keep bad handle vs. close dumper)

Should fix:
3. Add tests for new features (dump rotation, keepalive)
4. Add warning log on Dumper.start() double call
5. Consider explicit flush before Dumper.stop()

Verdict

Approve with minor suggestions

The PR implements useful features with good code quality and proper error handling. The critical concerns are around edge cases (infinite reconnection, rotation failures) that should be addressed or explicitly documented as intended behavior. The test refactoring is well done and improves maintainability.

Great work overall! The new Dumper class is well-designed and the keepalive feature addresses real connection stability issues.

@claude
Copy link

claude bot commented Dec 30, 2025

Pull Request Review: Add dump_rotate_on_finalize and enable_keepalive configs for soniox

Summary

This PR adds two important features to the soniox_asr_python extension:

  1. dump_rotate_on_finalize: Rotates audio dump files on finalize to separate audio segments
  2. enable_keepalive: Adds WebSocket keepalive support to maintain connections during periods of silence

The PR also includes significant test refactoring, moving mock infrastructure from a separate file into conftest.py fixtures.

✅ Strengths

Code Quality

  • Well-structured dumper implementation: The new dumper.py follows good async patterns with proper locking
  • Thread-safe design: Consistent use of asyncio.Lock to protect file operations
  • Error handling: Proper exception handling with graceful degradation (errors logged as warnings)
  • Test improvements: Excellent refactoring of test infrastructure into reusable fixtures

Features

  • Keepalive mechanism: Smart implementation that only sends keepalive when no audio sent for 15s
  • Flexible configuration: Both features are opt-in with sensible defaults
  • Callback pattern: Clean before-send callback pattern for coordinating dump rotation with finalize message

🔍 Issues & Concerns

1. Critical: Race Condition in Dumper Initialization

Location: extension.py:161-171

The audio dumper is started after the websocket connection begins, creating a window where audio could be sent before the dumper is ready:

# Line 148: WebSocket task started
self.ws_task = asyncio.create_task(ws.connect())

# Line 161-162: Dumper started later
if self.audio_dumper:
    await self.audio_dumper.start()

Impact: If the WebSocket connects quickly and audio starts flowing, push_bytes() will fail with "Dumper is not opened" error.

Recommendation: Start the dumper before creating the WebSocket connection task:

if self.audio_dumper:
    await self.audio_dumper.start()
self.ws_task = asyncio.create_task(ws.connect())

2. Bug: Missing await in Keepalive Loop

Location: websocket.py:171

await self._send_queue.put(keepalive_message)

The keepalive loop puts messages into the queue, but if the queue fills up, this could block indefinitely. While asyncio.Queue is unbounded by default, consider adding a timeout or using put_nowait() to avoid potential deadlocks.

3. Design Flaw: Keepalive Default Value

Location: config.py:24 & websocket.py:87

The PR description and commit messages indicate enable_keepalive should default to True (commit d9a4ff2), but:

  • Config model defaults to True: ✅
  • WebSocket client defaults to False: ❌

Inconsistency: If someone instantiates SonioxWebsocketClient directly (e.g., in tests), keepalive will be disabled by default.

Recommendation: Align the default in websocket.py:87 to True:

def __init__(
    self,
    url: str,
    start_request: str,
    base_delay: float = 0.1,
    max_delay: float = 10.0,
    max_attempts: int = 10,
    enable_keepalive: bool = True,  # Changed from False
    keepalive_interval: float = 15.0,
):

4. Logic Issue: No Reconnect Limit

Location: websocket.py:185-187

async def _exponential_backoff(self):
    if self._attempt_count >= self.max_attempts:
        await asyncio.sleep(self.max_delay)
        return

The code checks >= max_attempts but then continues retrying after sleeping. This doesn't actually limit reconnection attempts—it just caps the backoff delay. Per commit message (11be0e5: "no max reconnect attempt limit"), this appears intentional, but the logic is confusing.

Recommendation: Either:

  1. Document that max_attempts only limits backoff growth, not total attempts
  2. Rename to max_backoff_iterations for clarity
  3. Actually enforce a max attempts limit if that's desired

5. Potential Memory Leak: Uncancelled Task

Location: extension.py:148

self.ws_task = asyncio.create_task(ws.connect())

The task is stored but never explicitly cancelled or awaited during shutdown. If stop_connection() is called, the websocket stops but the task may not complete cleanly.

Recommendation: Add task cancellation in stop_connection():

async def stop_connection(self) -> None:
    self.ten_env.log_info("stop_connection")
    if self.audio_dumper:
        await self.audio_dumper.stop()
    if self.websocket:
        await self.websocket.stop()
    if self.ws_task and not self.ws_task.done():
        self.ws_task.cancel()
        try:
            await self.ws_task
        except asyncio.CancelledError:
            pass
    self.connected = False

🔒 Security Concerns

Low Risk: File System Access

The Dumper class writes to arbitrary paths specified in config. While this follows existing patterns in the framework, ensure:

  • Dump paths are validated to prevent directory traversal
  • File permissions are properly restricted in production

Info: Secrets in Filenames

Timestamp + random hex (6 chars) in rotated filenames provides ~16.7M combinations. This is sufficient for preventing filename collisions but not for security purposes.

⚡ Performance Considerations

Positive

  • Non-blocking I/O: Proper use of aiofiles for async file operations
  • Queue-based architecture: WebSocket send queue prevents blocking on network I/O
  • Lazy keepalive: Only sends when needed, minimal overhead

Concerns

  • File flush behavior: aiofiles may buffer writes. Consider adding periodic flush for real-time debugging scenarios
  • Lock contention: Every audio frame acquires a lock in push_bytes(). For high-throughput scenarios, consider lock-free alternatives or batching

🧪 Test Coverage

Excellent Improvements

  • ✅ Moved mock infrastructure to conftest.py as proper pytest fixtures
  • ✅ Created create_fake_websocket_mocks() and inject_websocket_mocks() helper functions
  • ✅ Tests updated to use new fixture patterns
  • ✅ All tests updated to handle before_send_callback parameter

Gaps

  • No tests for dump rotation: The new dump_rotate_on_finalize feature lacks dedicated tests
  • No tests for keepalive: WebSocket keepalive mechanism is untested
  • No tests for race conditions: The dumper initialization race condition is not covered

Recommendations

Add tests for:

  1. Dump file rotation creates files with timestamp suffix
  2. Multiple rotations create distinct files
  3. Keepalive messages sent after silence period
  4. Keepalive disabled when enable_keepalive=False
  5. Callback timing in finalize flow

📝 Code Style & Best Practices

Excellent

  • ✅ Follows TEN framework patterns (proper logging categories, error handling)
  • ✅ Type hints throughout (Python 3.10+ union syntax)
  • ✅ Pydantic for config validation
  • ✅ Proper docstrings for public methods

Minor Issues

  • Line 77 in dumper.py: _ = await self._file.write(data) - underscore assignment is unnecessary
  • Consider adding module-level docstrings to new files

🎯 Recommendations Summary

Must Fix (Critical)

  1. Fix dumper initialization race condition
  2. Align enable_keepalive defaults between config and websocket client
  3. Add proper task cancellation in stop_connection()

Should Fix (Important)

  1. Add tests for new dump rotation and keepalive features
  2. Clarify max_attempts behavior or rename parameter
  3. Handle potential keepalive queue blocking scenarios

Nice to Have

  1. Add periodic flush for dumper in debug mode
  2. Add module docstrings
  3. Document the callback timing contract for before_send_callback

📊 Overall Assessment

Quality: Good - Well-implemented features with proper async patterns
Completeness: 75% - Missing critical tests for new features
Security: No major concerns
Performance: Good - Efficient async design

This is a valuable addition to the framework, but requires fixes to the critical issues before merging. The test infrastructure improvements are excellent and will benefit future development.


Recommendation: Request changes to address critical issues (#1-3) and test coverage gaps before merging.

@claude
Copy link

claude bot commented Dec 30, 2025

Pull Request Review

This PR adds two new configuration options (dump_rotate_on_finalize and enable_keepalive) to the Soniox ASR extension and implements a custom Dumper class. Overall, the implementation is solid with good test coverage improvements. Here's my detailed feedback:


Strengths

1. Well-Structured Dumper Implementation

The new dumper.py file is well-designed with proper async/await patterns and thread-safety:

  • Lock-based synchronization prevents race conditions (asyncio.Lock)
  • rotate() method correctly opens new file before closing old one (atomic-like operation)
  • Good error handling that preserves old file handle on rotation failure

2. Improved Test Infrastructure

The refactoring of test fixtures (conftest.py) is excellent:

  • create_fake_websocket_mocks() and inject_websocket_mocks() provide clean, reusable test utilities
  • Tests are now more maintainable and less duplicative
  • Moving mocks from mock.py to conftest.py follows pytest best practices

3. Proper Feature Integration

  • dump_rotate_on_finalize is correctly integrated into the finalize callback chain
  • enable_keepalive parameter properly passed to websocket client
  • Config handling follows existing patterns with update() and to_str() methods

🔍 Potential Issues & Improvements

Critical: Missing Import Path Update

Location: extension.py:32

from .dumper import Dumper

Issue: The PR creates a custom Dumper class but changes the import from ten_ai_base.dumper to a local .dumper. However, I noticed that 18 other ASR extensions in the codebase use from ten_ai_base.dumper import Dumper. This suggests:

  1. Either ten_ai_base.dumper.Dumper doesn't support file rotation (requiring this custom implementation)
  2. Or this is duplicating functionality that should be added to the base class

Recommendation:

  • If the base Dumper doesn't support rotation, consider adding rotation support to ten_ai_base.dumper.Dumper instead of creating a custom version
  • If this custom Dumper is intentionally different, add a docstring explaining why Soniox needs its own implementation
  • Ensure consistency across the codebase - other extensions may also benefit from file rotation

Security: Error Message Information Disclosure

Location: dumper.py:75

raise RuntimeError(
    f"Dumper for {self._current_file_path()} is not opened. Please start the Dumper first."
)

Issue: The error message exposes the full file path, which could reveal sensitive directory structure information in logs.

Recommendation:

raise RuntimeError(
    "Dumper is not opened. Please start the Dumper first."
)

Similarly in dumper.py:69, the exception chaining already includes the error details, so the path isn't necessary in the outer message.


Bug: File Handle Leak on Rotation Failure

Location: dumper.py:45-69

async def rotate(self):
    async with self._lock:
        old_file = self._file
        try:
            # ...
            new_file = await aiofiles.open(...)
            if old_file:
                await old_file.close()
            self._file = new_file
        except Exception as e:
            self._file = old_file  # ⚠️ old_file is still open
            raise RuntimeError(...) from e

Issue: If rotation fails after opening the new file but before completing, the code correctly keeps old_file, but if it fails during directory creation or filename generation, the old file remains open indefinitely without being tracked.

Recommendation: Consider whether failed rotations should close the old file or keep it open. Current behavior is inconsistent - it depends on where the exception occurs. Add a comment documenting the intended behavior.


Performance: Unnecessary await Assignment

Location: dumper.py:77

_ = await self._file.write(data)

Minor: The assignment to _ is unnecessary. Just use await self._file.write(data).


Race Condition: Websocket Task Not Awaited

Location: extension.py:150

self.ws_task = asyncio.create_task(ws.connect())

Issue: The task is created but never awaited in stop_connection(). While websocket.stop() is called, the task itself should be properly cancelled/awaited to prevent warnings and ensure clean shutdown.

Recommendation:
In stop_connection():

async def stop_connection(self) -> None:
    self.ten_env.log_info("stop_connection")
    if self.audio_dumper:
        await self.audio_dumper.stop()
    if self.websocket:
        await self.websocket.stop()
    if self.ws_task and not self.ws_task.done():
        self.ws_task.cancel()
        try:
            await self.ws_task
        except asyncio.CancelledError:
            pass
    self.connected = False

Code Quality: Type Hint Inconsistency

Location: websocket.py:296

before_send_callback: Optional[Callable] = None

Issue: The Callable type hint is incomplete. It should specify the signature.

Recommendation:

before_send_callback: Optional[Callable[[int], Awaitable[None]]] = None

This matches the actual usage in extension.py:264 where the callback takes audio_bytes_sent: int.


Test Coverage: Missing Rotation Test

Observation: The tests in test_finalize.py now handle the before_send_callback parameter, but there's no explicit test verifying that:

  1. File rotation actually occurs when dump_rotate_on_finalize=True
  2. Multiple dump files are created with unique names
  3. The callback is invoked at the correct time with the correct byte count

Recommendation: Add a test case similar to test_audio_dump_functionality but with dump_rotate_on_finalize: True that:

  • Sends audio data
  • Triggers finalize
  • Verifies multiple dump files exist with timestamp suffixes
  • Validates file contents

Documentation: Config Options Need Description

Location: config.py:20, 24

The new config options lack inline documentation:

dump_rotate_on_finalize: bool = False
enable_keepalive: bool = True

Recommendation: Add docstrings or inline comments:

dump_rotate_on_finalize: bool = False  # Rotate dump file on each finalize command
enable_keepalive: bool = True  # Send keepalive messages during silence periods

🔒 Security Assessment

Overall: Low Risk

  • No SQL injection, XSS, or command injection vulnerabilities
  • File operations use os.makedirs with exist_ok=True (safe)
  • secrets.token_hex(3) provides sufficient randomness for filename uniqueness
  • API key handling follows existing patterns with encryption in logs

⚠️ Minor Concerns:

  • Path disclosure in error messages (noted above)
  • No validation on dump_path - could write to arbitrary directories if misconfigured

Performance Considerations

Efficient Implementation:

  • Async file I/O with aiofiles prevents blocking
  • Keepalive interval (15s) is reasonable and configurable
  • Lock contention should be minimal for typical ASR workloads

⚠️ Watch For:

  • High-frequency rotation could cause I/O spikes - consider rate limiting if needed
  • time.time() * 1000 in rotation could collide if called twice in same millisecond (mitigated by secrets.token_hex, but worth noting)

📝 Minor Suggestions

  1. Version Bump: The manifest.json version was bumped from 0.3.5 → 0.3.6. Consider if this should be a minor version bump (0.4.0) since new features were added.

  2. Whitespace: manifest.json:32 adds a newline at EOF - ensure this is intentional and consistent with repo style.

  3. Import Order: In conftest.py, consider grouping standard library, third-party, and local imports with blank lines per PEP 8.


🎯 Summary

Approval Status: ⚠️ Approve with Minor Changes Requested

This PR is well-implemented with good attention to async patterns and test quality. The main concerns are:

  1. Critical: Clarify the relationship between custom Dumper and ten_ai_base.dumper.Dumper
  2. Important: Fix websocket task lifecycle management
  3. Nice-to-have: Add explicit file rotation test case

The code is production-ready after addressing the websocket task cleanup. Great work on the test refactoring! 🚀


Reviewed with: Claude Code
Focus Areas: Async patterns, resource management, test coverage, security

@wangyoucao577 wangyoucao577 merged commit 4895530 into main Dec 30, 2025
34 checks passed
@wangyoucao577 wangyoucao577 deleted the feat-soniox_rotate_dump branch December 30, 2025 06:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants