Skip to content

Conversation

@anexpn
Copy link
Contributor

@anexpn anexpn commented Dec 7, 2025

Note

Introduces the pytest_ten system package for pytest-based AAA-style testing and wires it into build, packaging/publishing, CI, versioning scripts, and Python integration tests.

  • New system package: packages/core_systems/pytest_ten
    • Provides AAA-style testing utilities: TenTestContext, ten_test decorator.
    • Adds build rules (BUILD.gn), manifest, docs, pyproject.toml, uv.lock, and publish target.
  • Build/Repo config
    • Include //packages/core_systems in ten_framework_all.
    • Add pytest_ten to editor/type-checker paths (.vscode/settings.json, pyrightconfig.json).
  • CI/CD (Ubuntu 22.04 workflow)
    • Package/publish ten_packages/system/pytest_ten alongside other system packages.
    • Update test artifacts and PYTHONPATH to load pytest_ten.
  • Versioning tools
    • Support updating dev_dependencies in manifests.
    • Add pytest_ten to version update pipeline.
  • Python binding
    • Refactor async callbacks in AsyncTenEnvTester to named functions for queue handling.
  • Tests
    • Add/convert Python integration tests to use pytest_ten; add dev_dependencies on pytest_ten and adjust start scripts to include it.

Written by Cursor Bugbot for commit 9884229. This will update automatically on new commits. Configure here.

@anexpn anexpn requested a review from halajohn as a code owner December 7, 2025 16:39
@anexpn anexpn changed the title feat: add a python system package pytest_ten supporting AAA-style tests feat: add a python system package pytest_ten supporting AAA-style tests Dec 7, 2025
@claude
Copy link

claude bot commented Dec 7, 2025

Pull Request Review

This PR adds a new Python system package pytest_ten that provides AAA-style (Arrange-Act-Assert) testing support for TEN extensions. Overall, this is a well-crafted addition with clean design and good test coverage. Here's my detailed feedback:


✅ Strengths

Code Quality

  • Excellent API design: The TenTestContext class provides a clean, intuitive interface for testing
  • Comprehensive functionality: Supports all message types (Cmd, Data, AudioFrame, VideoFrame)
  • Good documentation: Extensive docstrings and examples in the ten_test decorator
  • Clean separation: _TenTestRunner is properly marked as internal with leading underscore
  • Type hints: Good use of type annotations throughout

Testing

  • Good test coverage: Three example tests demonstrate different use cases (basic, mocking, fixtures)
  • Practical examples: The test files show real-world usage patterns

⚠️ Issues & Recommendations

🔴 High Priority

1. Race Condition in Message Filtering (packages/core_system/pytest_ten/pytest_ten/ten_test.py:74-98)

The expect_data and expect_cmd methods have a potential race condition when using put_back=True:

if name and data.get_name() \!= name:
    if put_back:
        await self.received_data.put(data)
        await asyncio.sleep(0.05)  # Race: another expect_ may grab it
    continue

Problem: Between put and the next get, another concurrent expect_* call could consume the message.

Recommendation: Consider using a deque or implementing a proper message queue with filtering support instead of the put-back pattern.

2. Missing Resource Cleanup (packages/core_system/pytest_ten/pytest_ten/ten_test.py:380-491)

The _TenTestRunner class stores queues and tasks but doesn't provide cleanup for unconsumed messages.

Recommendation: Add cleanup in on_deinit or after test completion:

async def on_deinit(self, ten_env: AsyncTenEnvTester) -> None:
    if self._test_task:
        self._test_task.cancel()
        try:
            await self._test_task
        except asyncio.CancelledError:
            pass
    
    # Clear queues
    if self._ctx:
        for queue in [self._ctx.received_cmds, self._ctx.received_data, 
                      self._ctx.received_audio, self._ctx.received_video]:
            while not queue.empty():
                try:
                    queue.get_nowait()
                except asyncio.QueueEmpty:
                    break

3. Incomplete README Documentation (packages/core_system/pytest_ten/docs/README.en-US.md)

The README is a template placeholder with no actual content about pytest_ten.

Recommendation: Replace with actual documentation covering:

  • What pytest_ten is and why to use it
  • Installation instructions
  • Quick start example
  • API reference link
  • Migration guide from old test style

🟡 Medium Priority

4. Hardcoded Timeout Values (packages/core_system/pytest_ten/pytest_ten/ten_test.py:84, 132)

Using min(remaining, 0.1) hardcodes the polling interval.

Recommendation: Make this configurable or document why 0.1s was chosen.

5. Direct Access to Private Member (packages/core_system/pytest_ten/pytest_ten/ten_test.py:604)

tester._pytest_fixtures = kwargs

Recommendation: While this works, consider making it a public method:

def set_pytest_fixtures(self, fixtures: dict[str, Any]) -> None:
    self._pytest_fixtures = fixtures

6. Potential Memory Issue with Clone (packages/core_system/pytest_ten/pytest_ten/ten_test.py:449, 471)

Every message is cloned before queuing, which could be expensive for large frames.

Recommendation: Document memory implications or consider shallow copies where appropriate.

7. Mock Response Priority Not Documented (packages/core_system/pytest_ten/pytest_ten/ten_test.py:452-466)

Stream mocks take precedence over regular mocks, but this isn't documented.

Recommendation: Add docstring clarification:

# Note: mock_cmd_stream_response takes precedence over mock_cmd_response
# if both are provided for the same command name

🟢 Low Priority

8. Inconsistent Error Messages (packages/core_system/pytest_ten/pytest_ten/ten_test.py)

Some methods use f-strings, others use format strings for consistency.

Recommendation: Standardize on f-strings throughout.

9. Missing Type Hints for sync wrapper (packages/core_system/pytest_ten/pytest_ten/ten_test.py:602)

The wrapper function lacks return type annotation.

Recommendation: Add -> None annotation.

10. Deprecated API Usage (packages/core_system/pytest_ten/pytest_ten/ten_test.py:74, 122, 200, etc.)

Using asyncio.get_event_loop().time() - should use asyncio.get_running_loop().time() in async context.

Recommendation: Update to modern asyncio API:

end_time = asyncio.get_running_loop().time() + timeout

🔒 Security

No security concerns identified. The code properly handles:

  • Error messages with traceback (good for debugging)
  • No unsafe eval/exec usage
  • Proper async exception handling

⚡ Performance

Considerations:

  1. Message cloning overhead: Every message is cloned, which may impact high-throughput tests
  2. Polling pattern: The 0.1s polling interval in expect_* methods could add latency
  3. Queue growth: No bounds on queue sizes could lead to memory issues in long-running tests

Recommendation: Consider adding:

  • Optional message cloning (clone_messages=True parameter)
  • Bounded queue sizes with overflow handling
  • Performance notes in documentation

📝 Documentation & Conventions

Missing Documentation:

  1. ✅ Good: Extensive docstrings in code
  2. ❌ Missing: Actual README content (currently placeholder)
  3. ❌ Missing: Migration guide from old test style
  4. ⚠️ Incomplete: Chinese README also needs content

Code Conventions:

  • ✅ Follows PEP 8
  • ✅ Good naming conventions
  • ✅ Proper use of private/public members
  • ⚠️ Minor: Some inconsistent string formatting

🧪 Test Coverage

Current Coverage:

  • ✅ Basic test flow (test_recv_msg_during_starting.py)
  • ✅ Mock responses (test_mock.py)
  • ✅ Callback handling (test_callback_after_stop_test.py)

Missing Coverage:

  • put_back=True parameter usage
  • ❌ Timeout edge cases
  • collect_all_* methods
  • assert_no_* methods
  • ❌ Audio/Video frame testing
  • ❌ Mock data response functionality
  • ❌ Stream mock responses
  • ❌ Error conditions (malformed messages, etc.)

Recommendation: Add tests for the untested code paths.


🏗️ Build System

BUILD.gn:

  • ✅ Proper package configuration
  • ✅ Correct dependency specification
  • ✅ Docs file globbing looks correct
  • ⚠️ Consider adding a comment explaining the glob script usage

pyproject.toml:

  • ✅ Proper setuptools configuration
  • ✅ Correct dependency specification (pytest>=8.2,<10)
  • ✅ Python version requirement (>=3.10)

uv.lock:

  • ⚠️ Lock file included but truncated in diff - verify it's complete
  • ✅ Standard dependencies visible (colorama, pytest)

📊 Summary

Category Rating Notes
Code Quality ⭐⭐⭐⭐ Well-structured, clean design
Documentation ⭐⭐ Good inline docs, missing README
Testing ⭐⭐⭐ Good examples, missing edge cases
Security ⭐⭐⭐⭐⭐ No concerns
Performance ⭐⭐⭐ Adequate, some optimization opportunities
Overall ⭐⭐⭐⭐ Recommended with minor revisions

🎯 Action Items

Before Merge:

  1. Fix the README files with actual content (high priority)
  2. Document the race condition in put_back parameter or remove it
  3. Add resource cleanup in on_deinit

Future Enhancements:

  1. Add comprehensive test coverage for all code paths
  2. Consider performance optimizations for high-throughput scenarios
  3. Add migration guide from old test patterns
  4. Update to modern asyncio APIs

Great work on this feature! The core implementation is solid and provides valuable testing capabilities. The main concerns are around documentation completeness and a few edge cases in the filtering logic. Once the README is updated and cleanup is added, this will be ready to merge.

@claude
Copy link

claude bot commented Dec 8, 2025

Code Review: pytest_ten System Package

Thank you for this comprehensive addition of pytest testing infrastructure! This is a valuable enhancement that provides AAA-style testing for TEN extensions. Here's my detailed review:


✅ Strengths

  1. Excellent API Design: The TenTestContext API provides a clean, intuitive interface for extension testing with proper async/await patterns
  2. Comprehensive Testing Helpers: Good coverage of assertion methods (assert_no_data, assert_no_cmd, etc.) and collection methods
  3. Strong Documentation: Extensive docstrings and examples in the decorator show excellent attention to developer experience
  4. Good Integration: The pytest fixture integration is well-designed, allowing seamless use of pytest fixtures alongside TEN test context
  5. Proper Error Handling: Error categorization (AssertionError, TimeoutError, generic errors) maps well to TenErrorCode

🐛 Potential Issues

1. asyncio.get_event_loop() Deprecation (Medium Priority)

Location: packages/core_system/pytest_ten/pytest_ten/ten_test.py:74, 122, 200, etc.

# Current code (deprecated in Python 3.10+)
end_time = asyncio.get_event_loop().time() + timeout

Issue: asyncio.get_event_loop() is deprecated when not running in an event loop. In Python 3.12+, this will raise warnings or errors.

Recommendation: Use asyncio.get_running_loop() instead:

end_time = asyncio.get_running_loop().time() + timeout

This pattern appears in:

  • expect_data(): lines 74, 77, 84
  • expect_cmd(): lines 122, 125, 132
  • collect_all_data(): lines 200, 203
  • collect_all_cmds(): lines 229, 232
  • collect_all_audio_frames(): lines 254, 257
  • collect_all_video_frames(): lines 279, 282

2. Resource Cleanup in _TenTestRunner.on_deinit() (Low Priority)

Location: packages/core_system/pytest_ten/pytest_ten/ten_test.py:414-420

async def on_deinit(self, ten_env: AsyncTenEnvTester) -> None:
    if self._test_task:
        self._test_task.cancel()
        try:
            await self._test_task
        except asyncio.CancelledError:
            pass

Issue: Queues in TenTestContext are not explicitly cleaned up, which could lead to memory retention if tests fail.

Recommendation: Consider adding cleanup:

async def on_deinit(self, ten_env: AsyncTenEnvTester) -> None:
    if self._test_task:
        self._test_task.cancel()
        try:
            await self._test_task
        except asyncio.CancelledError:
            pass
    
    # Clear queues to prevent memory leaks
    if self._ctx:
        while not self._ctx.received_cmds.empty():
            self._ctx.received_cmds.get_nowait()
        while not self._ctx.received_data.empty():
            self._ctx.received_data.get_nowait()
        # ... clear other queues

3. Potential Race Condition in put_back Logic (Low Priority)

Location: packages/core_system/pytest_ten/pytest_ten/ten_test.py:89-92, 137-140

if put_back:
    await self.received_data.put(data)
    await asyncio.sleep(0.05)  # Small delay to avoid busy loop

Issue: The put_back logic could theoretically starve if multiple consumers are racing. The 0.05s sleep is a heuristic that might not work in all scenarios.

Note: This is likely acceptable for testing scenarios, but worth documenting the limitation.


🔒 Security Considerations

No security issues identified. The code operates in a test context and doesn't handle user input or network data directly.


⚡ Performance Considerations

1. Timeout Polling Overhead

The expect_* methods use 0.1s polling intervals:

attempt_timeout = min(remaining, 0.1)

Assessment: This is reasonable for tests, but adds 100ms latency in the worst case. For high-frequency testing, this could add up.

Recommendation: Consider making the poll interval configurable, or use an event-based approach with asyncio.Event for zero-latency wakeups.

2. Message Cloning

await self._ctx.received_cmds.put(cmd.clone())
await self._ctx.received_data.put(data.clone())

Assessment: Good defensive programming to prevent mutation issues. The performance cost is acceptable for testing.


📝 Code Quality

1. Inconsistent README Placeholders

Location: packages/core_system/pytest_ten/docs/README.en-US.md

The README files contain placeholder content:

# Default Extension Python Extension

## Features
- xxx features

Recommendation: Replace with actual pytest_ten documentation:

# pytest_ten - TEN Framework Pytest Integration

`pytest_ten` provides AAA (Arrange-Act-Assert) style testing helpers for TEN Framework extensions.

## Features
- Clean `@ten_test` decorator for pytest integration
- Context-based API for sending/receiving messages
- Built-in timeout and assertion helpers
- Support for pytest fixtures
- Mock command and data responses

2. Type Hints Consistency

The code uses modern Python 3.10+ type hints (str | None, dict[str, Any]) which is great! Ensure requires-python = ">=3.10" is enforced (already done in pyproject.toml ✅).


🧪 Test Coverage

Example Tests Look Good

The integration tests demonstrate proper usage:

  • test_mock_cmd_result_with_pytest_ten: Shows mock command responses
  • test_recv_msg_during_starting: Shows basic message flow
  • test_basic_echo: Shows async command handling

Suggestion: Consider adding a test for the put_back parameter to validate the message filtering behavior.


📦 Build Configuration

BUILD.gn Issues

Properly configured - The package correctly includes all necessary files and sets up dependencies.

Minor Note: The uv.lock file is checked in (147 lines). Consider adding a comment in the BUILD.gn or docs about when/how to update it.


🎯 Recommendations Summary

High Priority

  1. Fix asyncio.get_event_loop() deprecation - Replace with asyncio.get_running_loop()

Medium Priority

  1. 📝 Update README documentation - Replace placeholder content with actual pytest_ten docs
  2. 🧹 Add resource cleanup in on_deinit()

Low Priority

  1. 📖 Document put_back limitations - Add note about race conditions in docstrings
  2. 🧪 Add test for put_back - Ensure the message filtering works as expected
  3. 🔧 Make poll interval configurable - Allow tuning for different test scenarios

✨ Overall Assessment

This is a well-designed and valuable addition to the TEN Framework! The API is intuitive, the documentation is thorough, and the integration with pytest is seamless. The main issue is the asyncio.get_event_loop() deprecation, which should be addressed before merge.

Recommendation: Approve with minor changes (fix the event loop deprecation)

Great work! 🎉

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is being reviewed by Cursor Bugbot

Details

Your team is on the Bugbot Free tier. On this plan, Bugbot will review limited PRs each billing cycle for each member of your team.

To receive Bugbot reviews on all of your PRs, visit the Cursor dashboard to activate Pro and start your 14-day free trial.

) -> None:
assert self._ctx is not None

await self._ctx.received_video.put(video_frame)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Audio and video frames not cloned before queueing

In on_audio_frame and on_video_frame, the frames are put directly into the queues without calling .clone(), while on_cmd and on_data correctly clone their messages before queueing. This inconsistency could cause issues if the runtime modifies or deallocates the original frame objects after the callback returns, leading to accessing invalid or corrupted data when the test consumes from the queue.

Fix in Cursor Fix in Web

@claude
Copy link

claude bot commented Dec 8, 2025

PR Review: pytest_ten System Package

Overall Assessment

This is a well-structured and valuable addition to the TEN Framework. The AAA-style testing utilities significantly improve the developer experience for writing Python extension tests. The implementation demonstrates good software engineering practices with comprehensive error handling, clear documentation, and practical examples.

✅ Strengths

1. Code Quality & Architecture

  • Clean separation of concerns: TenTestContext handles message collection/expectations while _TenTestRunner manages test lifecycle
  • Comprehensive API: Provides expect/collect/assert methods for all message types (cmd, data, audio_frame, video_frame)
  • Excellent documentation: Extensive docstrings with clear examples showing basic usage, pytest fixtures, and mocks
  • Proper async/await patterns: Consistent use of async throughout, avoiding common async pitfalls

2. Testing Features

  • Flexible timeout handling: Per-operation timeouts with sensible defaults
  • Smart message filtering: put_back parameter allows non-destructive filtering when order matters
  • Mock support: Well-designed mock system for commands (including streaming), data, supporting complex test scenarios
  • Pytest integration: Seamless fixture support via signature inspection (lines 597-611)

3. Error Handling

  • Proper exception mapping: AssertionError → ErrorCodeGeneric, TimeoutError → ErrorCodeTimeout (lines 430-443)
  • Full stack traces: Includes traceback in error messages for debugging (lines 431, 436, 442)
  • Graceful cleanup: Task cancellation in on_deinit prevents resource leaks (lines 414-420)

⚠️ Issues Found

1. CRITICAL BUG: Potential Race Condition in Message Cloning

Location: packages/core_system/pytest_ten/pytest_ten/ten_test.py:449, 471

async def on_cmd(self, ten_env: AsyncTenEnvTester, cmd: Cmd) -> None:
    await self._ctx.received_cmds.put(cmd.clone())  # Line 449
    cmd_name = cmd.get_name()
    # ... uses original cmd object for mocking

Problem: If .clone() performs a shallow copy and the original cmd object is mutated after queueing, tests could observe inconsistent state. Similarly for data.clone() at line 471.

Recommendation: Verify that clone() performs a deep copy. If it doesn't, the original object should not be used after cloning.


2. MEDIUM: Busy-Wait Pattern in Message Filtering

Location: packages/core_system/pytest_ten/pytest_ten/ten_test.py:84-98, 132-145

attempt_timeout = min(remaining, 0.1)
data = await asyncio.wait_for(self.received_data.get(), timeout=attempt_timeout)
if name and data.get_name() != name:
    if put_back:
        await self.received_data.put(data)
        await asyncio.sleep(0.05)  # Busy loop mitigation
    continue

Problem: When put_back=True, this creates a busy-wait pattern with 100ms sleep + 50ms delay per iteration. For deeply nested queues, this could waste CPU cycles and extend test times.

Recommendation: Consider a more efficient approach:

  • Use asyncio.Queue.qsize() to check if retrying is worthwhile
  • Implement a priority queue or separate queues per message name
  • Document the performance implications of put_back=True

3. MEDIUM: Incomplete Documentation

Location: packages/core_system/pytest_ten/docs/README.en-US.md

Problem: The README contains only placeholder text:

# Default Extension Python Extension
<!-- Brief introduction for the extension -->
## Features
- xxx features

Recommendation: Replace with actual documentation covering:

  • What pytest_ten is and why developers should use it
  • Installation instructions (how to add as dev dependency)
  • Complete usage examples
  • API reference for TenTestContext and ten_test decorator
  • Migration guide from legacy test patterns

4. LOW: Missing Type Hints for Mock Callbacks

Location: packages/core_system/pytest_ten/pytest_ten/ten_test.py:392-396

mock_cmd_response: dict[str, Callable[[Cmd], CmdResult]] | None = None,
mock_cmd_stream_response: (
    dict[str, Callable[[Cmd], AsyncGenerator[CmdResult, None]]] | None
) = None,
mock_data_response: dict[str, Callable[[Data], Data]] | None = None,

Problem: While types are provided, there's no runtime validation that mock functions return the correct types.

Recommendation: Add runtime validation in on_cmd/on_data to check isinstance and provide clear error messages if mocks return wrong types.


5. LOW: Hardcoded Event Loop Access

Location: Multiple locations (e.g., line 74, 122, 200)

end_time = asyncio.get_event_loop().time() + timeout

Problem: get_event_loop() is deprecated in Python 3.10+ and can raise exceptions if called outside async context.

Recommendation: Use asyncio.get_running_loop() instead (requires async context, which is guaranteed here).


6. MINOR: .vscode/settings.json Change Concerns

Location: .vscode/settings.json

Problem: The removal of AI agents paths could break type checking for developers working on ai_agents:

-    "./ai_agents/agents/ten_packages/system/ten_ai_base/interface",
-    "./ai_agents/agents/ten_packages/system/ten_ai_base/lib"
+    "./packages/core_system/pytest_ten"

Recommendation: Verify this is intentional. If ai_agents have their own workspace settings, this is fine. Otherwise, consider adding pytest_ten without removing existing paths.

🔍 Security Considerations

No security concerns identified

  • No user input is directly executed
  • Proper timeout limits prevent DoS
  • Error messages don't leak sensitive information

⚡ Performance Considerations

Good:

  • Async/await throughout prevents blocking
  • Reasonable default timeout (5s)
  • Message cloning prevents shared state issues

Watch for:

  • put_back=True can cause O(n²) behavior for large message queues with filtering
  • Unbounded queue growth if extension sends messages faster than test consumes them (consider adding maxsize warning)

📊 Test Coverage

Good test examples provided:

  • Basic AAA-style tests (test_recv_msg_during_starting.py)
  • Mock usage (test_mock.py)
  • Lifecycle testing (test_callback_after_stop_test.py)

Suggestions:

  • Add test for put_back=True edge cases
  • Add test for timeout edge cases (exactly at timeout, just before/after)
  • Add test for queue overflow scenarios

🔧 Code Style & Conventions

✅ Follows repository conventions:

  • Proper copyright headers
  • Consistent naming (snake_case)
  • Clear variable names
  • Comprehensive docstrings

📝 Recommended Actions

Must Fix (Before Merge):

  1. Verify cmd.clone() and data.clone() perform deep copies or document shallow copy implications
  2. Complete the README documentation with real content

Should Fix (High Priority):

  1. Address busy-wait pattern or document performance implications
  2. Replace get_event_loop() with get_running_loop()

Consider (Lower Priority):

  1. Add runtime type validation for mock callbacks
  2. Verify .vscode/settings.json changes don't break ai_agents development
  3. Add edge case tests mentioned above

🎯 Summary

This is a high-quality contribution that will significantly improve the testing experience for TEN Framework Python extensions. The core implementation is solid, with only a few issues to address:

  • Critical: 1 (race condition verification needed)
  • Medium: 2 (busy-wait pattern, incomplete docs)
  • Low/Minor: 3 (type validation, event loop, vscode settings)

Recommendation: ✅ Approve with requested changes - Address the critical race condition verification and complete the documentation before merging.


Great work on this feature! The AAA-style testing will make tests much more readable and maintainable. 🎉

@claude
Copy link

claude bot commented Dec 8, 2025

Pull Request Review: Add pytest_ten System Package

Overview

This PR introduces pytest_ten, a new system package that provides AAA-style (Arrange-Act-Assert) pytest testing utilities for TEN Framework extensions. The implementation is well-structured and follows good testing patterns.


✅ Strengths

1. Clean Architecture

  • Well-designed TenTestContext class that provides a clean API for testing
  • Good separation of concerns between TenTestContext (user-facing) and _TenTestRunner (internal)
  • Excellent decorator pattern for @ten_test that integrates seamlessly with pytest

2. Comprehensive API

  • Full coverage of message types: commands, data, audio frames, video frames
  • Flexible timeout handling with context defaults and per-call overrides
  • Useful utility methods like collect_all_* and assert_no_*
  • Mock support for cmd responses, streaming responses, and data responses

3. Good Error Handling

  • Proper error categorization (AssertionError, TimeoutError, generic exceptions)
  • Clear error messages with context
  • Appropriate use of TenError types (ErrorCodeGeneric, ErrorCodeTimeout)

4. Integration & CI

  • Properly integrated into GN build system
  • Added to CI/CD workflows for packaging and publishing
  • Version tooling updated to handle new package
  • IDE configurations updated (VSCode, Pyright)

🐛 Potential Issues & Improvements

Critical Issues

1. Race Condition in Message Filtering (ten_test.py:88-94, 136-142)

The put_back functionality has a potential race condition:

if name and data.get_name() != name:
    if put_back:
        await self.received_data.put(data)
        await asyncio.sleep(0.05)  # Small delay to avoid busy loop
    continue

Issue: When multiple tasks are consuming from the same queue, a message put back might be consumed by a different waiter before the original waiter retries.

Recommendation: Consider using a more robust pattern:

  • Use a deque to reinsert at the front of the queue
  • Or use multiple queues filtered by name
  • Document the limitation clearly if keeping current approach

2. Event Loop Access Pattern (ten_test.py:74, 122, etc.)

Using asyncio.get_event_loop() is deprecated in modern Python:

end_time = asyncio.get_event_loop().time() + timeout
remaining = end_time - asyncio.get_event_loop().time()

Recommendation: Use asyncio.get_running_loop() instead:

loop = asyncio.get_running_loop()
end_time = loop.time() + timeout
remaining = end_time - loop.time()

This is more explicit and avoids deprecated API usage.

Design Concerns

3. Message Cloning Strategy

Messages are cloned before queuing:

await self._ctx.received_cmds.put(cmd.clone())  # Line 449
await self._ctx.received_data.put(data.clone())  # Line 471

But audio/video frames are not:

await self._ctx.received_audio.put(audio_frame)  # Line 484
await self._ctx.received_video.put(video_frame)  # Line 491

Question: Is this intentional? If frames are mutable, this could lead to issues. Consider documenting why frames aren't cloned or clone them for consistency.

4. Unbounded Queue Growth

All message queues are unbounded:

self.received_cmds: asyncio.Queue[Cmd] = asyncio.Queue()

Risk: If an extension sends many messages that aren't consumed, memory could grow unbounded.

Recommendation: Consider either:

  • Adding a reasonable maxsize (e.g., 1000) with overflow handling
  • Document that users must consume all messages
  • Add a warning when queue size exceeds threshold

🔍 Code Quality

Good Practices ✅

  • Excellent docstrings with clear parameter descriptions
  • Type hints throughout
  • Consistent naming conventions
  • Good use of async/await patterns
  • Proper resource cleanup in on_deinit

Minor Issues

5. Unused Import (ten_test.py:3)

import inspect  # Only used in decorator, could be moved to function scope if desired

Minor: Not a real issue, but inspect is only used in the decorator function at the bottom.

6. Magic Numbers

attempt_timeout = min(remaining, 0.1)  # Line 84, 132
await asyncio.sleep(0.05)  # Line 92, 140

Recommendation: Extract as named constants:

_POLL_INTERVAL = 0.1
_PUT_BACK_DELAY = 0.05

7. Documentation Gaps

The README files (packages/core_system/pytest_ten/docs/README.*.md) are placeholder templates and don't document the actual package. This should be updated before release.


🔒 Security

No security concerns identified.

  • No user input processing without validation
  • No file system operations with user-controlled paths
  • No credential handling
  • Appropriate sandboxing in test environment

⚡ Performance

Concerns

8. Busy-Wait Pattern in expect_ Methods*

The implementation polls every 0.1 seconds:

while True:
    remaining = end_time - asyncio.get_event_loop().time()
    if remaining <= 0:
        raise TimeoutError(...)
    try:
        attempt_timeout = min(remaining, 0.1)
        data = await asyncio.wait_for(self.received_data.get(), timeout=attempt_timeout)

Observation: This creates unnecessary wake-ups every 100ms. While not critical for tests, it's not ideal.

Alternative: Could use a single asyncio.wait_for with the full timeout:

try:
    data = await asyncio.wait_for(self.received_data.get(), timeout=timeout)
    if name and data.get_name() != name:
        # handle mismatch
except asyncio.TimeoutError:
    raise TimeoutError(...)

But current approach allows for name filtering, so trade-off may be acceptable for tests.


🧪 Test Coverage

Coverage is good:

  • Example tests in test_mock.py demonstrate mocking
  • test_recv_msg_during_starting.py shows basic usage
  • Tests compare old ExtensionTester approach with new pytest_ten approach
  • Both sync and async patterns tested

Suggestion: Consider adding tests for:

  • Timeout scenarios
  • put_back=True behavior
  • collect_all_* methods
  • Negative assertions (assert_no_*)
  • Stream mock responses

📋 Build System & CI

Well integrated:

  • ✅ GN build files properly configured
  • ✅ Added to workflow packaging arrays
  • ✅ Publish target created
  • ✅ Version tooling updated for manifest.json and dev_dependencies
  • ✅ PYTHONPATH configurations updated

One observation:
The version tooling change in tools/version/common.py:172-177 adds support for dev_dependencies, which is good and necessary for this PR.


📝 Recommendations Summary

Must Fix Before Merge

  1. Replace asyncio.get_event_loop() with asyncio.get_running_loop() (Python 3.10+ best practice)
  2. Document or fix the audio/video frame cloning inconsistency
  3. Update README documentation to replace placeholder content

Should Consider

  1. Document race condition risk with put_back=True or improve implementation
  2. Add queue size limits or document unbounded growth risk
  3. Extract magic numbers to named constants
  4. Consider performance optimization for polling in expect_* methods (lower priority for test code)

Nice to Have

  1. Add more comprehensive test coverage for edge cases
  2. Add examples in documentation showing common test patterns

🎯 Overall Assessment

This is a high-quality PR that adds valuable testing infrastructure to the TEN Framework.

The implementation is clean, well-documented, and provides an excellent developer experience. The few issues identified are minor and don't block the core functionality.

Recommendation: Approve with minor changes (address the deprecated API usage and documentation at minimum)

Great work! 🚀

@claude
Copy link

claude bot commented Dec 8, 2025

Code Review: pytest_ten System Package

Summary

This PR introduces a well-designed pytest integration package (pytest_ten) that provides AAA-style testing utilities for TEN extensions. The implementation is comprehensive, covering the new package code, build integration, CI/CD updates, versioning tooling, and example test migrations. Overall, this is a high-quality contribution with solid architecture and good practices.


✅ Strengths

1. Excellent API Design

  • The TenTestContext class provides a clean, intuitive API for test authors
  • AAA-style testing (Arrange-Act-Assert) is a great pattern for readability
  • Comprehensive message handling: commands, data, audio/video frames
  • Well-thought-out filtering and collection methods (expect_*, collect_all_*, assert_no_*)

2. Strong Documentation

  • Extensive docstrings with clear parameter descriptions and examples
  • Multiple usage examples in the decorator docstring (basic, fixtures, mocks)
  • Good explanation of the pytest fixture integration pattern

3. Robust Error Handling

  • Proper exception chaining with from e syntax
  • Distinct error codes for different failure types (assertion, timeout, generic)
  • Comprehensive traceback inclusion in error messages

4. Complete Integration

  • Proper GN build system integration with package publishing
  • CI/CD pipeline updates for packaging and testing
  • Version tooling correctly updated to handle the new package
  • Editor configuration updates (VSCode, Pyright)

5. Good Testing Patterns

  • Mock support for commands (both single-response and streaming)
  • Mock support for data responses
  • Pytest fixture support through signature inspection
  • Real test examples demonstrate usage

🔍 Issues Found

Critical Issues

1. Race Condition in Message Filtering (packages/core_system/pytest_ten/pytest_ten/ten_test.py:89-92, 137-140)

The put_back=True logic has a potential race condition:

if put_back:
    # Put back and wait before retrying
    await self.received_data.put(data)
    await asyncio.sleep(0.05)  # Small delay to avoid busy loop

Problem: If messages arrive while waiting, they could be reordered. The queue doesn't preserve order when putting items back.

Impact: Tests using put_back=True could see messages in unexpected order, causing flakiness.

Recommendation:

  • Either document this limitation clearly in the docstring
  • Or implement a priority queue that preserves insertion order
  • Or remove put_back parameter and only support drop behavior (simpler, clearer semantics)

2. Deprecated API Usage (packages/core_system/pytest_ten/pytest_ten/ten_test.py:74, 122, 200, etc.)

Multiple uses of asyncio.get_event_loop().time():

end_time = asyncio.get_event_loop().time() + timeout

Problem: get_event_loop() is soft-deprecated in Python 3.10+ and may not work correctly in all contexts.

Impact: Could cause warnings or failures in future Python versions.

Recommendation: Replace with asyncio.get_running_loop().time() (safe in async context):

end_time = asyncio.get_running_loop().time() + timeout

High Priority Issues

3. Missing Clone on Audio/Video Frames (packages/core_system/pytest_ten/pytest_ten/ten_test.py:484, 491)

await self._ctx.received_audio.put(audio_frame)  # Not cloned\!
await self._ctx.received_video.put(video_frame)  # Not cloned\!

Problem: Commands and data are cloned (line 449, 471), but audio/video frames are not. This could cause memory issues if frames are mutated.

Recommendation: Either clone frames or document why they shouldn't be cloned:

await self._ctx.received_audio.put(audio_frame.clone())  # If clone() exists

4. Unused Import (packages/core_system/pytest_ten/pytest_ten/ten_test.py:3)

import inspect
import traceback

Problem: inspect is imported but only used at line 598, while traceback is used. Both are needed but the order suggests potential cleanup.

Recommendation: Verify both are needed, remove if not.


Medium Priority Issues

5. Task Cancellation Without Cleanup (packages/core_system/pytest_ten/pytest_ten/ten_test.py:414-420)

async def on_deinit(self, ten_env: AsyncTenEnvTester) -> None:
    if self._test_task:
        self._test_task.cancel()
        try:
            await self._test_task
        except asyncio.CancelledError:
            pass

Problem: If the test is cancelled mid-operation (e.g., during send_cmd), resources might not be properly cleaned up.

Recommendation: Consider adding cleanup logic or documenting expected behavior on cancellation.


6. Hardcoded Timeout Values (packages/core_system/pytest_ten/pytest_ten/ten_test.py:84)

attempt_timeout = min(remaining, 0.1)
await asyncio.sleep(0.05)

Problem: Magic numbers make the code harder to tune for different environments.

Recommendation: Extract as class constants:

_QUEUE_POLL_INTERVAL = 0.1  # seconds
_PUT_BACK_DELAY = 0.05  # seconds

7. Version Mismatch Risk (packages/core_system/pytest_ten/pyproject.toml:8 vs manifest.json:4)

  • pyproject.toml version: 0.1.0
  • manifest.json version: 0.11.41

Problem: These versions don't match, which could cause confusion about the package version.

Recommendation: Align versions or document why they differ. Typically, manifest.json should be the source of truth for TEN packages.


8. Missing Type Hints for Queues (packages/core_system/pytest_ten/pytest_ten/ten_test.py:31-34)

The generic types on Queue declarations are good, but Python 3.10+ should use queue.Queue[T] from the standard library.

Current:

self.received_cmds: asyncio.Queue[Cmd] = asyncio.Queue()

Recommendation: This is actually correct for asyncio.Queue. No change needed, but ensure type checking passes.


Low Priority / Nitpicks

9. Inconsistent Assertion Messages (packages/core_system/pytest_ten/pytest_ten/ten_test.py:606)

assert err is None, f"Test failed: {err.error_message()}"

Problem: This could raise AttributeError if err is not None but lacks error_message().

Recommendation: Use defensive access:

assert err is None, f"Test failed: {err.error_message() if err else 'unknown'}"

10. README Templates Not Updated (packages/core_system/pytest_ten/docs/README.*.md)

The README files contain placeholder text like "Default Extension Python Extension" and "xxx features".

Recommendation: Update with actual pytest_ten documentation before merging.


11. Missing uv.lock in .gitignore? (packages/core_system/pytest_ten/uv.lock)

A 147-line uv.lock file is committed, which is unusual for a library.

Recommendation: Verify if this should be in version control or .gitignore'd.


🎯 Best Practices & Quality

Good Practices Observed:

  • ✅ Proper use of async/await patterns
  • ✅ Context manager pattern with proper cleanup
  • ✅ Clear separation of concerns (Context, Runner, Decorator)
  • ✅ Comprehensive test coverage with multiple test files
  • ✅ Proper package metadata (LICENSE, manifest.json, pyproject.toml)
  • ✅ GN build system integration follows existing patterns
  • ✅ CI/CD updates are complete and consistent

Minor Suggestions:

  • Consider adding a pytest.ini or configuration example for test discovery
  • Add a migration guide from old-style ExtensionTester to new @ten_test decorator
  • Consider exposing timeout configuration as a class attribute on TenTestContext

🔒 Security

No security concerns identified. The package:

  • Doesn't handle sensitive data directly
  • Doesn't expose network services
  • Uses safe async patterns
  • Properly handles exceptions and errors

⚡ Performance

Performance considerations:

  • The 0.1s polling interval in expect_* methods is reasonable for testing
  • Queue operations are efficient
  • No apparent memory leaks (assuming frames are handled correctly)
  • Async patterns are properly used for non-blocking operations

Suggestion: For high-throughput tests, consider making the polling interval configurable.


🧪 Test Coverage

Excellent test coverage:

  • Basic message sending/receiving (test_recv_msg_during_starting.py)
  • Mock command responses (test_mock.py)
  • Callback timing (test_callback_after_stop_test.py)
  • Multiple test files demonstrate different usage patterns
  • Old-style tests kept for comparison

Minor gap: No explicit tests for collect_all_* or assert_no_* methods.


📋 Summary

Category Rating Notes
Code Quality ⭐⭐⭐⭐ (4/5) Well-structured, minor issues
Documentation ⭐⭐⭐⭐⭐ (5/5) Excellent docstrings and examples
Testing ⭐⭐⭐⭐ (4/5) Good coverage, could add more edge cases
Integration ⭐⭐⭐⭐⭐ (5/5) Complete CI/CD and build updates
Security ⭐⭐⭐⭐⭐ (5/5) No concerns

🚀 Recommendations for Merge

Before merging:

  1. Must fix: Replace asyncio.get_event_loop() with asyncio.get_running_loop()
  2. ⚠️ Should fix: Resolve put_back race condition or document limitation
  3. ⚠️ Should fix: Update README documentation files
  4. 💡 Consider: Clone audio/video frames or document why not needed

After these fixes, this PR is ready to merge. The new pytest_ten package is a valuable addition that will significantly improve the testing experience for TEN extensions.


Great work! This is a well-architected addition to the framework. 🎉

@claude
Copy link

claude bot commented Dec 8, 2025

Pull Request Review: pytest_ten System Package

Summary

This PR introduces a new Python system package pytest_ten that provides AAA-style testing utilities for TEN extensions. The implementation is well-structured and adds significant value to the testing infrastructure.

Critical Issues Found

1. Missing clone() for AudioFrame/VideoFrame (ten_test.py:502, 509)

Unlike on_cmd and on_data which clone the messages, audio/video frames are not cloned before being put in the queue. This could lead to data corruption if the frame is modified elsewhere.

2. README Documentation is Placeholder

The docs/README.en-US.md contains only placeholder text copied from a template and should be updated with actual pytest_ten documentation.

Detailed Review

See full review in next comment...

@claude
Copy link

claude bot commented Dec 8, 2025

Strengths

Code Quality

  • Clean Architecture: The separation between TenTestContext (user-facing API) and _TenTestRunner (internal implementation) is well-designed
  • Comprehensive API: The context provides a rich set of methods for both sending and asserting on messages
  • Good Documentation: Extensive docstrings with clear examples in the decorator
  • Proper Error Handling: Exception handling in _run_test() properly converts Python exceptions to TenError types

Design Patterns

  • Decorator Pattern: The ten_test decorator elegantly wraps test functions and handles pytest fixture integration
  • Queue-based Collection: Using asyncio queues to collect messages is a solid async pattern
  • Mock Support: Built-in support for mocking commands, streaming commands, and data responses is excellent

Integration

  • CI/CD Integration: Properly wired into GitHub workflows for publishing
  • Build System Integration: Correctly integrated into BUILD.gn with proper dependencies
  • Version Management: Included in version bump scripts

@claude
Copy link

claude bot commented Dec 8, 2025

All Issues Found

Critical Issues

1. Missing clone() for AudioFrame/VideoFrame

  • Location: packages/core_system/pytest_ten/pytest_ten/ten_test.py:502, 509
  • Issue: Audio/video frames are not cloned before being put in the queue
  • Impact: Data corruption if frames are modified elsewhere, potential resource leaks
  • Fix: Add .clone() call like the on_cmd and on_data handlers do

Medium Priority

2. Type Annotation Inconsistency

  • Location: packages/core_system/pytest_ten/pytest_ten/ten_test.py:408
  • Issue: test_func should be Callable[[TenTestContext], Awaitable[Any]] not Callable[[TenTestContext], Any]

3. README is Placeholder

  • Location: packages/core_system/pytest_ten/docs/README.en-US.md
  • Issue: Contains only template text, needs actual pytest_ten documentation

Low Priority

4. Busy-Wait Pattern

  • Location: packages/core_system/pytest_ten/pytest_ten/ten_test.py:86, 136
  • Issue: Uses 0.1s timeouts with 0.05s sleeps when putting back messages
  • Impact: CPU usage, could be optimized

5. Missing Type Hints

  • Location: core/src/ten_runtime/binding/python/interface/ten_runtime/async_test.py
  • Issue: Refactored callback functions lack type annotations

@claude
Copy link

claude bot commented Dec 8, 2025

Test Coverage Analysis

Good

  • Example tests demonstrate basic functionality (test_mock.py)
  • Integration with existing test infrastructure
  • Mock support is well-tested
  • Shows both old and new testing approaches side-by-side

Missing

  • No unit tests for the pytest_ten package itself
  • No tests for error conditions (timeouts, assertion failures)
  • No tests for put_back functionality
  • No tests for collect_all_* methods
  • No tests for assert_no_* methods

Recommendation: Add a test suite under packages/core_system/pytest_ten/tests/ to test the testing framework itself.

Security & Performance

Security

  • No security concerns identified
  • No sensitive data handling
  • Proper error handling prevents information leakage

Performance

Positive:

  • Efficient queue-based message collection
  • Proper use of asyncio primitives
  • Minimal overhead for test harness

Concerns:

  • Busy-wait pattern in expect_* methods could be optimized
  • Queue size of 10 for streaming commands may be insufficient for high-throughput tests (consider making configurable)

@claude
Copy link

claude bot commented Dec 8, 2025

Recommendations

Must Fix Before Merge

  1. Clone audio/video frames in on_audio_frame and on_video_frame (ten_test.py:502, 509)
    • Add: await self._ctx.received_audio.put(audio_frame.clone())
    • Add: await self._ctx.received_video.put(video_frame.clone())
  2. Update README with actual documentation explaining what pytest_ten is, how to use it, and migration guide

Should Fix Soon After Merge

  1. Add comprehensive tests for pytest_ten itself
  2. Fix type annotation inconsistencies
  3. Optimize the busy-wait pattern in expect methods
  4. Add more example tests demonstrating advanced features (streaming, put_back, assert_no_*)

Nice to Have

  1. Make queue sizes configurable
  2. Add helper methods for common patterns like send_and_expect_data
  3. Add performance benchmarks
  4. Consider adding support for property validation helpers

Overall Assessment

This is a well-designed and valuable addition to the TEN Framework testing infrastructure. The AAA-style approach with TenTestContext is much more intuitive than the callback-based approach. The implementation is solid with only a few critical issues that need addressing.

Recommendation: Approve with minor changes required (fix the cloning issue and documentation).

Score: 8.5/10

The main detractors are:

  • Missing clone() for audio/video frames (critical bug)
  • Placeholder documentation
  • Lack of tests for the testing framework itself

Once these issues are addressed, this will be an excellent addition to the codebase. The refactoring to named callbacks in async_test.py is a nice improvement for debuggability.


async def send_video_frame(self, frame: VideoFrame) -> None:
"""Send video frame to extension"""
await self.ten_env.send_video_frame(frame)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Send methods silently discard error return values

The send_data, send_audio_frame, and send_video_frame methods in TenTestContext declare a return type of None but the underlying AsyncTenEnvTester methods return TenError | None. The returned error values are being silently discarded, which could cause tests to appear to pass when send operations actually fail. These methods need to propagate the error return value to callers.

Fix in Cursor Fix in Web

@claude
Copy link

claude bot commented Dec 8, 2025

Pull Request Review: pytest_ten System Package

I've completed a comprehensive review of this PR. Overall, this is a well-structured addition that introduces AAA-style testing capabilities to the TEN Framework. Here's my detailed feedback:


✅ Strengths

1. Excellent Code Quality

  • Clean, well-documented code with comprehensive docstrings
  • Proper type hints throughout (Python 3.10+ syntax)
  • Good separation of concerns between TenTestContext and _TenTestRunner
  • Follows AAA (Arrange-Act-Assert) testing pattern effectively

2. Comprehensive API Design

  • expect_* methods for receiving messages with timeout handling
  • collect_all_* methods for gathering multiple messages
  • assert_no_* methods for negative assertions
  • Support for pytest fixtures integration
  • Mock support for commands and data responses

3. Robust Error Handling

  • Proper exception handling with custom error codes (AssertionError → ErrorCodeGeneric, TimeoutError → ErrorCodeTimeout)
  • Good traceback preservation for debugging
  • Task cancellation handled correctly in on_deinit

4. Good Refactoring in async_test.py

  • Named callback functions instead of lambdas (lines 70-112 in async_test.py)
  • Improves code readability and debuggability

⚠️ Issues & Concerns

1. Critical: Race Condition in Message Filtering

Location: ten_test.py:91-96, 141-146

if name and data.get_name() != name:
    if put_back:
        await self.received_data.put(data)
        await asyncio.sleep(0.05)  # ⚠️ Arbitrary delay
    continue

Issues:

  • The put_back mechanism with asyncio.sleep(0.05) is a race condition waiting to happen
  • Multiple concurrent expect_* calls could cause message reordering
  • The 0.05s delay is arbitrary and not based on any synchronization guarantee
  • Could lead to messages being lost or delivered out of order

Recommendation:

  • Consider using separate queues per message name/type
  • Or implement a proper message routing system with locks
  • Document that put_back=True should not be used with concurrent expect_* calls

2. Bug: Missing put_back Support for Audio/Video Frames

Location: ten_test.py:153-197

The expect_audio_frame and expect_video_frame methods don't support put_back parameter, but they also can't filter by name/type. This inconsistency could be confusing.

Recommendation: Either add filtering support or document why it's not needed.

3. Performance: Clone Every Message

Location: ten_test.py:467, 489

await self._ctx.received_cmds.put(cmd.clone())
await self._ctx.received_data.put(data.clone())

Issue: Cloning every message might impact performance for high-throughput scenarios (especially video frames).

Recommendation:

  • Document the cloning behavior
  • Consider making cloning optional if messages are read-only in tests

4. Security: Unbounded Queue Growth

Location: ten_test.py:31-34

self.received_cmds: asyncio.Queue[Cmd] = asyncio.Queue()
self.received_data: asyncio.Queue[Data] = asyncio.Queue()

Issue: Unbounded queues could lead to memory exhaustion if messages arrive faster than they're consumed.

Recommendation: Add maxsize limit or document that tests should always consume messages.

5. Documentation: README is Placeholder

Location: packages/core_system/pytest_ten/docs/README.en-US.md

The README files are still using placeholder text ("Default Extension Python Extension", "xxx features").

Recommendation: Update with actual documentation about:

  • What pytest_ten does
  • Installation/setup instructions
  • Usage examples
  • API reference

6. Minor: VSCode Settings Removed AI Agents Paths

Location: .vscode/settings.json:36-37

-    "./ai_agents/agents/ten_packages/system/ten_ai_base/interface",
-    "./ai_agents/agents/ten_packages/system/ten_ai_base/lib"

Question: Was this intentional? Are these paths no longer needed, or should they be kept alongside the new pytest_ten path?


🔍 Design Questions

1. Test Function Signature Magic

The decorator extracts pytest fixtures from the function signature (lines 608-610). This is clever but:

  • May be confusing for developers unfamiliar with the pattern
  • First parameter ctx must always be TenTestContext (enforced by convention, not type system)

Suggestion: Add runtime validation that first parameter is named 'ctx' with helpful error message.

2. Mock Response Limitations

Mock responses are one-shot - they return a single result. For streaming scenarios, mock_cmd_stream_response helps, but there's no mock for audio/video frames.

Question: Is mocking audio/video frames out of scope, or should it be added?


🧪 Test Coverage

The updated tests demonstrate good usage:

  • test_mock.py: Shows mock command responses ✅
  • test_recv_msg_during_starting.py: Shows basic expect pattern ✅
  • test_callback_after_stop_test.py: Shows complex lifecycle handling ✅

Suggestion: Add tests for:

  • Multiple concurrent expect_* calls
  • put_back=True behavior
  • Timeout edge cases
  • collect_all_* methods

🏗️ Build & CI

Positive:

  • Proper GN build integration ✅
  • CI pipeline updated to publish pytest_ten ✅
  • Version tooling updated to include pytest_ten ✅
  • dev_dependencies support added ✅

Note:

  • The uv.lock file was added but uv.lock files can be large (147 additions). Consider if this should be in the repo or generated.

🔒 Security Assessment

No critical security issues found.

Minor considerations:

  • Unbounded queue memory usage (mentioned above)
  • No input validation on timeout values (negative/inf values accepted)

📊 Performance Considerations

  1. Message cloning overhead - acceptable for testing but document it
  2. 0.1s polling intervals in expect methods - reasonable for tests
  3. Queue operations - efficient enough for testing workloads

✅ Best Practices Compliance

  • ✅ Apache 2.0 license headers present
  • ✅ Type hints used consistently
  • ✅ Follows async/await patterns correctly
  • ✅ Good error messages
  • ✅ Proper task cleanup in on_deinit

📝 Recommendations Summary

Must Fix:

  1. Document the put_back race condition and when it's safe to use
  2. Update README files with actual documentation
  3. Add maxsize to queues or document unbounded growth risk

Should Fix:

  1. Validate first parameter name in decorator
  2. Consider separate queues per message name to avoid race conditions
  3. Add more comprehensive test coverage

Nice to Have:

  1. Audio/video frame mocking support
  2. Timeout value validation
  3. Make message cloning optional for performance

🎯 Overall Assessment

This is a high-quality addition to the TEN Framework that will significantly improve the testing experience for Python extensions. The code is well-written, properly integrated, and follows good practices.

Recommendation: ✅ Approve with minor revisions

The identified issues are mostly documentation gaps and edge case handling that can be addressed in follow-up PRs if needed. The core functionality is solid and ready for use.

Great work! 🚀

@claude
Copy link

claude bot commented Dec 9, 2025

Pull Request Review: pytest_ten System Package

Overview

This PR introduces a new Python system package pytest_ten that provides AAA-style (Arrange-Act-Assert) pytest integration for TEN Framework testing. The implementation is well-structured and follows good software engineering practices.


Strengths

1. Architecture & Design

  • Clean Separation of Concerns: TenTestContext clearly separates test orchestration from the internal _TenTestRunner implementation
  • Excellent API Design: The AAA-style testing API is intuitive and developer-friendly
  • Comprehensive Mock Support: Supports mocking commands, streaming commands, and data responses
  • Async-First Design: Properly leverages Python's asyncio for non-blocking test execution

2. Code Quality

  • Excellent Documentation: Comprehensive docstrings with clear examples throughout
  • Type Hints: Proper use of type annotations (Python 3.10+ union syntax)
  • Error Handling: Well-structured exception handling with proper TenError reporting
  • Consistent Naming: Clear and consistent naming conventions

3. Testing Infrastructure

  • Good Integration: Updates CI/CD workflows to include the new package
  • Example Tests: Provides clear examples of how to use the new testing framework
  • Backward Compatibility: Maintains existing test infrastructure while adding new capabilities

⚠️ Issues & Concerns

1. Critical: Race Condition in Message Filtering (packages/core_systems/pytest_ten/pytest_ten/ten_test.py:54-102)

The expect_data() and expect_cmd() methods have potential race conditions when using put_back=True:

if name and data.get_name() != name:
    if put_back:
        # Put back and wait before retrying
        await self.received_data.put(data)
        await asyncio.sleep(0.05)  # Small delay to avoid busy loop

Issue: The 50ms sleep is arbitrary and may not prevent race conditions in concurrent tests. Messages could be reordered or lost.

Recommendation: Consider using a priority queue or separate queues per message name, or document that put_back=True should not be used in concurrent scenarios.


2. Medium: Unused Import (packages/core_systems/pytest_ten/pytest_ten/ten_test.py:3)

import inspect
import traceback

The inspect module is imported but only used once. While not harmful, it's good to keep imports minimal.


3. Medium: Inconsistent Timeout Handling (packages/core_systems/pytest_ten/pytest_ten/ten_test.py:153-197)

Audio/video frame expectation uses direct asyncio.wait_for(), while cmd/data use custom polling loops:

# Audio/Video approach
return await asyncio.wait_for(self.received_audio.get(), timeout=timeout)

# Cmd/Data approach  
while True:
    remaining = end_time - asyncio.get_event_loop().time()
    attempt_timeout = min(remaining, 0.1)
    data = await asyncio.wait_for(...)

Recommendation: Use consistent approach across all message types. The polling approach with 0.1s chunks seems unnecessary for audio/video frames.


4. Medium: Queue Size Limitations (packages/core_systems/pytest_ten/pytest_ten/ten_test.py:31-34)

Unbounded queues could lead to memory issues in long-running or high-throughput tests:

self.received_cmds: asyncio.Queue[Cmd] = asyncio.Queue()
self.received_data: asyncio.Queue[Data] = asyncio.Queue()

Recommendation: Consider adding configurable max queue sizes with proper backpressure handling.


5. Low: Missing Clone for Audio/Video (packages/core_systems/pytest_ten/pytest_ten/ten_test.py:497-509)

Commands and data are cloned before queuing, but audio/video frames are not:

async def on_data(self, ten_env: AsyncTenEnvTester, data: Data) -> None:
    await self._ctx.received_data.put(data.clone())  # ✅ Cloned

async def on_audio_frame(self, ten_env: AsyncTenEnvTester, audio_frame: AudioFrame) -> None:
    await self._ctx.received_audio.put(audio_frame)  # ❌ Not cloned

Recommendation: Verify whether AudioFrame/VideoFrame need cloning for thread-safety. If they do, add .clone() calls.


6. Low: Task Cleanup in on_deinit (packages/core_systems/pytest_ten/pytest_ten/ten_test.py:432-438)

Task cancellation silently swallows CancelledError:

async def on_deinit(self, ten_env: AsyncTenEnvTester) -> None:
    if self._test_task:
        self._test_task.cancel()
        try:
            await self._test_task
        except asyncio.CancelledError:
            pass

Recommendation: Consider logging when task is cancelled, especially if it's cancelled prematurely (before test completion).


7. Documentation: README Files Need Content (packages/core_systems/pytest_ten/docs/)

The README files contain template placeholders:

# Default Extension Python Extension
<!-- Brief introduction for the extension -->
- xxx features

Recommendation: Replace with actual documentation explaining:

  • What pytest_ten is and why to use it
  • Installation instructions
  • API reference
  • Migration guide from old testing approach

🔒 Security Considerations

No Security Issues Found

  • No unsafe deserialization
  • No command injection vulnerabilities
  • No path traversal issues
  • Proper error message handling (doesn't leak sensitive info)

Performance Considerations

1. Efficient Message Handling

The queue-based approach is performant for typical test scenarios.

2. Potential Concern: Polling Loop Overhead

The 0.1s polling chunks in expect_cmd/expect_data add unnecessary complexity. For most cases, direct queue waiting would be more efficient.

3. Memory Management

Unbounded queues could grow large in stress tests. Consider:

  • Adding queue size limits
  • Implementing automatic cleanup for old messages
  • Adding metrics/warnings when queues grow large

🧪 Test Coverage

Good Coverage

  • Example tests demonstrate core functionality
  • Tests cover both old and new testing approaches
  • Mock functionality is tested

📝 Suggestions

  1. Add tests for edge cases:

    • Timeout behavior when no messages arrive
    • Behavior when queues are full (if size limits added)
    • Concurrent message handling
    • Multiple expect calls with filtering
  2. Add tests for error conditions:

    • Test failures with proper error messages
    • Exception handling in test functions
    • Cleanup on test failure
  3. Consider adding integration tests that:

    • Test with real extensions (not just mocks)
    • Test performance under load
    • Verify memory cleanup

📋 Other Observations

1. Version Synchronization

Good: Version tooling properly updated to include pytest_ten

2. CI Integration

Properly integrated into packaging and publishing workflows

3. Python Version

Requires Python 3.10+, which is reasonable for modern type hints

4. Minor: uv.lock File

The PR includes uv.lock with 147 additions. Consider whether lock files should be committed or gitignored.


🎯 Recommendations Summary

Must Fix:

  1. Address the race condition in put_back logic or document its limitations
  2. Complete the README documentation

Should Fix:

  1. Make timeout handling consistent across all message types
  2. Verify and fix audio/video frame cloning if needed
  3. Add queue size limits or document unbounded queue behavior

Nice to Have:

  1. Remove unused imports
  2. Add logging for task cancellation
  3. Expand test coverage for edge cases

Conclusion

This is a well-designed and valuable addition to the TEN Framework. The pytest_ten package provides a much cleaner and more intuitive testing API compared to the callback-based approach. The code quality is high, with good documentation and proper async handling.

The issues identified are relatively minor and mostly involve edge cases and documentation. The critical race condition should be addressed, but the core design is solid.

Overall Assessment: Approve with minor changes requested

Great work on this feature! The AAA-style testing will significantly improve the developer experience.

@halajohn halajohn merged commit 29c4d0c into main Dec 9, 2025
10 of 16 checks passed
@halajohn halajohn deleted the feat/aaa_test branch December 9, 2025 03:01
@claude
Copy link

claude bot commented Dec 9, 2025

Pull Request Review: pytest_ten System Package

This PR introduces a well-designed pytest integration system for TEN Framework that enables AAA (Arrange-Act-Assert) style testing. The implementation is solid overall with comprehensive features and good integration with the existing codebase.

Strengths

Architecture and Design

  • Clean separation: Moving pytest_ten to a separate system package is the right architectural choice
  • AAA pattern: The TenTestContext API provides an intuitive testing interface
  • Comprehensive API: Well-designed methods including expect, collect_all, and assert_no for various testing scenarios
  • Flexible mocking: Support for mocking commands (single + streaming) and data responses

Code Quality

  • Good documentation: Extensive docstrings and inline comments throughout
  • Error handling: Proper exception handling with specific error types
  • Type hints: Consistent use of type annotations (Python 3.10+ union syntax)
  • Async safety: Named callback functions in async_test.py prevent lambda closure issues

Integration

  • CI/CD integration: Properly wired into Ubuntu workflow for packaging/publishing
  • Build system: Correct GN build configuration with dependencies
  • Versioning: Integrated into version update scripts with dev_dependencies support
  • Testing: Real-world test examples demonstrate the framework works

Issues and Suggestions

1. Potential Race Condition in put_back Logic

Location: packages/core_systems/pytest_ten/pytest_ten/ten_test.py:91-96

The put_back feature has a subtle race condition. If multiple messages arrive rapidly, the queue ordering may be disrupted when putting messages back.

Recommendation: Add a warning in documentation about put_back=True limitations with high-frequency message streams.

2. Missing Return Type Handling

Location: packages/core_systems/pytest_ten/pytest_ten/ten_test.py:42-43

The send_data and similar methods return None but underlying ten_env.send_data() might return TenError. Consider checking return value and raising exception on error, or changing return type to TenError | None for consistency with send_cmd().

3. Inconsistent Timeout Behavior

The expect methods use polling with 0.1s attempts, while expect_audio_frame/video_frame use direct waits. Recommend standardizing timeout behavior across all expect methods for consistency.

4. Event Loop Deprecation

Multiple places use asyncio.get_event_loop() which is soft-deprecated in Python 3.10+. Recommend using asyncio.get_running_loop() instead.

5. Pytest Fixture Support - Incomplete Implementation

Location: packages/core_systems/pytest_ten/pytest_ten/ten_test.py:418-421

The fixture storage (self._pytest_fixtures) exists but is never populated! The decorator does not extract pytest fixtures. Either implement fixture extraction or remove the incomplete code to avoid confusion.

6. Documentation: Missing Dropped Messages Warning

The put_back=False default silently drops unmatched messages, which could lead to confusing test failures. Add prominent warning in docstrings about this behavior.

7. Directory Name Inconsistency

The package is in packages/core_systems/ but BUILD.gn references core_system (singular). Check if this is intentional.

Security Considerations

No security issues identified. Test-only code with appropriate isolation.

Test Coverage

Good: New test file test_mock.py demonstrates both old and new testing styles.

Suggestions: Add tests for edge cases (timeout scenarios, put_back=True with multiple messages, collect_all with empty queues, assert_no methods, streaming mock responses).

Performance Considerations

  • Async/await used correctly throughout
  • 0.1s polling interval in expect methods adds latency - consider making configurable
  • Queue cloning on every message - ensure these are lightweight operations

Summary

Overall Assessment: Approve with minor fixes recommended

This is a well-executed feature that significantly improves testing ergonomics for TEN Framework. The core implementation is solid with good design patterns.

Priority Fixes:

  1. Fix or remove incomplete pytest fixture support (Issue 5)
  2. Document put_back race condition limitations (Issue 1)
  3. Replace deprecated get_event_loop() calls (Issue 4)
  4. Add edge case test coverage

Nice-to-Have:

  • Standardize timeout behavior (Issue 3)
  • Add dropped messages warning to docs (Issue 6)

The PR is ready to merge after addressing the pytest fixture issue - it is currently non-functional code that could confuse users.

Great work on this feature!

cyfyifanchen added a commit that referenced this pull request Dec 10, 2025
* fix: remove the version lock on ten_ai_base (#1818)

* fix: remove the version lock on ten_ai_base

* fix: revert

---------

Co-authored-by: Ethan Zhang <[email protected]>

* feat: add a python system package `pytest_ten` supporting AAA-style tests (#1838)

* feat: add a python test wrapper supporting AAA-style tests

* fix: use relative import

* feat: move test_wrapper from ten_runtime into core_system package

* test: add example tests using pytest_ten

* chore: refine code

* feat: update workflow to upload pytest_ten system package

* fix: refine code

* fix: test case on windows

---------

Co-authored-by: xxxxl_sun <[email protected]>
Co-authored-by: sunxilin <[email protected]>

* fix: elevenlabs asr language (#1842)

Co-authored-by: liaochenliang <[email protected]>

* feat: support tman install app (#1845)

Co-authored-by: Hu Yueh-Wei <[email protected]>

* refactor(property): rename max_tokens to max_completion_tokens for clarity (#1848)

Update property.json files to use more descriptive parameter name max_completion_tokens instead of max_tokens. Also includes minor text and voice setting adjustments.

* feat: add gradium tts (#1843)

* feat: add gradium tts

* style(gradium_tts_python): improve code formatting and line wrapping

* refactor: simplify error logging and config handling

* fix: update licence when uploading to ppa (#1851)

* feat: winget (#1844)

* feat: winget (dry run when push)

* fix: remove potential code injection

* fix: variable format

* fix: build command args

* chore: test production run, trigger PR to  winget-pkgs

* fix: higher rate limit

* Revert "chore: test production run, trigger PR to  winget-pkgs"

This reverts commit 4d5d6e0.

* chore: modify PackageIdentifier

* chore: refine codes

* fix: trigger

* chore: refine codes

* chore: refine codes

* chore: refine template

* fix: license type

* fix: refine codes

---------

Co-authored-by: Nie Zhihe <[email protected]>
Co-authored-by: Hu Yueh-Wei <[email protected]>

* fix: update minimax api endpoint from .com to .io (#1852)

* refactor: simplify build command by removing redundant clean task (#1849)

Co-authored-by: Ethan Zhang <[email protected]>

* fix(voice-assistant-live2d): update voice settings and api endpoint

* fix: update minimax websocket url from minimaxi.io to minimax.io (#1854)

---------

Co-authored-by: xxxxl_sun <[email protected]>
Co-authored-by: Ethan Zhang <[email protected]>
Co-authored-by: Jun <[email protected]>
Co-authored-by: sunxilin <[email protected]>
Co-authored-by: liaochenliang <[email protected]>
Co-authored-by: liaochenliang <[email protected]>
Co-authored-by: Hu Yueh-Wei <[email protected]>
Co-authored-by: Nie Zhihe <[email protected]>
Co-authored-by: Nie Zhihe <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants