feat: add a python system package `pytest_ten` supporting AAA-style tests #1838

anexpn · 2025-12-07T16:39:28Z

Note

Introduces the pytest_ten system package for pytest-based AAA-style testing and wires it into build, packaging/publishing, CI, versioning scripts, and Python integration tests.

New system package: packages/core_systems/pytest_ten
- Provides AAA-style testing utilities: TenTestContext, ten_test decorator.
- Adds build rules (BUILD.gn), manifest, docs, pyproject.toml, uv.lock, and publish target.
Build/Repo config
- Include //packages/core_systems in ten_framework_all.
- Add pytest_ten to editor/type-checker paths (.vscode/settings.json, pyrightconfig.json).
CI/CD (Ubuntu 22.04 workflow)
- Package/publish ten_packages/system/pytest_ten alongside other system packages.
- Update test artifacts and PYTHONPATH to load pytest_ten.
Versioning tools
- Support updating dev_dependencies in manifests.
- Add pytest_ten to version update pipeline.
Python binding
- Refactor async callbacks in AsyncTenEnvTester to named functions for queue handling.
Tests
- Add/convert Python integration tests to use pytest_ten; add dev_dependencies on pytest_ten and adjust start scripts to include it.

^{Written by Cursor Bugbot for commit 9884229. This will update automatically on new commits. Configure here.}

claude · 2025-12-07T16:44:28Z

Pull Request Review

This PR adds a new Python system package pytest_ten that provides AAA-style (Arrange-Act-Assert) testing support for TEN extensions. Overall, this is a well-crafted addition with clean design and good test coverage. Here's my detailed feedback:

✅ Strengths

Code Quality

Excellent API design: The TenTestContext class provides a clean, intuitive interface for testing
Comprehensive functionality: Supports all message types (Cmd, Data, AudioFrame, VideoFrame)
Good documentation: Extensive docstrings and examples in the ten_test decorator
Clean separation: _TenTestRunner is properly marked as internal with leading underscore
Type hints: Good use of type annotations throughout

Testing

Good test coverage: Three example tests demonstrate different use cases (basic, mocking, fixtures)
Practical examples: The test files show real-world usage patterns

⚠️ Issues & Recommendations

🔴 High Priority

1. Race Condition in Message Filtering (packages/core_system/pytest_ten/pytest_ten/ten_test.py:74-98)

The expect_data and expect_cmd methods have a potential race condition when using put_back=True:

if name and data.get_name() \!= name:
    if put_back:
        await self.received_data.put(data)
        await asyncio.sleep(0.05)  # Race: another expect_ may grab it
    continue

Problem: Between put and the next get, another concurrent expect_* call could consume the message.

Recommendation: Consider using a deque or implementing a proper message queue with filtering support instead of the put-back pattern.

2. Missing Resource Cleanup (packages/core_system/pytest_ten/pytest_ten/ten_test.py:380-491)

The _TenTestRunner class stores queues and tasks but doesn't provide cleanup for unconsumed messages.

Recommendation: Add cleanup in on_deinit or after test completion:

async def on_deinit(self, ten_env: AsyncTenEnvTester) -> None:
    if self._test_task:
        self._test_task.cancel()
        try:
            await self._test_task
        except asyncio.CancelledError:
            pass
    
    # Clear queues
    if self._ctx:
        for queue in [self._ctx.received_cmds, self._ctx.received_data, 
                      self._ctx.received_audio, self._ctx.received_video]:
            while not queue.empty():
                try:
                    queue.get_nowait()
                except asyncio.QueueEmpty:
                    break

3. Incomplete README Documentation (packages/core_system/pytest_ten/docs/README.en-US.md)

The README is a template placeholder with no actual content about pytest_ten.

Recommendation: Replace with actual documentation covering:

What pytest_ten is and why to use it
Installation instructions
Quick start example
API reference link
Migration guide from old test style

🟡 Medium Priority

4. Hardcoded Timeout Values (packages/core_system/pytest_ten/pytest_ten/ten_test.py:84, 132)

Using min(remaining, 0.1) hardcodes the polling interval.

Recommendation: Make this configurable or document why 0.1s was chosen.

5. Direct Access to Private Member (packages/core_system/pytest_ten/pytest_ten/ten_test.py:604)

tester._pytest_fixtures = kwargs

Recommendation: While this works, consider making it a public method:

def set_pytest_fixtures(self, fixtures: dict[str, Any]) -> None:
    self._pytest_fixtures = fixtures

6. Potential Memory Issue with Clone (packages/core_system/pytest_ten/pytest_ten/ten_test.py:449, 471)

Every message is cloned before queuing, which could be expensive for large frames.

Recommendation: Document memory implications or consider shallow copies where appropriate.

7. Mock Response Priority Not Documented (packages/core_system/pytest_ten/pytest_ten/ten_test.py:452-466)

Stream mocks take precedence over regular mocks, but this isn't documented.

Recommendation: Add docstring clarification:

# Note: mock_cmd_stream_response takes precedence over mock_cmd_response
# if both are provided for the same command name

🟢 Low Priority

8. Inconsistent Error Messages (packages/core_system/pytest_ten/pytest_ten/ten_test.py)

Some methods use f-strings, others use format strings for consistency.

Recommendation: Standardize on f-strings throughout.

9. Missing Type Hints for sync wrapper (packages/core_system/pytest_ten/pytest_ten/ten_test.py:602)

The wrapper function lacks return type annotation.

Recommendation: Add -> None annotation.

10. Deprecated API Usage (packages/core_system/pytest_ten/pytest_ten/ten_test.py:74, 122, 200, etc.)

Using asyncio.get_event_loop().time() - should use asyncio.get_running_loop().time() in async context.

Recommendation: Update to modern asyncio API:

end_time = asyncio.get_running_loop().time() + timeout

🔒 Security

No security concerns identified. The code properly handles:

Error messages with traceback (good for debugging)
No unsafe eval/exec usage
Proper async exception handling

⚡ Performance

Considerations:

Message cloning overhead: Every message is cloned, which may impact high-throughput tests
Polling pattern: The 0.1s polling interval in expect_* methods could add latency
Queue growth: No bounds on queue sizes could lead to memory issues in long-running tests

Recommendation: Consider adding:

Optional message cloning (clone_messages=True parameter)
Bounded queue sizes with overflow handling
Performance notes in documentation

📝 Documentation & Conventions

Missing Documentation:

✅ Good: Extensive docstrings in code
❌ Missing: Actual README content (currently placeholder)
❌ Missing: Migration guide from old test style
⚠️ Incomplete: Chinese README also needs content

Code Conventions:

✅ Follows PEP 8
✅ Good naming conventions
✅ Proper use of private/public members
⚠️ Minor: Some inconsistent string formatting

🧪 Test Coverage

Current Coverage:

✅ Basic test flow (test_recv_msg_during_starting.py)
✅ Mock responses (test_mock.py)
✅ Callback handling (test_callback_after_stop_test.py)

Missing Coverage:

❌ put_back=True parameter usage
❌ Timeout edge cases
❌ collect_all_* methods
❌ assert_no_* methods
❌ Audio/Video frame testing
❌ Mock data response functionality
❌ Stream mock responses
❌ Error conditions (malformed messages, etc.)

Recommendation: Add tests for the untested code paths.

🏗️ Build System

BUILD.gn:

✅ Proper package configuration
✅ Correct dependency specification
✅ Docs file globbing looks correct
⚠️ Consider adding a comment explaining the glob script usage

pyproject.toml:

✅ Proper setuptools configuration
✅ Correct dependency specification (pytest>=8.2,<10)
✅ Python version requirement (>=3.10)

uv.lock:

⚠️ Lock file included but truncated in diff - verify it's complete
✅ Standard dependencies visible (colorama, pytest)

📊 Summary

Category	Rating	Notes
Code Quality	⭐⭐⭐⭐	Well-structured, clean design
Documentation	⭐⭐	Good inline docs, missing README
Testing	⭐⭐⭐	Good examples, missing edge cases
Security	⭐⭐⭐⭐⭐	No concerns
Performance	⭐⭐⭐	Adequate, some optimization opportunities
Overall	⭐⭐⭐⭐	Recommended with minor revisions

🎯 Action Items

Before Merge:

❗ Fix the README files with actual content (high priority)
❗ Document the race condition in put_back parameter or remove it
❗ Add resource cleanup in on_deinit

Future Enhancements:

Add comprehensive test coverage for all code paths
Consider performance optimizations for high-throughput scenarios
Add migration guide from old test patterns
Update to modern asyncio APIs

Great work on this feature! The core implementation is solid and provides valuable testing capabilities. The main concerns are around documentation completeness and a few edge cases in the filtering logic. Once the README is updated and cleanup is added, this will be ready to merge.

claude · 2025-12-08T02:25:15Z

Code Review: pytest_ten System Package

Thank you for this comprehensive addition of pytest testing infrastructure! This is a valuable enhancement that provides AAA-style testing for TEN extensions. Here's my detailed review:

✅ Strengths

Excellent API Design: The TenTestContext API provides a clean, intuitive interface for extension testing with proper async/await patterns
Comprehensive Testing Helpers: Good coverage of assertion methods (assert_no_data, assert_no_cmd, etc.) and collection methods
Strong Documentation: Extensive docstrings and examples in the decorator show excellent attention to developer experience
Good Integration: The pytest fixture integration is well-designed, allowing seamless use of pytest fixtures alongside TEN test context
Proper Error Handling: Error categorization (AssertionError, TimeoutError, generic errors) maps well to TenErrorCode

🐛 Potential Issues

1. asyncio.get_event_loop() Deprecation (Medium Priority)

Location: packages/core_system/pytest_ten/pytest_ten/ten_test.py:74, 122, 200, etc.

# Current code (deprecated in Python 3.10+)
end_time = asyncio.get_event_loop().time() + timeout

Issue: asyncio.get_event_loop() is deprecated when not running in an event loop. In Python 3.12+, this will raise warnings or errors.

Recommendation: Use asyncio.get_running_loop() instead:

end_time = asyncio.get_running_loop().time() + timeout

This pattern appears in:

expect_data(): lines 74, 77, 84
expect_cmd(): lines 122, 125, 132
collect_all_data(): lines 200, 203
collect_all_cmds(): lines 229, 232
collect_all_audio_frames(): lines 254, 257
collect_all_video_frames(): lines 279, 282

2. Resource Cleanup in _TenTestRunner.on_deinit() (Low Priority)

Location: packages/core_system/pytest_ten/pytest_ten/ten_test.py:414-420

async def on_deinit(self, ten_env: AsyncTenEnvTester) -> None:
    if self._test_task:
        self._test_task.cancel()
        try:
            await self._test_task
        except asyncio.CancelledError:
            pass

Issue: Queues in TenTestContext are not explicitly cleaned up, which could lead to memory retention if tests fail.

Recommendation: Consider adding cleanup:

async def on_deinit(self, ten_env: AsyncTenEnvTester) -> None:
    if self._test_task:
        self._test_task.cancel()
        try:
            await self._test_task
        except asyncio.CancelledError:
            pass
    
    # Clear queues to prevent memory leaks
    if self._ctx:
        while not self._ctx.received_cmds.empty():
            self._ctx.received_cmds.get_nowait()
        while not self._ctx.received_data.empty():
            self._ctx.received_data.get_nowait()
        # ... clear other queues

3. Potential Race Condition in put_back Logic (Low Priority)

Location: packages/core_system/pytest_ten/pytest_ten/ten_test.py:89-92, 137-140

if put_back:
    await self.received_data.put(data)
    await asyncio.sleep(0.05)  # Small delay to avoid busy loop

Issue: The put_back logic could theoretically starve if multiple consumers are racing. The 0.05s sleep is a heuristic that might not work in all scenarios.

Note: This is likely acceptable for testing scenarios, but worth documenting the limitation.

🔒 Security Considerations

✅ No security issues identified. The code operates in a test context and doesn't handle user input or network data directly.

⚡ Performance Considerations

1. Timeout Polling Overhead

The expect_* methods use 0.1s polling intervals:

attempt_timeout = min(remaining, 0.1)

Assessment: This is reasonable for tests, but adds 100ms latency in the worst case. For high-frequency testing, this could add up.

Recommendation: Consider making the poll interval configurable, or use an event-based approach with asyncio.Event for zero-latency wakeups.

2. Message Cloning

await self._ctx.received_cmds.put(cmd.clone())
await self._ctx.received_data.put(data.clone())

Assessment: Good defensive programming to prevent mutation issues. The performance cost is acceptable for testing.

📝 Code Quality

1. Inconsistent README Placeholders

Location: packages/core_system/pytest_ten/docs/README.en-US.md

The README files contain placeholder content:

# Default Extension Python Extension

## Features
- xxx features

Recommendation: Replace with actual pytest_ten documentation:

# pytest_ten - TEN Framework Pytest Integration

`pytest_ten` provides AAA (Arrange-Act-Assert) style testing helpers for TEN Framework extensions.

## Features
- Clean `@ten_test` decorator for pytest integration
- Context-based API for sending/receiving messages
- Built-in timeout and assertion helpers
- Support for pytest fixtures
- Mock command and data responses

2. Type Hints Consistency

The code uses modern Python 3.10+ type hints (str | None, dict[str, Any]) which is great! Ensure requires-python = ">=3.10" is enforced (already done in pyproject.toml ✅).

🧪 Test Coverage

Example Tests Look Good

The integration tests demonstrate proper usage:

✅ test_mock_cmd_result_with_pytest_ten: Shows mock command responses
✅ test_recv_msg_during_starting: Shows basic message flow
✅ test_basic_echo: Shows async command handling

Suggestion: Consider adding a test for the put_back parameter to validate the message filtering behavior.

📦 Build Configuration

BUILD.gn Issues

✅ Properly configured - The package correctly includes all necessary files and sets up dependencies.

Minor Note: The uv.lock file is checked in (147 lines). Consider adding a comment in the BUILD.gn or docs about when/how to update it.

🎯 Recommendations Summary

High Priority

✅ Fix asyncio.get_event_loop() deprecation - Replace with asyncio.get_running_loop()

Medium Priority

📝 Update README documentation - Replace placeholder content with actual pytest_ten docs
🧹 Add resource cleanup in on_deinit()

Low Priority

📖 Document put_back limitations - Add note about race conditions in docstrings
🧪 Add test for put_back - Ensure the message filtering works as expected
🔧 Make poll interval configurable - Allow tuning for different test scenarios

✨ Overall Assessment

This is a well-designed and valuable addition to the TEN Framework! The API is intuitive, the documentation is thorough, and the integration with pytest is seamless. The main issue is the asyncio.get_event_loop() deprecation, which should be addressed before merge.

Recommendation: Approve with minor changes (fix the event loop deprecation)

Great work! 🎉

cursor

This PR is being reviewed by Cursor Bugbot

Details

Your team is on the Bugbot Free tier. On this plan, Bugbot will review limited PRs each billing cycle for each member of your team.

To receive Bugbot reviews on all of your PRs, visit the Cursor dashboard to activate Pro and start your 14-day free trial.

packages/core_systems/pytest_ten/pytest_ten/ten_test.py

cursor · 2025-12-08T02:26:04Z

packages/core_systems/pytest_ten/pytest_ten/ten_test.py

+    ) -> None:
+        assert self._ctx is not None
+
+        await self._ctx.received_video.put(video_frame)


Bug: Audio and video frames not cloned before queueing

In on_audio_frame and on_video_frame, the frames are put directly into the queues without calling .clone(), while on_cmd and on_data correctly clone their messages before queueing. This inconsistency could cause issues if the runtime modifies or deallocates the original frame objects after the callback returns, leading to accessing invalid or corrupted data when the test consumes from the queue.

claude · 2025-12-08T02:50:05Z

PR Review: pytest_ten System Package

Overall Assessment

This is a well-structured and valuable addition to the TEN Framework. The AAA-style testing utilities significantly improve the developer experience for writing Python extension tests. The implementation demonstrates good software engineering practices with comprehensive error handling, clear documentation, and practical examples.

✅ Strengths

1. Code Quality & Architecture

Clean separation of concerns: TenTestContext handles message collection/expectations while _TenTestRunner manages test lifecycle
Comprehensive API: Provides expect/collect/assert methods for all message types (cmd, data, audio_frame, video_frame)
Excellent documentation: Extensive docstrings with clear examples showing basic usage, pytest fixtures, and mocks
Proper async/await patterns: Consistent use of async throughout, avoiding common async pitfalls

2. Testing Features

Flexible timeout handling: Per-operation timeouts with sensible defaults
Smart message filtering: put_back parameter allows non-destructive filtering when order matters
Mock support: Well-designed mock system for commands (including streaming), data, supporting complex test scenarios
Pytest integration: Seamless fixture support via signature inspection (lines 597-611)

3. Error Handling

Proper exception mapping: AssertionError → ErrorCodeGeneric, TimeoutError → ErrorCodeTimeout (lines 430-443)
Full stack traces: Includes traceback in error messages for debugging (lines 431, 436, 442)
Graceful cleanup: Task cancellation in on_deinit prevents resource leaks (lines 414-420)

⚠️ Issues Found

1. CRITICAL BUG: Potential Race Condition in Message Cloning

Location: packages/core_system/pytest_ten/pytest_ten/ten_test.py:449, 471

async def on_cmd(self, ten_env: AsyncTenEnvTester, cmd: Cmd) -> None:
    await self._ctx.received_cmds.put(cmd.clone())  # Line 449
    cmd_name = cmd.get_name()
    # ... uses original cmd object for mocking

Problem: If .clone() performs a shallow copy and the original cmd object is mutated after queueing, tests could observe inconsistent state. Similarly for data.clone() at line 471.

Recommendation: Verify that clone() performs a deep copy. If it doesn't, the original object should not be used after cloning.

2. MEDIUM: Busy-Wait Pattern in Message Filtering

Location: packages/core_system/pytest_ten/pytest_ten/ten_test.py:84-98, 132-145

attempt_timeout = min(remaining, 0.1)
data = await asyncio.wait_for(self.received_data.get(), timeout=attempt_timeout)
if name and data.get_name() != name:
    if put_back:
        await self.received_data.put(data)
        await asyncio.sleep(0.05)  # Busy loop mitigation
    continue

Problem: When put_back=True, this creates a busy-wait pattern with 100ms sleep + 50ms delay per iteration. For deeply nested queues, this could waste CPU cycles and extend test times.

Recommendation: Consider a more efficient approach:

Use asyncio.Queue.qsize() to check if retrying is worthwhile
Implement a priority queue or separate queues per message name
Document the performance implications of put_back=True

3. MEDIUM: Incomplete Documentation

Location: packages/core_system/pytest_ten/docs/README.en-US.md

Problem: The README contains only placeholder text:

# Default Extension Python Extension
<!-- Brief introduction for the extension -->
## Features
- xxx features

Recommendation: Replace with actual documentation covering:

What pytest_ten is and why developers should use it
Installation instructions (how to add as dev dependency)
Complete usage examples
API reference for TenTestContext and ten_test decorator
Migration guide from legacy test patterns

4. LOW: Missing Type Hints for Mock Callbacks

Location: packages/core_system/pytest_ten/pytest_ten/ten_test.py:392-396

mock_cmd_response: dict[str, Callable[[Cmd], CmdResult]] | None = None,
mock_cmd_stream_response: (
    dict[str, Callable[[Cmd], AsyncGenerator[CmdResult, None]]] | None
) = None,
mock_data_response: dict[str, Callable[[Data], Data]] | None = None,

Problem: While types are provided, there's no runtime validation that mock functions return the correct types.

Recommendation: Add runtime validation in on_cmd/on_data to check isinstance and provide clear error messages if mocks return wrong types.

5. LOW: Hardcoded Event Loop Access

Location: Multiple locations (e.g., line 74, 122, 200)

end_time = asyncio.get_event_loop().time() + timeout

Problem: get_event_loop() is deprecated in Python 3.10+ and can raise exceptions if called outside async context.

Recommendation: Use asyncio.get_running_loop() instead (requires async context, which is guaranteed here).

6. MINOR: `.vscode/settings.json` Change Concerns

Location: .vscode/settings.json

Problem: The removal of AI agents paths could break type checking for developers working on ai_agents:

-    "./ai_agents/agents/ten_packages/system/ten_ai_base/interface",
-    "./ai_agents/agents/ten_packages/system/ten_ai_base/lib"
+    "./packages/core_system/pytest_ten"

Recommendation: Verify this is intentional. If ai_agents have their own workspace settings, this is fine. Otherwise, consider adding pytest_ten without removing existing paths.

🔍 Security Considerations

✅ No security concerns identified

No user input is directly executed
Proper timeout limits prevent DoS
Error messages don't leak sensitive information

⚡ Performance Considerations

Good:

Async/await throughout prevents blocking
Reasonable default timeout (5s)
Message cloning prevents shared state issues

Watch for:

put_back=True can cause O(n²) behavior for large message queues with filtering
Unbounded queue growth if extension sends messages faster than test consumes them (consider adding maxsize warning)

📊 Test Coverage

✅ Good test examples provided:

Basic AAA-style tests (test_recv_msg_during_starting.py)
Mock usage (test_mock.py)
Lifecycle testing (test_callback_after_stop_test.py)

Suggestions:

Add test for put_back=True edge cases
Add test for timeout edge cases (exactly at timeout, just before/after)
Add test for queue overflow scenarios

🔧 Code Style & Conventions

✅ Follows repository conventions:

Proper copyright headers
Consistent naming (snake_case)
Clear variable names
Comprehensive docstrings

📝 Recommended Actions

Must Fix (Before Merge):

Verify cmd.clone() and data.clone() perform deep copies or document shallow copy implications
Complete the README documentation with real content

Should Fix (High Priority):

Address busy-wait pattern or document performance implications
Replace get_event_loop() with get_running_loop()

Consider (Lower Priority):

Add runtime type validation for mock callbacks
Verify .vscode/settings.json changes don't break ai_agents development
Add edge case tests mentioned above

🎯 Summary

This is a high-quality contribution that will significantly improve the testing experience for TEN Framework Python extensions. The core implementation is solid, with only a few issues to address:

Critical: 1 (race condition verification needed)
Medium: 2 (busy-wait pattern, incomplete docs)
Low/Minor: 3 (type validation, event loop, vscode settings)

Recommendation: ✅ Approve with requested changes - Address the critical race condition verification and complete the documentation before merging.

Great work on this feature! The AAA-style testing will make tests much more readable and maintainable. 🎉

claude · 2025-12-08T02:59:12Z

Pull Request Review: Add pytest_ten System Package

Overview

This PR introduces pytest_ten, a new system package that provides AAA-style (Arrange-Act-Assert) pytest testing utilities for TEN Framework extensions. The implementation is well-structured and follows good testing patterns.

✅ Strengths

1. Clean Architecture

Well-designed TenTestContext class that provides a clean API for testing
Good separation of concerns between TenTestContext (user-facing) and _TenTestRunner (internal)
Excellent decorator pattern for @ten_test that integrates seamlessly with pytest

2. Comprehensive API

Full coverage of message types: commands, data, audio frames, video frames
Flexible timeout handling with context defaults and per-call overrides
Useful utility methods like collect_all_* and assert_no_*
Mock support for cmd responses, streaming responses, and data responses

3. Good Error Handling

Proper error categorization (AssertionError, TimeoutError, generic exceptions)
Clear error messages with context
Appropriate use of TenError types (ErrorCodeGeneric, ErrorCodeTimeout)

4. Integration & CI

Properly integrated into GN build system
Added to CI/CD workflows for packaging and publishing
Version tooling updated to handle new package
IDE configurations updated (VSCode, Pyright)

🐛 Potential Issues & Improvements

Critical Issues

1. Race Condition in Message Filtering (ten_test.py:88-94, 136-142)

The put_back functionality has a potential race condition:

if name and data.get_name() != name:
    if put_back:
        await self.received_data.put(data)
        await asyncio.sleep(0.05)  # Small delay to avoid busy loop
    continue

Issue: When multiple tasks are consuming from the same queue, a message put back might be consumed by a different waiter before the original waiter retries.

Recommendation: Consider using a more robust pattern:

Use a deque to reinsert at the front of the queue
Or use multiple queues filtered by name
Document the limitation clearly if keeping current approach

2. Event Loop Access Pattern (ten_test.py:74, 122, etc.)

Using asyncio.get_event_loop() is deprecated in modern Python:

end_time = asyncio.get_event_loop().time() + timeout
remaining = end_time - asyncio.get_event_loop().time()

Recommendation: Use asyncio.get_running_loop() instead:

loop = asyncio.get_running_loop()
end_time = loop.time() + timeout
remaining = end_time - loop.time()

This is more explicit and avoids deprecated API usage.

Design Concerns

3. Message Cloning Strategy

Messages are cloned before queuing:

await self._ctx.received_cmds.put(cmd.clone())  # Line 449
await self._ctx.received_data.put(data.clone())  # Line 471

But audio/video frames are not:

await self._ctx.received_audio.put(audio_frame)  # Line 484
await self._ctx.received_video.put(video_frame)  # Line 491

Question: Is this intentional? If frames are mutable, this could lead to issues. Consider documenting why frames aren't cloned or clone them for consistency.

4. Unbounded Queue Growth

All message queues are unbounded:

self.received_cmds: asyncio.Queue[Cmd] = asyncio.Queue()

Risk: If an extension sends many messages that aren't consumed, memory could grow unbounded.

Recommendation: Consider either:

Adding a reasonable maxsize (e.g., 1000) with overflow handling
Document that users must consume all messages
Add a warning when queue size exceeds threshold

🔍 Code Quality

Good Practices ✅

Excellent docstrings with clear parameter descriptions
Type hints throughout
Consistent naming conventions
Good use of async/await patterns
Proper resource cleanup in on_deinit

Minor Issues

5. Unused Import (ten_test.py:3)

import inspect  # Only used in decorator, could be moved to function scope if desired

Minor: Not a real issue, but inspect is only used in the decorator function at the bottom.

6. Magic Numbers

attempt_timeout = min(remaining, 0.1)  # Line 84, 132
await asyncio.sleep(0.05)  # Line 92, 140

Recommendation: Extract as named constants:

_POLL_INTERVAL = 0.1
_PUT_BACK_DELAY = 0.05

7. Documentation Gaps

The README files (packages/core_system/pytest_ten/docs/README.*.md) are placeholder templates and don't document the actual package. This should be updated before release.

🔒 Security

No security concerns identified.

No user input processing without validation
No file system operations with user-controlled paths
No credential handling
Appropriate sandboxing in test environment

⚡ Performance

Concerns

8. Busy-Wait Pattern in expect_ Methods*

The implementation polls every 0.1 seconds:

while True:
    remaining = end_time - asyncio.get_event_loop().time()
    if remaining <= 0:
        raise TimeoutError(...)
    try:
        attempt_timeout = min(remaining, 0.1)
        data = await asyncio.wait_for(self.received_data.get(), timeout=attempt_timeout)

Observation: This creates unnecessary wake-ups every 100ms. While not critical for tests, it's not ideal.

Alternative: Could use a single asyncio.wait_for with the full timeout:

try:
    data = await asyncio.wait_for(self.received_data.get(), timeout=timeout)
    if name and data.get_name() != name:
        # handle mismatch
except asyncio.TimeoutError:
    raise TimeoutError(...)

But current approach allows for name filtering, so trade-off may be acceptable for tests.

🧪 Test Coverage

Coverage is good:

Example tests in test_mock.py demonstrate mocking
test_recv_msg_during_starting.py shows basic usage
Tests compare old ExtensionTester approach with new pytest_ten approach
Both sync and async patterns tested

Suggestion: Consider adding tests for:

Timeout scenarios
put_back=True behavior
collect_all_* methods
Negative assertions (assert_no_*)
Stream mock responses

📋 Build System & CI

Well integrated:

✅ GN build files properly configured
✅ Added to workflow packaging arrays
✅ Publish target created
✅ Version tooling updated for manifest.json and dev_dependencies
✅ PYTHONPATH configurations updated

One observation:
The version tooling change in tools/version/common.py:172-177 adds support for dev_dependencies, which is good and necessary for this PR.

📝 Recommendations Summary

Must Fix Before Merge

Replace asyncio.get_event_loop() with asyncio.get_running_loop() (Python 3.10+ best practice)
Document or fix the audio/video frame cloning inconsistency
Update README documentation to replace placeholder content

Should Consider

Document race condition risk with put_back=True or improve implementation
Add queue size limits or document unbounded growth risk
Extract magic numbers to named constants
Consider performance optimization for polling in expect_* methods (lower priority for test code)

Nice to Have

Add more comprehensive test coverage for edge cases
Add examples in documentation showing common test patterns

🎯 Overall Assessment

This is a high-quality PR that adds valuable testing infrastructure to the TEN Framework.

The implementation is clean, well-documented, and provides an excellent developer experience. The few issues identified are minor and don't block the core functionality.

Recommendation: Approve with minor changes (address the deprecated API usage and documentation at minimum)

Great work! 🚀

claude · 2025-12-08T03:00:05Z

Code Review: pytest_ten System Package

Summary

This PR introduces a well-designed pytest integration package (pytest_ten) that provides AAA-style testing utilities for TEN extensions. The implementation is comprehensive, covering the new package code, build integration, CI/CD updates, versioning tooling, and example test migrations. Overall, this is a high-quality contribution with solid architecture and good practices.

✅ Strengths

1. Excellent API Design

The TenTestContext class provides a clean, intuitive API for test authors
AAA-style testing (Arrange-Act-Assert) is a great pattern for readability
Comprehensive message handling: commands, data, audio/video frames
Well-thought-out filtering and collection methods (expect_*, collect_all_*, assert_no_*)

2. Strong Documentation

Extensive docstrings with clear parameter descriptions and examples
Multiple usage examples in the decorator docstring (basic, fixtures, mocks)
Good explanation of the pytest fixture integration pattern

3. Robust Error Handling

Proper exception chaining with from e syntax
Distinct error codes for different failure types (assertion, timeout, generic)
Comprehensive traceback inclusion in error messages

4. Complete Integration

Proper GN build system integration with package publishing
CI/CD pipeline updates for packaging and testing
Version tooling correctly updated to handle the new package
Editor configuration updates (VSCode, Pyright)

5. Good Testing Patterns

Mock support for commands (both single-response and streaming)
Mock support for data responses
Pytest fixture support through signature inspection
Real test examples demonstrate usage

🔍 Issues Found

Critical Issues

1. Race Condition in Message Filtering (packages/core_system/pytest_ten/pytest_ten/ten_test.py:89-92, 137-140)

The put_back=True logic has a potential race condition:

if put_back:
    # Put back and wait before retrying
    await self.received_data.put(data)
    await asyncio.sleep(0.05)  # Small delay to avoid busy loop

Problem: If messages arrive while waiting, they could be reordered. The queue doesn't preserve order when putting items back.

Impact: Tests using put_back=True could see messages in unexpected order, causing flakiness.

Recommendation:

Either document this limitation clearly in the docstring
Or implement a priority queue that preserves insertion order
Or remove put_back parameter and only support drop behavior (simpler, clearer semantics)

2. Deprecated API Usage (packages/core_system/pytest_ten/pytest_ten/ten_test.py:74, 122, 200, etc.)

Multiple uses of asyncio.get_event_loop().time():

end_time = asyncio.get_event_loop().time() + timeout

Problem: get_event_loop() is soft-deprecated in Python 3.10+ and may not work correctly in all contexts.

Impact: Could cause warnings or failures in future Python versions.

Recommendation: Replace with asyncio.get_running_loop().time() (safe in async context):

end_time = asyncio.get_running_loop().time() + timeout

High Priority Issues

3. Missing Clone on Audio/Video Frames (packages/core_system/pytest_ten/pytest_ten/ten_test.py:484, 491)

await self._ctx.received_audio.put(audio_frame)  # Not cloned\!
await self._ctx.received_video.put(video_frame)  # Not cloned\!

Problem: Commands and data are cloned (line 449, 471), but audio/video frames are not. This could cause memory issues if frames are mutated.

Recommendation: Either clone frames or document why they shouldn't be cloned:

await self._ctx.received_audio.put(audio_frame.clone())  # If clone() exists

4. Unused Import (packages/core_system/pytest_ten/pytest_ten/ten_test.py:3)

import inspect
import traceback

Problem: inspect is imported but only used at line 598, while traceback is used. Both are needed but the order suggests potential cleanup.

Recommendation: Verify both are needed, remove if not.

Medium Priority Issues

5. Task Cancellation Without Cleanup (packages/core_system/pytest_ten/pytest_ten/ten_test.py:414-420)

async def on_deinit(self, ten_env: AsyncTenEnvTester) -> None:
    if self._test_task:
        self._test_task.cancel()
        try:
            await self._test_task
        except asyncio.CancelledError:
            pass

Problem: If the test is cancelled mid-operation (e.g., during send_cmd), resources might not be properly cleaned up.

Recommendation: Consider adding cleanup logic or documenting expected behavior on cancellation.

6. Hardcoded Timeout Values (packages/core_system/pytest_ten/pytest_ten/ten_test.py:84)

attempt_timeout = min(remaining, 0.1)
await asyncio.sleep(0.05)

Problem: Magic numbers make the code harder to tune for different environments.

Recommendation: Extract as class constants:

_QUEUE_POLL_INTERVAL = 0.1  # seconds
_PUT_BACK_DELAY = 0.05  # seconds

7. Version Mismatch Risk (packages/core_system/pytest_ten/pyproject.toml:8 vs manifest.json:4)

pyproject.toml version: 0.1.0
manifest.json version: 0.11.41

Problem: These versions don't match, which could cause confusion about the package version.

Recommendation: Align versions or document why they differ. Typically, manifest.json should be the source of truth for TEN packages.

8. Missing Type Hints for Queues (packages/core_system/pytest_ten/pytest_ten/ten_test.py:31-34)

The generic types on Queue declarations are good, but Python 3.10+ should use queue.Queue[T] from the standard library.

Current:

self.received_cmds: asyncio.Queue[Cmd] = asyncio.Queue()

Recommendation: This is actually correct for asyncio.Queue. No change needed, but ensure type checking passes.

Low Priority / Nitpicks

9. Inconsistent Assertion Messages (packages/core_system/pytest_ten/pytest_ten/ten_test.py:606)

assert err is None, f"Test failed: {err.error_message()}"

Problem: This could raise AttributeError if err is not None but lacks error_message().

Recommendation: Use defensive access:

assert err is None, f"Test failed: {err.error_message() if err else 'unknown'}"

10. README Templates Not Updated (packages/core_system/pytest_ten/docs/README.*.md)

The README files contain placeholder text like "Default Extension Python Extension" and "xxx features".

Recommendation: Update with actual pytest_ten documentation before merging.

11. Missing uv.lock in .gitignore? (packages/core_system/pytest_ten/uv.lock)

A 147-line uv.lock file is committed, which is unusual for a library.

Recommendation: Verify if this should be in version control or .gitignore'd.

🎯 Best Practices & Quality

Good Practices Observed:

✅ Proper use of async/await patterns
✅ Context manager pattern with proper cleanup
✅ Clear separation of concerns (Context, Runner, Decorator)
✅ Comprehensive test coverage with multiple test files
✅ Proper package metadata (LICENSE, manifest.json, pyproject.toml)
✅ GN build system integration follows existing patterns
✅ CI/CD updates are complete and consistent

Minor Suggestions:

Consider adding a pytest.ini or configuration example for test discovery
Add a migration guide from old-style ExtensionTester to new @ten_test decorator
Consider exposing timeout configuration as a class attribute on TenTestContext

🔒 Security

No security concerns identified. The package:

Doesn't handle sensitive data directly
Doesn't expose network services
Uses safe async patterns
Properly handles exceptions and errors

⚡ Performance

Performance considerations:

The 0.1s polling interval in expect_* methods is reasonable for testing
Queue operations are efficient
No apparent memory leaks (assuming frames are handled correctly)
Async patterns are properly used for non-blocking operations

Suggestion: For high-throughput tests, consider making the polling interval configurable.

🧪 Test Coverage

Excellent test coverage:

Basic message sending/receiving (test_recv_msg_during_starting.py)
Mock command responses (test_mock.py)
Callback timing (test_callback_after_stop_test.py)
Multiple test files demonstrate different usage patterns
Old-style tests kept for comparison

Minor gap: No explicit tests for collect_all_* or assert_no_* methods.

📋 Summary

Category	Rating	Notes
Code Quality	⭐⭐⭐⭐ (4/5)	Well-structured, minor issues
Documentation	⭐⭐⭐⭐⭐ (5/5)	Excellent docstrings and examples
Testing	⭐⭐⭐⭐ (4/5)	Good coverage, could add more edge cases
Integration	⭐⭐⭐⭐⭐ (5/5)	Complete CI/CD and build updates
Security	⭐⭐⭐⭐⭐ (5/5)	No concerns

🚀 Recommendations for Merge

Before merging:

✅ Must fix: Replace asyncio.get_event_loop() with asyncio.get_running_loop()
⚠️ Should fix: Resolve put_back race condition or document limitation
⚠️ Should fix: Update README documentation files
💡 Consider: Clone audio/video frames or document why not needed

After these fixes, this PR is ready to merge. The new pytest_ten package is a valuable addition that will significantly improve the testing experience for TEN extensions.

Great work! This is a well-architected addition to the framework. 🎉

claude · 2025-12-08T07:37:48Z

Pull Request Review: pytest_ten System Package

Summary

This PR introduces a new Python system package pytest_ten that provides AAA-style testing utilities for TEN extensions. The implementation is well-structured and adds significant value to the testing infrastructure.

Critical Issues Found

1. Missing clone() for AudioFrame/VideoFrame (ten_test.py:502, 509)

Unlike on_cmd and on_data which clone the messages, audio/video frames are not cloned before being put in the queue. This could lead to data corruption if the frame is modified elsewhere.

2. README Documentation is Placeholder

The docs/README.en-US.md contains only placeholder text copied from a template and should be updated with actual pytest_ten documentation.

Detailed Review

See full review in next comment...

claude · 2025-12-08T07:38:05Z

Strengths

Code Quality

Clean Architecture: The separation between TenTestContext (user-facing API) and _TenTestRunner (internal implementation) is well-designed
Comprehensive API: The context provides a rich set of methods for both sending and asserting on messages
Good Documentation: Extensive docstrings with clear examples in the decorator
Proper Error Handling: Exception handling in _run_test() properly converts Python exceptions to TenError types

Design Patterns

Decorator Pattern: The ten_test decorator elegantly wraps test functions and handles pytest fixture integration
Queue-based Collection: Using asyncio queues to collect messages is a solid async pattern
Mock Support: Built-in support for mocking commands, streaming commands, and data responses is excellent

Integration

CI/CD Integration: Properly wired into GitHub workflows for publishing
Build System Integration: Correctly integrated into BUILD.gn with proper dependencies
Version Management: Included in version bump scripts

claude · 2025-12-08T07:38:22Z

All Issues Found

Critical Issues

1. Missing clone() for AudioFrame/VideoFrame

Location: packages/core_system/pytest_ten/pytest_ten/ten_test.py:502, 509
Issue: Audio/video frames are not cloned before being put in the queue
Impact: Data corruption if frames are modified elsewhere, potential resource leaks
Fix: Add .clone() call like the on_cmd and on_data handlers do

Medium Priority

2. Type Annotation Inconsistency

Location: packages/core_system/pytest_ten/pytest_ten/ten_test.py:408
Issue: test_func should be Callable[[TenTestContext], Awaitable[Any]] not Callable[[TenTestContext], Any]

3. README is Placeholder

Location: packages/core_system/pytest_ten/docs/README.en-US.md
Issue: Contains only template text, needs actual pytest_ten documentation

Low Priority

4. Busy-Wait Pattern

Location: packages/core_system/pytest_ten/pytest_ten/ten_test.py:86, 136
Issue: Uses 0.1s timeouts with 0.05s sleeps when putting back messages
Impact: CPU usage, could be optimized

5. Missing Type Hints

Location: core/src/ten_runtime/binding/python/interface/ten_runtime/async_test.py
Issue: Refactored callback functions lack type annotations

claude · 2025-12-08T07:38:37Z

Test Coverage Analysis

Good

Example tests demonstrate basic functionality (test_mock.py)
Integration with existing test infrastructure
Mock support is well-tested
Shows both old and new testing approaches side-by-side

Missing

No unit tests for the pytest_ten package itself
No tests for error conditions (timeouts, assertion failures)
No tests for put_back functionality
No tests for collect_all_* methods
No tests for assert_no_* methods

Recommendation: Add a test suite under packages/core_system/pytest_ten/tests/ to test the testing framework itself.

Security & Performance

Security

No security concerns identified
No sensitive data handling
Proper error handling prevents information leakage

Performance

Positive:

Efficient queue-based message collection
Proper use of asyncio primitives
Minimal overhead for test harness

Concerns:

Busy-wait pattern in expect_* methods could be optimized
Queue size of 10 for streaming commands may be insufficient for high-throughput tests (consider making configurable)

claude · 2025-12-08T07:38:55Z

Recommendations

Must Fix Before Merge

Clone audio/video frames in on_audio_frame and on_video_frame (ten_test.py:502, 509)
- Add: await self._ctx.received_audio.put(audio_frame.clone())
- Add: await self._ctx.received_video.put(video_frame.clone())
Update README with actual documentation explaining what pytest_ten is, how to use it, and migration guide

Should Fix Soon After Merge

Add comprehensive tests for pytest_ten itself
Fix type annotation inconsistencies
Optimize the busy-wait pattern in expect methods
Add more example tests demonstrating advanced features (streaming, put_back, assert_no_*)

Nice to Have

Make queue sizes configurable
Add helper methods for common patterns like send_and_expect_data
Add performance benchmarks
Consider adding support for property validation helpers

Overall Assessment

This is a well-designed and valuable addition to the TEN Framework testing infrastructure. The AAA-style approach with TenTestContext is much more intuitive than the callback-based approach. The implementation is solid with only a few critical issues that need addressing.

Recommendation: Approve with minor changes required (fix the cloning issue and documentation).

Score: 8.5/10

The main detractors are:

Missing clone() for audio/video frames (critical bug)
Placeholder documentation
Lack of tests for the testing framework itself

Once these issues are addressed, this will be an excellent addition to the codebase. The refactoring to named callbacks in async_test.py is a nice improvement for debuggability.

cursor · 2025-12-08T09:15:08Z

packages/core_systems/pytest_ten/pytest_ten/ten_test.py

+
+    async def send_video_frame(self, frame: VideoFrame) -> None:
+        """Send video frame to extension"""
+        await self.ten_env.send_video_frame(frame)


Bug: Send methods silently discard error return values

The send_data, send_audio_frame, and send_video_frame methods in TenTestContext declare a return type of None but the underlying AsyncTenEnvTester methods return TenError | None. The returned error values are being silently discarded, which could cause tests to appear to pass when send operations actually fail. These methods need to propagate the error return value to callers.

claude · 2025-12-08T09:15:17Z

Pull Request Review: pytest_ten System Package

I've completed a comprehensive review of this PR. Overall, this is a well-structured addition that introduces AAA-style testing capabilities to the TEN Framework. Here's my detailed feedback:

✅ Strengths

1. Excellent Code Quality

Clean, well-documented code with comprehensive docstrings
Proper type hints throughout (Python 3.10+ syntax)
Good separation of concerns between TenTestContext and _TenTestRunner
Follows AAA (Arrange-Act-Assert) testing pattern effectively

2. Comprehensive API Design

expect_* methods for receiving messages with timeout handling
collect_all_* methods for gathering multiple messages
assert_no_* methods for negative assertions
Support for pytest fixtures integration
Mock support for commands and data responses

3. Robust Error Handling

Proper exception handling with custom error codes (AssertionError → ErrorCodeGeneric, TimeoutError → ErrorCodeTimeout)
Good traceback preservation for debugging
Task cancellation handled correctly in on_deinit

4. Good Refactoring in async_test.py

Named callback functions instead of lambdas (lines 70-112 in async_test.py)
Improves code readability and debuggability

⚠️ Issues & Concerns

1. Critical: Race Condition in Message Filtering

Location: ten_test.py:91-96, 141-146

if name and data.get_name() != name:
    if put_back:
        await self.received_data.put(data)
        await asyncio.sleep(0.05)  # ⚠️ Arbitrary delay
    continue

Issues:

The put_back mechanism with asyncio.sleep(0.05) is a race condition waiting to happen
Multiple concurrent expect_* calls could cause message reordering
The 0.05s delay is arbitrary and not based on any synchronization guarantee
Could lead to messages being lost or delivered out of order

Recommendation:

Consider using separate queues per message name/type
Or implement a proper message routing system with locks
Document that put_back=True should not be used with concurrent expect_* calls

2. Bug: Missing `put_back` Support for Audio/Video Frames

Location: ten_test.py:153-197

The expect_audio_frame and expect_video_frame methods don't support put_back parameter, but they also can't filter by name/type. This inconsistency could be confusing.

Recommendation: Either add filtering support or document why it's not needed.

3. Performance: Clone Every Message

Location: ten_test.py:467, 489

await self._ctx.received_cmds.put(cmd.clone())
await self._ctx.received_data.put(data.clone())

Issue: Cloning every message might impact performance for high-throughput scenarios (especially video frames).

Recommendation:

Document the cloning behavior
Consider making cloning optional if messages are read-only in tests

4. Security: Unbounded Queue Growth

Location: ten_test.py:31-34

self.received_cmds: asyncio.Queue[Cmd] = asyncio.Queue()
self.received_data: asyncio.Queue[Data] = asyncio.Queue()

Issue: Unbounded queues could lead to memory exhaustion if messages arrive faster than they're consumed.

Recommendation: Add maxsize limit or document that tests should always consume messages.

5. Documentation: README is Placeholder

Location: packages/core_system/pytest_ten/docs/README.en-US.md

The README files are still using placeholder text ("Default Extension Python Extension", "xxx features").

Recommendation: Update with actual documentation about:

What pytest_ten does
Installation/setup instructions
Usage examples
API reference

6. Minor: VSCode Settings Removed AI Agents Paths

Location: .vscode/settings.json:36-37

-    "./ai_agents/agents/ten_packages/system/ten_ai_base/interface",
-    "./ai_agents/agents/ten_packages/system/ten_ai_base/lib"

Question: Was this intentional? Are these paths no longer needed, or should they be kept alongside the new pytest_ten path?

🔍 Design Questions

1. Test Function Signature Magic

The decorator extracts pytest fixtures from the function signature (lines 608-610). This is clever but:

May be confusing for developers unfamiliar with the pattern
First parameter ctx must always be TenTestContext (enforced by convention, not type system)

Suggestion: Add runtime validation that first parameter is named 'ctx' with helpful error message.

2. Mock Response Limitations

Mock responses are one-shot - they return a single result. For streaming scenarios, mock_cmd_stream_response helps, but there's no mock for audio/video frames.

Question: Is mocking audio/video frames out of scope, or should it be added?

🧪 Test Coverage

The updated tests demonstrate good usage:

test_mock.py: Shows mock command responses ✅
test_recv_msg_during_starting.py: Shows basic expect pattern ✅
test_callback_after_stop_test.py: Shows complex lifecycle handling ✅

Suggestion: Add tests for:

Multiple concurrent expect_* calls
put_back=True behavior
Timeout edge cases
collect_all_* methods

🏗️ Build & CI

Positive:

Proper GN build integration ✅
CI pipeline updated to publish pytest_ten ✅
Version tooling updated to include pytest_ten ✅
dev_dependencies support added ✅

Note:

The uv.lock file was added but uv.lock files can be large (147 additions). Consider if this should be in the repo or generated.

🔒 Security Assessment

No critical security issues found.

Minor considerations:

Unbounded queue memory usage (mentioned above)
No input validation on timeout values (negative/inf values accepted)

📊 Performance Considerations

Message cloning overhead - acceptable for testing but document it
0.1s polling intervals in expect methods - reasonable for tests
Queue operations - efficient enough for testing workloads

✅ Best Practices Compliance

✅ Apache 2.0 license headers present
✅ Type hints used consistently
✅ Follows async/await patterns correctly
✅ Good error messages
✅ Proper task cleanup in on_deinit

📝 Recommendations Summary

Must Fix:

Document the put_back race condition and when it's safe to use
Update README files with actual documentation
Add maxsize to queues or document unbounded growth risk

Should Fix:

Validate first parameter name in decorator
Consider separate queues per message name to avoid race conditions
Add more comprehensive test coverage

Nice to Have:

Audio/video frame mocking support
Timeout value validation
Make message cloning optional for performance

🎯 Overall Assessment

This is a high-quality addition to the TEN Framework that will significantly improve the testing experience for Python extensions. The code is well-written, properly integrated, and follows good practices.

Recommendation: ✅ Approve with minor revisions

The identified issues are mostly documentation gaps and edge case handling that can be addressed in follow-up PRs if needed. The core functionality is solid and ready for use.

Great work! 🚀

claude · 2025-12-09T02:38:55Z

Pull Request Review: pytest_ten System Package

Overview

This PR introduces a new Python system package pytest_ten that provides AAA-style (Arrange-Act-Assert) pytest integration for TEN Framework testing. The implementation is well-structured and follows good software engineering practices.

✅ Strengths

1. Architecture & Design

Clean Separation of Concerns: TenTestContext clearly separates test orchestration from the internal _TenTestRunner implementation
Excellent API Design: The AAA-style testing API is intuitive and developer-friendly
Comprehensive Mock Support: Supports mocking commands, streaming commands, and data responses
Async-First Design: Properly leverages Python's asyncio for non-blocking test execution

2. Code Quality

Excellent Documentation: Comprehensive docstrings with clear examples throughout
Type Hints: Proper use of type annotations (Python 3.10+ union syntax)
Error Handling: Well-structured exception handling with proper TenError reporting
Consistent Naming: Clear and consistent naming conventions

3. Testing Infrastructure

Good Integration: Updates CI/CD workflows to include the new package
Example Tests: Provides clear examples of how to use the new testing framework
Backward Compatibility: Maintains existing test infrastructure while adding new capabilities

⚠️ Issues & Concerns

1. Critical: Race Condition in Message Filtering (packages/core_systems/pytest_ten/pytest_ten/ten_test.py:54-102)

The expect_data() and expect_cmd() methods have potential race conditions when using put_back=True:

if name and data.get_name() != name:
    if put_back:
        # Put back and wait before retrying
        await self.received_data.put(data)
        await asyncio.sleep(0.05)  # Small delay to avoid busy loop

Issue: The 50ms sleep is arbitrary and may not prevent race conditions in concurrent tests. Messages could be reordered or lost.

Recommendation: Consider using a priority queue or separate queues per message name, or document that put_back=True should not be used in concurrent scenarios.

2. Medium: Unused Import (packages/core_systems/pytest_ten/pytest_ten/ten_test.py:3)

import inspect
import traceback

The inspect module is imported but only used once. While not harmful, it's good to keep imports minimal.

3. Medium: Inconsistent Timeout Handling (packages/core_systems/pytest_ten/pytest_ten/ten_test.py:153-197)

Audio/video frame expectation uses direct asyncio.wait_for(), while cmd/data use custom polling loops:

# Audio/Video approach
return await asyncio.wait_for(self.received_audio.get(), timeout=timeout)

# Cmd/Data approach  
while True:
    remaining = end_time - asyncio.get_event_loop().time()
    attempt_timeout = min(remaining, 0.1)
    data = await asyncio.wait_for(...)

Recommendation: Use consistent approach across all message types. The polling approach with 0.1s chunks seems unnecessary for audio/video frames.

4. Medium: Queue Size Limitations (packages/core_systems/pytest_ten/pytest_ten/ten_test.py:31-34)

Unbounded queues could lead to memory issues in long-running or high-throughput tests:

self.received_cmds: asyncio.Queue[Cmd] = asyncio.Queue()
self.received_data: asyncio.Queue[Data] = asyncio.Queue()

Recommendation: Consider adding configurable max queue sizes with proper backpressure handling.

5. Low: Missing Clone for Audio/Video (packages/core_systems/pytest_ten/pytest_ten/ten_test.py:497-509)

Commands and data are cloned before queuing, but audio/video frames are not:

async def on_data(self, ten_env: AsyncTenEnvTester, data: Data) -> None:
    await self._ctx.received_data.put(data.clone())  # ✅ Cloned

async def on_audio_frame(self, ten_env: AsyncTenEnvTester, audio_frame: AudioFrame) -> None:
    await self._ctx.received_audio.put(audio_frame)  # ❌ Not cloned

Recommendation: Verify whether AudioFrame/VideoFrame need cloning for thread-safety. If they do, add .clone() calls.

6. Low: Task Cleanup in on_deinit (packages/core_systems/pytest_ten/pytest_ten/ten_test.py:432-438)

Task cancellation silently swallows CancelledError:

async def on_deinit(self, ten_env: AsyncTenEnvTester) -> None:
    if self._test_task:
        self._test_task.cancel()
        try:
            await self._test_task
        except asyncio.CancelledError:
            pass

Recommendation: Consider logging when task is cancelled, especially if it's cancelled prematurely (before test completion).

7. Documentation: README Files Need Content (packages/core_systems/pytest_ten/docs/)

The README files contain template placeholders:

# Default Extension Python Extension
<!-- Brief introduction for the extension -->
- xxx features

Recommendation: Replace with actual documentation explaining:

What pytest_ten is and why to use it
Installation instructions
API reference
Migration guide from old testing approach

🔒 Security Considerations

✅ No Security Issues Found

No unsafe deserialization
No command injection vulnerabilities
No path traversal issues
Proper error message handling (doesn't leak sensitive info)

⚡ Performance Considerations

1. Efficient Message Handling

The queue-based approach is performant for typical test scenarios.

2. Potential Concern: Polling Loop Overhead

The 0.1s polling chunks in expect_cmd/expect_data add unnecessary complexity. For most cases, direct queue waiting would be more efficient.

3. Memory Management

Unbounded queues could grow large in stress tests. Consider:

Adding queue size limits
Implementing automatic cleanup for old messages
Adding metrics/warnings when queues grow large

🧪 Test Coverage

✅ Good Coverage

Example tests demonstrate core functionality
Tests cover both old and new testing approaches
Mock functionality is tested

📝 Suggestions

Add tests for edge cases:
- Timeout behavior when no messages arrive
- Behavior when queues are full (if size limits added)
- Concurrent message handling
- Multiple expect calls with filtering
Add tests for error conditions:
- Test failures with proper error messages
- Exception handling in test functions
- Cleanup on test failure
Consider adding integration tests that:
- Test with real extensions (not just mocks)
- Test performance under load
- Verify memory cleanup

📋 Other Observations

1. Version Synchronization ✅

Good: Version tooling properly updated to include pytest_ten

2. CI Integration ✅

Properly integrated into packaging and publishing workflows

3. Python Version ✅

Requires Python 3.10+, which is reasonable for modern type hints

4. Minor: uv.lock File

The PR includes uv.lock with 147 additions. Consider whether lock files should be committed or gitignored.

🎯 Recommendations Summary

Must Fix:

Address the race condition in put_back logic or document its limitations
Complete the README documentation

Should Fix:

Make timeout handling consistent across all message types
Verify and fix audio/video frame cloning if needed
Add queue size limits or document unbounded queue behavior

Nice to Have:

Remove unused imports
Add logging for task cancellation
Expand test coverage for edge cases

✨ Conclusion

This is a well-designed and valuable addition to the TEN Framework. The pytest_ten package provides a much cleaner and more intuitive testing API compared to the callback-based approach. The code quality is high, with good documentation and proper async handling.

The issues identified are relatively minor and mostly involve edge cases and documentation. The critical race condition should be addressed, but the core design is solid.

Overall Assessment: Approve with minor changes requested ✅

Great work on this feature! The AAA-style testing will significantly improve the developer experience.

claude · 2025-12-09T03:06:12Z

Pull Request Review: pytest_ten System Package

This PR introduces a well-designed pytest integration system for TEN Framework that enables AAA (Arrange-Act-Assert) style testing. The implementation is solid overall with comprehensive features and good integration with the existing codebase.

Strengths

Architecture and Design

Clean separation: Moving pytest_ten to a separate system package is the right architectural choice
AAA pattern: The TenTestContext API provides an intuitive testing interface
Comprehensive API: Well-designed methods including expect, collect_all, and assert_no for various testing scenarios
Flexible mocking: Support for mocking commands (single + streaming) and data responses

Code Quality

Good documentation: Extensive docstrings and inline comments throughout
Error handling: Proper exception handling with specific error types
Type hints: Consistent use of type annotations (Python 3.10+ union syntax)
Async safety: Named callback functions in async_test.py prevent lambda closure issues

Integration

CI/CD integration: Properly wired into Ubuntu workflow for packaging/publishing
Build system: Correct GN build configuration with dependencies
Versioning: Integrated into version update scripts with dev_dependencies support
Testing: Real-world test examples demonstrate the framework works

Issues and Suggestions

1. Potential Race Condition in put_back Logic

Location: packages/core_systems/pytest_ten/pytest_ten/ten_test.py:91-96

The put_back feature has a subtle race condition. If multiple messages arrive rapidly, the queue ordering may be disrupted when putting messages back.

Recommendation: Add a warning in documentation about put_back=True limitations with high-frequency message streams.

2. Missing Return Type Handling

Location: packages/core_systems/pytest_ten/pytest_ten/ten_test.py:42-43

The send_data and similar methods return None but underlying ten_env.send_data() might return TenError. Consider checking return value and raising exception on error, or changing return type to TenError | None for consistency with send_cmd().

3. Inconsistent Timeout Behavior

The expect methods use polling with 0.1s attempts, while expect_audio_frame/video_frame use direct waits. Recommend standardizing timeout behavior across all expect methods for consistency.

4. Event Loop Deprecation

Multiple places use asyncio.get_event_loop() which is soft-deprecated in Python 3.10+. Recommend using asyncio.get_running_loop() instead.

5. Pytest Fixture Support - Incomplete Implementation

Location: packages/core_systems/pytest_ten/pytest_ten/ten_test.py:418-421

The fixture storage (self._pytest_fixtures) exists but is never populated! The decorator does not extract pytest fixtures. Either implement fixture extraction or remove the incomplete code to avoid confusion.

6. Documentation: Missing Dropped Messages Warning

The put_back=False default silently drops unmatched messages, which could lead to confusing test failures. Add prominent warning in docstrings about this behavior.

7. Directory Name Inconsistency

The package is in packages/core_systems/ but BUILD.gn references core_system (singular). Check if this is intentional.

Security Considerations

No security issues identified. Test-only code with appropriate isolation.

Test Coverage

Good: New test file test_mock.py demonstrates both old and new testing styles.

Suggestions: Add tests for edge cases (timeout scenarios, put_back=True with multiple messages, collect_all with empty queues, assert_no methods, streaming mock responses).

Performance Considerations

Async/await used correctly throughout
0.1s polling interval in expect methods adds latency - consider making configurable
Queue cloning on every message - ensure these are lightweight operations

Summary

Overall Assessment: Approve with minor fixes recommended

This is a well-executed feature that significantly improves testing ergonomics for TEN Framework. The core implementation is solid with good design patterns.

Priority Fixes:

Fix or remove incomplete pytest fixture support (Issue 5)
Document put_back race condition limitations (Issue 1)
Replace deprecated get_event_loop() calls (Issue 4)
Add edge case test coverage

Nice-to-Have:

Standardize timeout behavior (Issue 3)
Add dropped messages warning to docs (Issue 6)

The PR is ready to merge after addressing the pytest fixture issue - it is currently non-functional code that could confuse users.

Great work on this feature!

* fix: remove the version lock on ten_ai_base (#1818) * fix: remove the version lock on ten_ai_base * fix: revert --------- Co-authored-by: Ethan Zhang <[email protected]> * feat: add a python system package `pytest_ten` supporting AAA-style tests (#1838) * feat: add a python test wrapper supporting AAA-style tests * fix: use relative import * feat: move test_wrapper from ten_runtime into core_system package * test: add example tests using pytest_ten * chore: refine code * feat: update workflow to upload pytest_ten system package * fix: refine code * fix: test case on windows --------- Co-authored-by: xxxxl_sun <[email protected]> Co-authored-by: sunxilin <[email protected]> * fix: elevenlabs asr language (#1842) Co-authored-by: liaochenliang <[email protected]> * feat: support tman install app (#1845) Co-authored-by: Hu Yueh-Wei <[email protected]> * refactor(property): rename max_tokens to max_completion_tokens for clarity (#1848) Update property.json files to use more descriptive parameter name max_completion_tokens instead of max_tokens. Also includes minor text and voice setting adjustments. * feat: add gradium tts (#1843) * feat: add gradium tts * style(gradium_tts_python): improve code formatting and line wrapping * refactor: simplify error logging and config handling * fix: update licence when uploading to ppa (#1851) * feat: winget (#1844) * feat: winget (dry run when push) * fix: remove potential code injection * fix: variable format * fix: build command args * chore: test production run, trigger PR to winget-pkgs * fix: higher rate limit * Revert "chore: test production run, trigger PR to winget-pkgs" This reverts commit 4d5d6e0. * chore: modify PackageIdentifier * chore: refine codes * fix: trigger * chore: refine codes * chore: refine codes * chore: refine template * fix: license type * fix: refine codes --------- Co-authored-by: Nie Zhihe <[email protected]> Co-authored-by: Hu Yueh-Wei <[email protected]> * fix: update minimax api endpoint from .com to .io (#1852) * refactor: simplify build command by removing redundant clean task (#1849) Co-authored-by: Ethan Zhang <[email protected]> * fix(voice-assistant-live2d): update voice settings and api endpoint * fix: update minimax websocket url from minimaxi.io to minimax.io (#1854) --------- Co-authored-by: xxxxl_sun <[email protected]> Co-authored-by: Ethan Zhang <[email protected]> Co-authored-by: Jun <[email protected]> Co-authored-by: sunxilin <[email protected]> Co-authored-by: liaochenliang <[email protected]> Co-authored-by: liaochenliang <[email protected]> Co-authored-by: Hu Yueh-Wei <[email protected]> Co-authored-by: Nie Zhihe <[email protected]> Co-authored-by: Nie Zhihe <[email protected]>

anexpn added 4 commits December 2, 2025 00:19

feat: add a python test wrapper supporting AAA-style tests

b6addbc

fix: use relative import

25f3a90

feat: move test_wrapper from ten_runtime into core_system package

f83cb49

test: add example tests using pytest_ten

237ad0d

anexpn requested a review from halajohn as a code owner December 7, 2025 16:39

anexpn changed the title ~~feat: add a python system package pytest_ten supporting AAA-style tests~~ feat: add a python system package pytest_ten supporting AAA-style tests Dec 7, 2025

Merge branch 'main' into feat/aaa_test

00688f3

cursor bot reviewed Dec 8, 2025

View reviewed changes

chore: refine code

d8f9757

sunxilin and others added 2 commits December 8, 2025 10:56

feat: update workflow to upload pytest_ten system package

42abfd8

Merge branch 'main' into feat/aaa_test

79f1542

fix: refine code

84f4f72

fix: test case on windows

bc3f8a8

cursor bot reviewed Dec 8, 2025

View reviewed changes

fix: refine codes

debd032

fix: refine codes

9884229

halajohn merged commit 29c4d0c into main Dec 9, 2025
10 of 16 checks passed

halajohn deleted the feat/aaa_test branch December 9, 2025 03:01

feat: add a python system package pytest_ten supporting AAA-style tests #1838

feat: add a python system package pytest_ten supporting AAA-style tests #1838

Uh oh!

Conversation

anexpn commented Dec 7, 2025 • edited by cursor bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

claude bot commented Dec 7, 2025

Pull Request Review

✅ Strengths

Code Quality

Testing

⚠️ Issues & Recommendations

🔴 High Priority

1. Race Condition in Message Filtering (packages/core_system/pytest_ten/pytest_ten/ten_test.py:74-98)

2. Missing Resource Cleanup (packages/core_system/pytest_ten/pytest_ten/ten_test.py:380-491)

3. Incomplete README Documentation (packages/core_system/pytest_ten/docs/README.en-US.md)

🟡 Medium Priority

4. Hardcoded Timeout Values (packages/core_system/pytest_ten/pytest_ten/ten_test.py:84, 132)

5. Direct Access to Private Member (packages/core_system/pytest_ten/pytest_ten/ten_test.py:604)

6. Potential Memory Issue with Clone (packages/core_system/pytest_ten/pytest_ten/ten_test.py:449, 471)

7. Mock Response Priority Not Documented (packages/core_system/pytest_ten/pytest_ten/ten_test.py:452-466)

🟢 Low Priority

8. Inconsistent Error Messages (packages/core_system/pytest_ten/pytest_ten/ten_test.py)

9. Missing Type Hints for sync wrapper (packages/core_system/pytest_ten/pytest_ten/ten_test.py:602)

10. Deprecated API Usage (packages/core_system/pytest_ten/pytest_ten/ten_test.py:74, 122, 200, etc.)

🔒 Security

⚡ Performance

Considerations:

📝 Documentation & Conventions

Missing Documentation:

Code Conventions:

🧪 Test Coverage

Current Coverage:

Missing Coverage:

🏗️ Build System

BUILD.gn:

pyproject.toml:

uv.lock:

📊 Summary

🎯 Action Items

Before Merge:

Future Enhancements:

Uh oh!

claude bot commented Dec 8, 2025

Code Review: pytest_ten System Package

✅ Strengths

🐛 Potential Issues

1. asyncio.get_event_loop() Deprecation (Medium Priority)

2. Resource Cleanup in _TenTestRunner.on_deinit() (Low Priority)

3. Potential Race Condition in put_back Logic (Low Priority)

🔒 Security Considerations

⚡ Performance Considerations

1. Timeout Polling Overhead

2. Message Cloning

📝 Code Quality

1. Inconsistent README Placeholders

2. Type Hints Consistency

🧪 Test Coverage

Example Tests Look Good

📦 Build Configuration

BUILD.gn Issues

🎯 Recommendations Summary

High Priority

Medium Priority

Low Priority

✨ Overall Assessment

Uh oh!

cursor bot left a comment

Choose a reason for hiding this comment

This PR is being reviewed by Cursor Bugbot

Uh oh!

Uh oh!

cursor bot Dec 8, 2025

Choose a reason for hiding this comment

Bug: Audio and video frames not cloned before queueing

Uh oh!

claude bot commented Dec 8, 2025

PR Review: pytest_ten System Package

Overall Assessment

feat: add a python system package `pytest_ten` supporting AAA-style tests #1838

feat: add a python system package `pytest_ten` supporting AAA-style tests #1838

anexpn commented Dec 7, 2025 •

edited by cursor bot

Loading

6. MINOR: `.vscode/settings.json` Change Concerns