Skip to content

feat(cli): standardize JSON output and sanitize error responses across toolkit packages#652

Merged
imran-siddique merged 11 commits intomicrosoft:mainfrom
Deadpool2000:fix-issue-526
Mar 31, 2026
Merged

feat(cli): standardize JSON output and sanitize error responses across toolkit packages#652
imran-siddique merged 11 commits intomicrosoft:mainfrom
Deadpool2000:fix-issue-526

Conversation

@Deadpool2000
Copy link
Copy Markdown
Contributor

Description

This PR standardizes JSON CLI output formatting and sanitizes error responses across all toolkit packages (agent-mesh, agent-os, iatp, and mcp-kernel-server). It ensures that machine-readable output is consistent for CI/CD integration while preventing the leakage of internal exception details.

Critical fixes include:

  • Restored Functionality: Re-implemented the init-integration and _init_claude_integration logic in agent-mesh that facilitates Claude Desktop setup.
  • Improved Reliability: Replaced bare except: clauses in main.py entry points with specific ImportError and handling to avoid blocking system signals.
  • Security: Sanitize error strings to ensure internal system details (like stack traces) are not exposed in the message field of JSON responses.
  • Documentation: Updated quickstarts, cheatsheets, and CI/CD examples to reflect the new universal support for the --json flag.

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update
  • Maintenance (dependency updates, CI/CD, refactoring)
  • Security fix

Package(s) Affected

  • agent-os-kernel
  • agent-mesh
  • agent-runtime
  • agent-sre
  • agent-governance
  • docs / root

Checklist

  • My code follows the project style guidelines (ruff check)
  • I have added tests that prove my fix/feature works
  • All new and existing tests pass (pytest)
  • I have updated documentation as needed
  • I have signed the Microsoft CLA

Related Issues

References #525
Fixes #526

@github-actions github-actions Bot added documentation Improvements or additions to documentation agent-mesh agent-mesh package labels Mar 31, 2026
@github-actions
Copy link
Copy Markdown

Welcome to the Agent Governance Toolkit! Thanks for your first pull request.
Please ensure tests pass, code follows style (ruff check), and you have signed the CLA.
See our Contributing Guide.

@github-actions github-actions Bot added the size/XS Extra small PR (< 10 lines) label Mar 31, 2026
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Mar 31, 2026

🤖 AI Agent: docs-sync-checker — Issues Found

📝 Documentation Sync Report

Issues Found

  1. handle_error(e: Exception, output_json: bool = False, custom_msg: Optional[str] = None) in packages/agent-compliance/src/agent_compliance/cli/main.py — missing docstring.
  2. handle_error(e: Exception, output_json: bool = False, custom_msg: Optional[str] = None) in packages/agent-mesh/src/agentmesh/cli/main.py — missing docstring.
  3. ⚠️ packages/agent-mesh/README.md — No updates were made to reflect the new --json flag support for all commands.
  4. ⚠️ CHANGELOG.md — Missing detailed entries for the following:
    • The addition of the --json flag across CLI commands in agent-mesh and agent-compliance.
    • The new centralized error handling mechanism (handle_error function).
  5. ⚠️ examples/ — No updates were made to example code to demonstrate the new --json flag or sanitized error handling.

Suggestions

  • 💡 Add a docstring for handle_error(e: Exception, output_json: bool = False, custom_msg: Optional[str] = None) in both agent_compliance/cli/main.py and agentmesh/cli/main.py. Include details about parameters, return values, and exceptions.
  • 💡 Update packages/agent-mesh/README.md to include a section about the --json flag and its usage for machine-readable output.
  • 💡 Add detailed entries to CHANGELOG.md for the --json flag and centralized error handling.
  • 💡 Update example code in examples/ to showcase the --json flag and sanitized error handling.

Additional Notes

  • The changes to the QUICKSTART.md and docs/tutorials/04-audit-and-compliance.md files are appropriate and reflect the new behavior.
  • The PRD-IMPLEMENTATION.md file in packages/agent-mesh/docs/ has been updated to mention the --json flag, but the main README.md for agent-mesh is still missing this information.
  • The new public APIs (handle_error) are missing type hints for the custom_msg parameter, which should be annotated as Optional[str].

Action Items

  1. Add docstrings for the handle_error function in both agent_compliance and agentmesh CLI modules.
  2. Update the packages/agent-mesh/README.md to document the --json flag.
  3. Add detailed entries in CHANGELOG.md for the new features and changes.
  4. Update example code in examples/ to reflect the new --json flag and error handling.
  5. Ensure all new public APIs have complete type annotations.

Once these issues are addressed, the documentation will be in sync.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Mar 31, 2026

🤖 AI Agent: security-scanner — Security Analysis of PR: `feat(cli): standardize JSON output and sanitize error responses across toolkit packages`

Security Analysis of PR: feat(cli): standardize JSON output and sanitize error responses across toolkit packages


1. Prompt Injection Defense Bypass

Rating: 🔵 LOW
Analysis:
The PR introduces regex-based validation for agent identifiers in the audit command. The regex appears to be sufficiently strict to prevent injection attacks via agent identifiers. However, it is worth noting that the regex does not explicitly validate against all possible injection vectors (e.g., Unicode homoglyphs or encoded characters). While the current implementation is robust for the expected input format, it is recommended to add additional sanitization or normalization steps to handle edge cases.

Recommendation:

  • Consider normalizing input strings to a canonical form (e.g., using unicodedata.normalize) before applying regex validation.
  • Add unit tests for edge cases, such as inputs with Unicode homoglyphs or encoded characters, to ensure the regex handles them appropriately.

2. Policy Engine Circumvention

Rating: 🔵 LOW
Analysis:
The PR includes updates to the policy command to load and validate policies. The PolicyEngine class is used to load policies, and the Policy objects are instantiated with strict schema validation. There is no indication of a mechanism to bypass policy loading or validation.

Recommendation:

  • Ensure that the PolicyEngine and Policy classes enforce schema validation at runtime.
  • Add tests to verify that malformed or malicious policy files are correctly rejected.

3. Trust Chain Weaknesses

Rating: 🟠 HIGH
Analysis:
The register command uses the AgentIdentity.create method to generate an identity and register it with the AgentMesh CA. However, there is no evidence of certificate pinning, SPIFFE/SVID validation, or other mechanisms to ensure the integrity of the trust chain. This could allow an attacker to impersonate the CA or inject malicious certificates.

Recommendation:

  • Implement certificate pinning or SPIFFE/SVID validation to ensure the authenticity of the CA.
  • Add tests to verify that only trusted certificates are accepted during the registration process.

4. Credential Exposure

Rating: 🔴 CRITICAL
Analysis:
The handle_error function logs error details using logger.error(f"CLI Error: {e}", exc_info=True). While this is useful for debugging, it may inadvertently log sensitive information (e.g., file paths, stack traces, or credentials) if the logger is configured to output to a file or console in production environments.

Recommendation:

  • Ensure that sensitive information is redacted from error logs.
  • Use a secure logging configuration in production to prevent unauthorized access to log files.
  • Consider using a structured logging library that supports automatic redaction of sensitive fields.

5. Sandbox Escape

Rating: 🔵 LOW
Analysis:
The PR does not introduce any new functionality that could lead to a sandbox escape. The CLI commands primarily involve file I/O and JSON/YAML parsing, which are unlikely to result in a sandbox breakout.

Recommendation:

  • No immediate action required.

6. Deserialization Attacks

Rating: 🟠 HIGH
Analysis:
The PR uses yaml.safe_load and json.loads for deserialization, which are generally safer than their non-safe counterparts. However, the yaml.safe_load function can still be vulnerable to certain attacks if the input is not properly sanitized.

Recommendation:

  • Consider using a library like ruamel.yaml for stricter YAML parsing.
  • Validate the structure and content of deserialized data against a schema before processing it further.

7. Race Conditions

Rating: 🔵 LOW
Analysis:
The PR does not introduce any new functionality that could lead to race conditions. The changes are primarily focused on CLI error handling and JSON output formatting.

Recommendation:

  • No immediate action required.

8. Supply Chain

Rating: 🟠 HIGH
Analysis:
The PR does not include any changes to dependencies. However, the use of third-party libraries like yaml and click introduces potential supply chain risks.

Recommendation:

  • Use tools like pip-audit or safety to scan for vulnerabilities in dependencies.
  • Consider pinning dependency versions to known-good versions to prevent supply chain attacks.

Summary of Findings

  1. Prompt Injection Defense Bypass: 🔵 LOW
  2. Policy Engine Circumvention: 🔵 LOW
  3. Trust Chain Weaknesses: 🟠 HIGH
  4. Credential Exposure: 🔴 CRITICAL
  5. Sandbox Escape: 🔵 LOW
  6. Deserialization Attacks: 🟠 HIGH
  7. Race Conditions: 🔵 LOW
  8. Supply Chain: 🟠 HIGH

Overall Risk Assessment: HIGH

The PR introduces significant improvements in error handling and JSON output standardization. However, there are critical issues related to credential exposure and trust chain weaknesses that must be addressed before merging. Additionally, deserialization and supply chain risks should be mitigated to ensure the security of the toolkit.

Actionable Recommendations

  1. Credential Exposure:

    • Redact sensitive information from error logs.
    • Use a secure logging configuration in production.
  2. Trust Chain Weaknesses:

    • Implement certificate pinning or SPIFFE/SVID validation.
    • Add tests to verify the authenticity of certificates.
  3. Deserialization Attacks:

    • Use stricter YAML parsing libraries like ruamel.yaml.
    • Validate deserialized data against a schema.
  4. Supply Chain:

    • Scan dependencies for vulnerabilities using tools like pip-audit.
    • Pin dependency versions to known-good versions.
  5. Prompt Injection Defense:

    • Add normalization and additional tests for edge cases in agent identifier validation.

Addressing these issues will significantly improve the security posture of the toolkit.

@github-actions
Copy link
Copy Markdown

🤖 AI Agent: contributor-guide — Welcome! 🎉

Welcome! 🎉

Hi there, and welcome to the microsoft/agent-governance-toolkit project! We're thrilled to have you here and deeply appreciate your contribution. Thank you for taking the time to improve the project — your efforts make open source better for everyone! 🌟


PR Review 🧐

First off, fantastic work on this pull request! You've clearly put a lot of thought into standardizing JSON output and improving error handling across multiple packages. Here's what I loved about your PR:

What You Did Well ✅

  1. Clear Description: Your PR description is thorough and well-structured, making it easy to understand the scope of your changes. The breakdown of critical fixes is especially helpful.
  2. Security Awareness: I appreciate the focus on sanitizing error responses and preventing sensitive information leakage. This is a crucial improvement for any project, especially one dealing with governance and compliance.
  3. Documentation Updates: You didn't just stop at code changes — you also updated the documentation (e.g., QUICKSTART.md, 04-audit-and-compliance.md) to reflect the new functionality. This is a great example of a holistic contribution.
  4. Code Clarity: The use of try-except blocks for error handling and the inclusion of JSON output options demonstrate a thoughtful approach to improving user experience.

Suggestions for Improvement ✍️

While your PR is excellent, I noticed a few areas where we can make it even better:

  1. Tests:

    • It seems like you haven't added tests for the new functionality (e.g., JSON output, error sanitization). Since this project requires tests for new features, could you add test cases under the appropriate tests/ directories (e.g., packages/{name}/tests/) to ensure the new behavior is covered? This will help us maintain the quality and reliability of the toolkit.
    • For example, you could add tests to verify that the JSON output adheres to the expected schema and that error messages are properly sanitized.
  2. Linting:

    • It looks like you haven't run ruff on your changes yet. This project uses ruff for linting (with rules E, F, and W enabled). Could you run ruff check . and address any issues it flags? You can find more details in our CONTRIBUTING.md.
  3. Commit Messages:

    • While your PR title is great, the individual commit messages don't seem to follow the Conventional Commits standard. For example, commit messages should follow the format feat(cli): add JSON output support or fix(error-handling): sanitize error messages. This helps us maintain a clean and consistent commit history.
  4. Code Comments:

    • While your code is generally clear, adding comments to explain the purpose of some of the more complex changes (e.g., the regex validation for agent identifiers) would make it easier for future contributors to understand your work.

Project Conventions 📜

Here are a few key conventions we follow in this project:

  1. Linting: We use ruff for linting with rules E, F, and W. Please run ruff check . and fix any issues.
  2. Testing: All new functionality should be accompanied by tests. Place your tests in the packages/{name}/tests/ directory.
  3. Commit Messages: Follow the Conventional Commits standard (e.g., feat:, fix:, docs:, etc.).
  4. Security: Since this project deals with governance and compliance, any security-sensitive code (e.g., error handling, input validation) will receive extra scrutiny. You've already done a great job addressing this in your PR!

For more details, check out our CONTRIBUTING.md and QUICKSTART.md.


Next Steps 🚀

Here's what happens next:

  1. Address Feedback: Please review the suggestions above and make any necessary changes. Feel free to ask questions if anything is unclear!
  2. Push Updates: Once you've made the changes, push them to your branch. This will automatically update the pull request.
  3. Further Review: After you've addressed the feedback, a maintainer will review your PR again and provide additional feedback if needed.
  4. Merge: Once everything looks good, your PR will be merged, and your contribution will become part of the project! 🎉

Thank you again for your contribution! If you have any questions or need help with anything, don't hesitate to ask. We're here to support you. 😊

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Mar 31, 2026

🤖 AI Agent: breaking-change-detector — Summary

🔍 API Compatibility Report

Summary

The pull request introduces several changes across the CLI tools in the microsoft/agent-governance-toolkit repository. While most changes are additive and aim to improve functionality and security, there are a few potentially breaking changes due to modifications in function signatures and behavior. Below is a detailed analysis of the API compatibility impact.

Findings

Severity Package Change Impact
🔴 agent-mesh register() added a new required parameter output_json Existing callers without the new parameter will fail.
🔴 agent-mesh status() added a new required parameter output_json Existing callers without the new parameter will fail.
🔴 agent-mesh policy() added a new required parameter output_json Existing callers without the new parameter will fail.
🔴 agent-mesh audit() added a new required parameter output_json Existing callers without the new parameter will fail.
🟡 agent-mesh audit() now validates agent identifiers with stricter regex rules May cause errors for previously valid but non-standard agent IDs.
🟡 agent-mesh CLI error handling now sanitizes error messages May impact workflows relying on specific error messages.
🟡 agent-compliance CLI error handling now sanitizes error messages May impact workflows relying on specific error messages.
🔵 agent-mesh Added --json flag to CLI commands for machine-readable output New feature, no breaking impact.
🔵 agent-compliance Added handle_error() function for centralized error handling New internal utility function, no breaking impact.

Migration Guide

For Breaking Changes:

  1. register()

    • Update existing calls to include the new output_json parameter. Example:
      register(agent_dir="/path/to/agent", name="agent-name", output_json=False)
  2. status()

    • Update existing calls to include the new output_json parameter. Example:
      status(agent_dir="/path/to/agent", output_json=False)
  3. policy()

    • Update existing calls to include the new output_json parameter. Example:
      policy(policy_file="/path/to/policy.yaml", validate=True, output_json=False)
  4. audit()

    • Update existing calls to include the new output_json parameter. Example:
      audit(agent="agent-id", limit=20, fmt="table", output_json=False)

For Potentially Breaking Changes:

  1. Stricter Agent Identifier Validation

    • Ensure all agent identifiers conform to the new regex rules. Update any identifiers that do not match the pattern: ^(?:agent-[a-zA-Z0-9]+(?:-[a-zA-Z0-9]+)*|did:agentmesh:[a-zA-Z0-9]+(?:[-:][a-zA-Z0-9]+)*)$.
  2. Sanitized Error Messages

    • Update workflows or scripts that rely on specific error messages to handle the new sanitized error format.

Conclusion

The pull request introduces several breaking changes due to modifications in function signatures and stricter validation rules. These changes will require updates to existing codebases that use the affected CLI commands. Additionally, the introduction of sanitized error messages may impact workflows relying on specific error strings. However, the addition of the --json flag and centralized error handling are additive changes and do not break existing functionality.

Action Required: Update all affected CLI calls and validate agent identifiers to ensure compatibility with the new changes.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Mar 31, 2026

🤖 AI Agent: test-generator — `packages/agent-compliance/src/agent_compliance/cli/main.py`

🧪 Test Coverage Analysis

packages/agent-compliance/src/agent_compliance/cli/main.py

  • Existing coverage:

    • Basic functionality of CLI commands like cmd_verify, cmd_integrity, and cmd_lint_policy.
    • Some error handling for known exceptions (e.g., FileNotFoundError, ValueError).
  • Missing coverage:

    • The new handle_error function is not explicitly tested.
    • Scenarios where --json flag is used for error handling and output formatting.
    • Edge cases for error handling, such as unknown exceptions or invalid inputs.
    • Validation of JSON output structure and content.
  • 💡 Suggested test cases:

    1. test_cmd_verify_json_output — Verify that the --json flag produces the correct JSON output for successful and failed verifications.
    2. test_handle_error_known_exceptions — Test handle_error with known exceptions (e.g., FileNotFoundError, ValueError) and validate the JSON output.
    3. test_handle_error_unknown_exception — Test handle_error with an unknown exception and validate that the error message is sanitized in the JSON output.
    4. test_cmd_integrity_json_error_handling — Simulate an error during the cmd_integrity execution with --json and validate the sanitized error response.
    5. test_cmd_lint_policy_json_error_handling — Simulate an error during the cmd_lint_policy execution with --json and validate the sanitized error response.

packages/agent-mesh/src/agentmesh/cli/main.py

  • Existing coverage:

    • Basic functionality of CLI commands like init, register, status, policy, and audit.
    • Some error handling for known exceptions (e.g., FileNotFoundError, ImportError).
  • Missing coverage:

    • The new handle_error function is not explicitly tested.
    • Scenarios where --json flag is used for error handling and output formatting.
    • Validation of JSON output structure and content for commands like init, register, status, policy, and audit.
    • Edge cases for input validation, such as malformed agent identifiers in audit and register commands.
    • Edge cases for policy validation, such as policies with conflicting rules or invalid formats.
    • Edge cases for trust score calculation, such as edge scores (e.g., 0/1000, 1000/1000) and invalid trust score data.
  • 💡 Suggested test cases:

    1. test_init_json_output — Verify that the init command with --json produces the correct JSON output, including all expected fields.
    2. test_register_invalid_agent_identifier — Test the register command with invalid agent identifiers and validate the error handling and JSON output.
    3. test_status_json_output — Verify that the status command with --json produces the correct JSON output, including trust score and agent details.
    4. test_policy_conflicting_rules — Test the policy command with a policy file containing conflicting rules and validate the error handling.
    5. test_audit_invalid_agent_identifier — Test the audit command with an invalid agent identifier and validate the error handling and JSON output.
    6. test_audit_json_output — Verify that the audit command with --json produces sanitized and correctly formatted JSON output.

packages/agent-os/src/agent_os/cli/__init__.py

  • Existing coverage:

    • Basic module initialization and imports.
  • Missing coverage:

    • No specific functionality to test in this file.
  • 💡 Suggested test cases:

    • No additional test cases needed for this file.

packages/agent-os/src/agent_os/cli/mcp_scan.py

  • Existing coverage:

    • Basic functionality of mcp_scan command.
  • Missing coverage:

    • Edge cases for input validation, such as malformed or oversized payloads.
    • Scenarios where the mcp_scan command encounters partial failures or timeouts.
    • Validation of JSON output structure and content when --json flag is used.
  • 💡 Suggested test cases:

    1. test_mcp_scan_malformed_input — Test the mcp_scan command with malformed input and validate the error handling.
    2. test_mcp_scan_oversized_payload — Test the mcp_scan command with an oversized payload and validate the error handling.
    3. test_mcp_scan_partial_failure — Simulate a partial failure during the mcp_scan execution and validate the error handling.
    4. test_mcp_scan_timeout — Simulate a timeout during the mcp_scan execution and validate the error handling.
    5. test_mcp_scan_json_output — Verify that the mcp_scan command with --json produces the correct JSON output.

packages/agent-os/src/agent_os/policies/cli.py

  • Existing coverage:

    • Basic functionality of policy-related CLI commands.
  • Missing coverage:

    • Edge cases for policy evaluation, such as boundary conditions, conflicting policies, and policy bypass attempts.
    • Validation of JSON output structure and content when --json flag is used.
    • Scenarios where policy evaluation encounters errors or unexpected inputs.
  • 💡 Suggested test cases:

    1. test_policy_evaluation_boundary_conditions — Test policy evaluation with boundary conditions (e.g., minimum and maximum values).
    2. test_policy_evaluation_conflicting_policies — Test policy evaluation with conflicting policies and validate the error handling.
    3. test_policy_evaluation_bypass_attempt — Test policy evaluation with inputs designed to bypass policies and validate the results.
    4. test_policy_json_output — Verify that policy-related commands with --json produce the correct JSON output.
    5. test_policy_evaluation_error_handling — Simulate an error during policy evaluation and validate the error handling and JSON output.

Summary

The changes in this PR introduce significant new functionality and error handling improvements, particularly around JSON output and input validation. While some existing tests may cover basic functionality, additional test cases are needed to ensure comprehensive coverage of the new features and edge cases.

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 AI Agent: code-reviewer

Review Summary

This PR introduces standardized JSON output and sanitized error handling across multiple CLI tools in the repository. The changes aim to improve security, usability, and consistency for machine-readable outputs, which is crucial for CI/CD pipelines and other automated systems. While the PR addresses several important issues, there are areas that need attention to ensure compliance with security best practices, maintain backward compatibility, and improve code quality.


🔴 CRITICAL

  1. Insufficient Error Sanitization in JSON Output

    • The error messages in the JSON responses are not fully sanitized. For example:
      print(json.dumps({"status": "fail", "error": "Governance verification failed", "type": "InternalError"}, indent=2))
      While the error field is generic, the type field directly exposes the exception class name (e.g., ValueError, KeyError). This could provide attackers with information about the internal implementation, which is a potential security risk (CWE-209: Information Exposure Through an Error Message).
    • Recommendation: Replace the type field with a generic error code or category (e.g., ValidationError, SystemError) and avoid exposing exception class names directly.
  2. Regex Injection Vulnerability in Agent Identifier Validation

    • The regex used to validate agent identifiers in the audit command is insufficiently strict:
      if not re.fullmatch(r"^(?:agent-[a-zA-Z0-9-]+|did:agentmesh:[a-zA-Z0-9-]+)$", agent) or len(agent) > 128:
      While it limits the format, it does not prevent potential injection attacks (e.g., agent-123; rm -rf /). This could lead to command injection or other vulnerabilities if the agent parameter is used in subsequent commands or database queries.
    • Recommendation: Use a stricter validation mechanism, such as a library specifically designed for validating DIDs or UUIDs. If the identifier is used in database queries, ensure proper parameterized queries are used.
  3. Potential Sandbox Escape in _init_claude_integration

    • The _init_claude_integration function does not validate or sanitize the config_path parameter. If this path is user-controlled, it could lead to directory traversal or arbitrary file overwrite vulnerabilities.
    • Recommendation: Validate the config_path parameter to ensure it does not contain malicious input (e.g., ../../etc/passwd). Use os.path.abspath and ensure the path is within an allowed directory.

🟡 WARNING

  1. Potential Breaking Changes

    • The introduction of the --json flag changes the behavior of existing CLI commands. While this is a useful feature, it could break existing scripts or integrations that rely on the previous output format.
    • Recommendation: Clearly document these changes in the CHANGELOG.md and consider providing a deprecation warning for a release cycle before fully switching to the new behavior.
  2. Error Handling Changes

    • The replacement of bare except: clauses with specific exception handling is a positive change. However, the addition of catch-all except Exception: blocks may inadvertently mask critical errors or introduce unexpected behavior.
    • Recommendation: Log the full stack trace for unexpected exceptions using logger.exception() to aid debugging. Additionally, consider re-raising the exception after logging it to avoid masking issues.

💡 SUGGESTIONS

  1. Thread Safety in Concurrent Execution

    • The PR does not address thread safety explicitly. Given that this is a governance toolkit for agents, concurrent execution might be a concern.
    • Recommendation: Review the use of shared resources (e.g., file writes, in-memory data structures) to ensure thread safety. Consider using thread-safe data structures or locks where necessary.
  2. Type Safety and Pydantic Validation

    • While the PR introduces some input validation (e.g., regex for agent identifiers), it does not leverage Pydantic models for schema validation in JSON inputs/outputs.
    • Recommendation: Use Pydantic models to define and validate the structure of JSON inputs and outputs. This will ensure type safety and reduce the risk of malformed data.
  3. Logging Improvements

    • The PR introduces some logging, but it could be more comprehensive. For example, logging is missing in several error-handling blocks.
    • Recommendation: Use logger.error() or logger.exception() consistently in all error-handling blocks to provide better visibility into issues.
  4. Test Coverage

    • The PR does not include new tests for the added functionality (e.g., --json flag, error sanitization, regex validation).
    • Recommendation: Add unit tests to cover the new functionality, especially for edge cases (e.g., invalid JSON, malformed agent identifiers, unexpected exceptions).
  5. Documentation

    • While the PR updates some documentation, it does not provide detailed examples of the new JSON output for all commands.
    • Recommendation: Add detailed examples of the JSON output for each CLI command in the documentation. This will help users understand the new format and integrate it into their workflows.
  6. Code Duplication

    • There is some code duplication in the error-handling blocks for different commands (e.g., JSON error responses).
    • Recommendation: Refactor the error-handling logic into a reusable utility function to reduce duplication and improve maintainability.

Final Assessment

  • The PR addresses important security and usability concerns, but there are critical issues that need to be resolved before merging.
  • Ensure that error messages are fully sanitized, input validation is robust, and potential sandbox escape vectors are mitigated.
  • Address potential breaking changes by clearly documenting them and considering a deprecation period.
  • Add comprehensive tests and improve logging for better maintainability and debugging.

Let me know if you need further clarification or assistance!

@github-actions github-actions Bot added the size/XL Extra large PR (500+ lines) label Mar 31, 2026
Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 AI Agent: code-reviewer

Pull Request Review for feat(cli): standardize JSON output and sanitize error responses across toolkit packages


🔴 CRITICAL: Security Issues

  1. Potential Information Disclosure in Error Messages

    • Issue: While the PR introduces sanitized error messages for JSON output, there are still instances where raw exceptions (e) are printed directly to stderr in non-JSON mode. For example:
      print(f"Error: {e}", file=sys.stderr)
    • Risk: This could lead to sensitive information (e.g., stack traces, internal paths) being exposed in logs, violating CWE-209 (Information Exposure Through an Error Message).
    • Recommendation: Ensure that all error messages, regardless of output format, are sanitized. Replace f"Error: {e}" with a generic error message like "An unexpected error occurred. Please check logs for details." and log the actual exception details for debugging purposes.
  2. Insufficient Input Validation for Agent Identifiers

    • Issue: While the PR introduces regex-based validation for agent identifiers in the audit command, the regex used (r"^(?:agent-[a-zA-Z0-9-]+|did:agentmesh:[a-zA-Z0-9-]+)$") may not be strict enough to prevent all forms of injection attacks.
    • Risk: An attacker could potentially craft a malicious agent identifier that bypasses the regex and injects harmful payloads into the system.
    • Recommendation: Use stricter validation rules for agent identifiers, such as explicitly defining the allowed character set and length. Additionally, consider using a dedicated library for validating DIDs if applicable.
  3. Lack of Validation for JSON Output

    • Issue: While the PR introduces JSON output for various commands, there is no validation to ensure that the JSON structure adheres to a predefined schema.
    • Risk: Inconsistent or malformed JSON output could lead to issues in downstream systems that consume this output.
    • Recommendation: Use a library like pydantic to define and validate the JSON schema for all outputs. This ensures that the output is consistent and adheres to the expected format.
  4. Unvalidated User Input in audit Command

    • Issue: The audit command filters entries based on the agent parameter. While some validation is performed, the filtering logic (entries = [e for e in entries if e["agent"] == agent]) does not account for potential case sensitivity or encoding issues.
    • Risk: This could lead to unexpected behavior or bypasses in the filtering logic.
    • Recommendation: Normalize and sanitize the agent input before performing the filtering. For example, convert both the input and the agent field in entries to lowercase before comparison.

🟡 WARNING: Potential Breaking Changes

  1. Behavior Change in CLI Commands

    • Issue: The introduction of the --json flag and changes to error handling may alter the behavior of existing CLI commands, potentially breaking backward compatibility for users relying on the previous output format or error messages.
    • Recommendation: Clearly document these changes in the release notes and consider providing a migration guide for users to adapt their workflows.
  2. Removed Bare except: Clauses

    • Issue: While replacing bare except: clauses with specific exception handling is a good practice, it may inadvertently introduce breaking changes if certain edge cases are no longer caught.
    • Recommendation: Ensure comprehensive testing to verify that all expected exceptions are still handled correctly.

💡 Suggestions for Improvement

  1. Use Centralized Error Handling

    • Observation: Error handling logic is repeated across multiple commands (e.g., cmd_verify, cmd_integrity, cmd_lint_policy, etc.).
    • Suggestion: Implement a centralized error-handling utility function that takes the exception and output format (--json or not) as arguments. This will reduce code duplication and ensure consistent error handling across all commands.
  2. Thread Safety in Concurrent Execution

    • Observation: The PR does not address thread safety explicitly. While the changes seem to focus on CLI improvements, it's important to ensure that any shared resources (e.g., log files, configuration files) are accessed in a thread-safe manner.
    • Suggestion: Review the codebase for any shared resources and ensure proper synchronization mechanisms (e.g., threading locks) are in place.
  3. Unit Tests for JSON Output

    • Observation: The PR does not include new tests for the JSON output functionality.
    • Suggestion: Add unit tests to validate the JSON output for all commands, ensuring that the output adheres to the expected schema and is properly sanitized.
  4. Documentation Improvements

    • Observation: While the PR updates some documentation, it does not provide detailed examples of the new JSON output for all commands.
    • Suggestion: Update the documentation to include examples of the JSON output for each command, along with explanations of the fields in the JSON schema.
  5. Use of pydantic for Validation

    • Observation: The PR introduces some manual validation logic (e.g., regex for agent identifiers, key whitelisting for audit entries).
    • Suggestion: Leverage pydantic models for input validation and output serialization. This will simplify the code and ensure consistency across the toolkit.

Summary

  • Critical Issues: 4
  • Warnings: 2
  • Suggestions: 5

The PR introduces important security improvements, such as sanitizing error messages and standardizing JSON output. However, there are critical issues related to information disclosure, input validation, and JSON schema validation that need to be addressed. Additionally, the changes may introduce breaking changes for existing users, which should be clearly communicated. Finally, there are opportunities to improve code maintainability and robustness through centralized error handling, thread safety considerations, and the use of pydantic for validation.

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 AI Agent: code-reviewer

Review Summary

This PR introduces significant changes to the CLI tools across multiple packages in the microsoft/agent-governance-toolkit repository. The changes aim to standardize JSON output, sanitize error responses, and improve the security posture of the CLI tools. While the PR addresses several critical security concerns and improves usability, there are areas that require attention to ensure correctness, security, and maintainability.


🔴 CRITICAL

  1. Potential Information Disclosure in Error Handling:

    • While the handle_error function attempts to sanitize error messages, it still logs the full exception details (exc_info=True) in the logger. If the logs are accessible to unauthorized users, this could lead to information disclosure.
    • Recommendation: Ensure that sensitive information (e.g., file paths, stack traces, or internal system details) is not logged in production environments. Use environment-based logging levels to control the verbosity of logs.
  2. Insufficient Validation for Agent Identifiers:

    • The regex for validating agent identifiers in the audit command is overly permissive. It allows potentially dangerous characters like : and - without sufficient context. This could lead to injection attacks or unexpected behavior.
    • Recommendation: Define a stricter regex pattern for agent identifiers, ensuring it aligns with the expected format (e.g., DIDs or alphanumeric names). Consider using a library like pydantic for validation.
  3. Sandbox Escape Risk in _init_claude_integration:

    • The _init_claude_integration function dynamically determines the configuration path based on the operating system. If the config_path is user-supplied, it could lead to a sandbox escape or unauthorized file access.
    • Recommendation: Validate and sanitize the config_path input to ensure it does not point to sensitive or unauthorized locations. Use os.path.realpath and restrict the path to a specific directory.

🟡 WARNING

  1. Breaking Changes in CLI Behavior:

    • The introduction of the --json flag and changes to error handling may break existing workflows that rely on the previous CLI output format.
    • Recommendation: Clearly document these changes in the release notes and provide migration guidance for users.
  2. Backward Compatibility for CLI Commands:

    • The init, register, status, and policy commands now include the --json flag, which alters their output format. This could impact scripts or CI/CD pipelines that parse the CLI output.
    • Recommendation: Consider adding a deprecation warning for a release cycle before fully transitioning to the new behavior.

💡 SUGGESTIONS

  1. Test Coverage:

    • The PR does not include new tests for the updated error handling and JSON output. This is critical to ensure the correctness and security of the changes.
    • Recommendation: Add unit tests for the handle_error function and CLI commands to verify:
      • Correct JSON output for both success and error cases.
      • Proper sanitization of error messages.
      • Validation of agent identifiers and other inputs.
  2. Centralized Logging Configuration:

    • The handle_error function logs errors using the logger.error method. However, there is no indication of a centralized logging configuration in the PR.
    • Recommendation: Configure a centralized logging system with appropriate log levels and handlers (e.g., file, console, or remote logging). Ensure sensitive information is excluded from logs in production environments.
  3. Use of Pydantic for Validation:

    • The CLI commands manually validate inputs (e.g., agent identifiers, file paths). This approach is error-prone and inconsistent.
    • Recommendation: Use pydantic models for input validation and schema enforcement. This will improve type safety and reduce the risk of validation errors.
  4. Error Type Consistency:

    • The handle_error function uses generic error types (ValidationError, InternalError) in the JSON output. This may not provide sufficient context for debugging or automated error handling.
    • Recommendation: Define a set of specific error types (e.g., FileNotFoundError, PermissionError) and include them in the JSON output.
  5. Documentation Updates:

    • While the PR includes updates to the documentation, it does not provide detailed examples of the new JSON output for all CLI commands.
    • Recommendation: Update the documentation to include comprehensive examples of the JSON output for each CLI command, including both success and error cases.
  6. Thread Safety:

    • The PR does not address thread safety concerns, particularly in the context of concurrent CLI command execution.
    • Recommendation: Review the CLI commands for potential race conditions or shared state issues. Use thread-safe constructs (e.g., locks) where necessary.

Additional Notes

  • The PR introduces significant improvements to the security and usability of the CLI tools. However, the changes also increase the complexity of the codebase, particularly in error handling and input validation. It is essential to ensure that these changes are thoroughly tested and documented.
  • The use of click for CLI commands is a good choice, but consider leveraging its advanced features (e.g., custom parameter types, context objects) to simplify the code and improve maintainability.

Action Items

  1. Address the critical issues related to information disclosure, input validation, and sandbox escape risks.
  2. Add comprehensive tests for the new functionality, focusing on edge cases and security.
  3. Update the documentation to reflect the changes in CLI behavior and provide migration guidance.
  4. Consider the suggestions for improving validation, logging, and thread safety.

Let me know if you need further clarification or assistance!

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 AI Agent: code-reviewer

Review Summary:

This PR introduces standardized JSON output and sanitized error handling across multiple CLI tools in the repository. The changes improve security by preventing internal system details from being exposed in error messages and enhance usability by providing consistent machine-readable output for CI/CD pipelines. However, there are critical areas that require attention to ensure correctness, security, and backward compatibility.


🔴 CRITICAL Issues:

  1. Error Sanitization Incompleteness:

    • Issue: In handle_error, the sanitization logic relies on the is_known flag to determine whether to display specific error messages. However, this approach is prone to inconsistencies, as some exceptions (e.g., TypeError, AttributeError) are not explicitly handled and may leak sensitive details.
    • Impact: This could lead to partial exposure of internal stack traces or sensitive information in unexpected scenarios.
    • Recommendation: Expand the list of explicitly handled exceptions and ensure that all unhandled exceptions are treated as "unknown" with a generic error message. Consider using a whitelist approach for error types instead of a blacklist.
  2. Regex Validation for Agent Identifiers:

    • Issue: The regex used for validating agent identifiers in the audit command (re.fullmatch) is overly permissive and does not account for potential injection attacks (e.g., did:agentmesh:../../etc/passwd).
    • Impact: This could allow malicious input to bypass validation and potentially exploit downstream components.
    • Recommendation: Tighten the regex to explicitly disallow directory traversal patterns and other potentially harmful characters. Use a strict schema for validation, such as a Pydantic model.
  3. Thread Safety in Logging:

    • Issue: The handle_error function logs errors using logger.error without ensuring thread safety. If multiple threads invoke this function simultaneously, log messages may interleave or be lost.
    • Impact: This could complicate debugging and compromise the integrity of log files in multi-threaded environments.
    • Recommendation: Use thread-safe logging handlers (e.g., QueueHandler) or implement a locking mechanism to serialize log writes.
  4. Audit Log Sanitization:

    • Issue: The audit log sanitization logic in the audit command relies on a key whitelist but does not validate the values of these keys. For example, a malicious entry could include a large or malformed string in the action field.
    • Impact: This could lead to denial-of-service attacks or unexpected behavior in downstream consumers of the JSON output.
    • Recommendation: Validate the values of all keys in the audit log entries (e.g., enforce length limits, disallow special characters).

🟡 WARNING: Potential Breaking Changes

  1. Behavior Change in CLI Error Handling:

    • Issue: The new error handling mechanism changes the structure of error messages in JSON mode, which may break existing integrations that rely on the previous format.
    • Impact: Downstream systems consuming CLI output may fail to parse or interpret the new error format.
    • Recommendation: Clearly document these changes in the release notes and provide a migration guide for users.
  2. CLI Output Changes:

    • Issue: The addition of the --json flag and changes to the default output format (e.g., sanitized error messages) may affect scripts or tools that parse CLI output.
    • Impact: Users relying on the previous output format may experience failures or unexpected behavior.
    • Recommendation: Consider introducing a deprecation period where both the old and new formats are supported, with warnings about the upcoming changes.

💡 Suggestions for Improvement:

  1. Unit Tests for Error Handling:

    • Add comprehensive unit tests for the handle_error function to ensure that all error types are correctly sanitized and logged. Include edge cases such as nested exceptions and custom error types.
  2. Pydantic Models for Validation:

    • Use Pydantic models to validate CLI inputs (e.g., agent identifiers, policy files) and ensure type safety. This will reduce the risk of injection attacks and improve code maintainability.
  3. Consistent Logging Levels:

    • Ensure that all log messages use consistent logging levels (e.g., INFO, ERROR) based on their severity. This will make it easier to filter logs during debugging.
  4. Backward Compatibility Layer:

    • Introduce a compatibility layer for CLI commands that allows users to opt into the new JSON output format while retaining the old format as the default. This will ease the transition for existing users.
  5. Documentation Enhancements:

    • Update the documentation to include examples of the new JSON output format for all CLI commands. Highlight the security improvements and provide guidance on interpreting sanitized error messages.

Final Assessment:

The PR addresses critical security and usability issues but introduces potential risks related to error sanitization, input validation, and backward compatibility. Addressing the identified issues and recommendations will ensure a robust and secure implementation.

@imran-siddique imran-siddique merged commit 6c7ad7d into microsoft:main Mar 31, 2026
9 of 10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agent-mesh agent-mesh package documentation Improvements or additions to documentation size/XL Extra large PR (500+ lines) size/XS Extra small PR (< 10 lines)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: add dark mode support to Chrome extension popup

2 participants