feat(cli): standardize JSON output and sanitize error responses across toolkit packages#652
Conversation
…ion for error logs and audit events
…n to prevent information disclosure
…s across all toolkit packages
…ions in main.py patterns
|
Welcome to the Agent Governance Toolkit! Thanks for your first pull request. |
🤖 AI Agent: docs-sync-checker — Issues Found📝 Documentation Sync ReportIssues Found
Suggestions
Additional Notes
Action Items
Once these issues are addressed, the documentation will be in sync. |
🤖 AI Agent: security-scanner — Security Analysis of PR: `feat(cli): standardize JSON output and sanitize error responses across toolkit packages`Security Analysis of PR:
|
🤖 AI Agent: contributor-guide — Welcome! 🎉Welcome! 🎉Hi there, and welcome to the microsoft/agent-governance-toolkit project! We're thrilled to have you here and deeply appreciate your contribution. Thank you for taking the time to improve the project — your efforts make open source better for everyone! 🌟 PR Review 🧐First off, fantastic work on this pull request! You've clearly put a lot of thought into standardizing JSON output and improving error handling across multiple packages. Here's what I loved about your PR: What You Did Well ✅
Suggestions for Improvement ✍️While your PR is excellent, I noticed a few areas where we can make it even better:
Project Conventions 📜Here are a few key conventions we follow in this project:
For more details, check out our CONTRIBUTING.md and QUICKSTART.md. Next Steps 🚀Here's what happens next:
Thank you again for your contribution! If you have any questions or need help with anything, don't hesitate to ask. We're here to support you. 😊 |
🤖 AI Agent: breaking-change-detector — Summary🔍 API Compatibility ReportSummaryThe pull request introduces several changes across the CLI tools in the Findings
Migration GuideFor Breaking Changes:
For Potentially Breaking Changes:
ConclusionThe pull request introduces several breaking changes due to modifications in function signatures and stricter validation rules. These changes will require updates to existing codebases that use the affected CLI commands. Additionally, the introduction of sanitized error messages may impact workflows relying on specific error strings. However, the addition of the ✅ Action Required: Update all affected CLI calls and validate agent identifiers to ensure compatibility with the new changes. |
🤖 AI Agent: test-generator — `packages/agent-compliance/src/agent_compliance/cli/main.py`🧪 Test Coverage Analysis
|
There was a problem hiding this comment.
🤖 AI Agent: code-reviewer
Review Summary
This PR introduces standardized JSON output and sanitized error handling across multiple CLI tools in the repository. The changes aim to improve security, usability, and consistency for machine-readable outputs, which is crucial for CI/CD pipelines and other automated systems. While the PR addresses several important issues, there are areas that need attention to ensure compliance with security best practices, maintain backward compatibility, and improve code quality.
🔴 CRITICAL
-
Insufficient Error Sanitization in JSON Output
- The error messages in the JSON responses are not fully sanitized. For example:
While the
print(json.dumps({"status": "fail", "error": "Governance verification failed", "type": "InternalError"}, indent=2))
errorfield is generic, thetypefield directly exposes the exception class name (e.g.,ValueError,KeyError). This could provide attackers with information about the internal implementation, which is a potential security risk (CWE-209: Information Exposure Through an Error Message). - Recommendation: Replace the
typefield with a generic error code or category (e.g.,ValidationError,SystemError) and avoid exposing exception class names directly.
- The error messages in the JSON responses are not fully sanitized. For example:
-
Regex Injection Vulnerability in Agent Identifier Validation
- The regex used to validate agent identifiers in the
auditcommand is insufficiently strict:While it limits the format, it does not prevent potential injection attacks (e.g.,if not re.fullmatch(r"^(?:agent-[a-zA-Z0-9-]+|did:agentmesh:[a-zA-Z0-9-]+)$", agent) or len(agent) > 128:
agent-123; rm -rf /). This could lead to command injection or other vulnerabilities if theagentparameter is used in subsequent commands or database queries. - Recommendation: Use a stricter validation mechanism, such as a library specifically designed for validating DIDs or UUIDs. If the identifier is used in database queries, ensure proper parameterized queries are used.
- The regex used to validate agent identifiers in the
-
Potential Sandbox Escape in
_init_claude_integration- The
_init_claude_integrationfunction does not validate or sanitize theconfig_pathparameter. If this path is user-controlled, it could lead to directory traversal or arbitrary file overwrite vulnerabilities. - Recommendation: Validate the
config_pathparameter to ensure it does not contain malicious input (e.g.,../../etc/passwd). Useos.path.abspathand ensure the path is within an allowed directory.
- The
🟡 WARNING
-
Potential Breaking Changes
- The introduction of the
--jsonflag changes the behavior of existing CLI commands. While this is a useful feature, it could break existing scripts or integrations that rely on the previous output format. - Recommendation: Clearly document these changes in the
CHANGELOG.mdand consider providing a deprecation warning for a release cycle before fully switching to the new behavior.
- The introduction of the
-
Error Handling Changes
- The replacement of bare
except:clauses with specific exception handling is a positive change. However, the addition of catch-allexcept Exception:blocks may inadvertently mask critical errors or introduce unexpected behavior. - Recommendation: Log the full stack trace for unexpected exceptions using
logger.exception()to aid debugging. Additionally, consider re-raising the exception after logging it to avoid masking issues.
- The replacement of bare
💡 SUGGESTIONS
-
Thread Safety in Concurrent Execution
- The PR does not address thread safety explicitly. Given that this is a governance toolkit for agents, concurrent execution might be a concern.
- Recommendation: Review the use of shared resources (e.g., file writes, in-memory data structures) to ensure thread safety. Consider using thread-safe data structures or locks where necessary.
-
Type Safety and Pydantic Validation
- While the PR introduces some input validation (e.g., regex for agent identifiers), it does not leverage Pydantic models for schema validation in JSON inputs/outputs.
- Recommendation: Use Pydantic models to define and validate the structure of JSON inputs and outputs. This will ensure type safety and reduce the risk of malformed data.
-
Logging Improvements
- The PR introduces some logging, but it could be more comprehensive. For example, logging is missing in several error-handling blocks.
- Recommendation: Use
logger.error()orlogger.exception()consistently in all error-handling blocks to provide better visibility into issues.
-
Test Coverage
- The PR does not include new tests for the added functionality (e.g.,
--jsonflag, error sanitization, regex validation). - Recommendation: Add unit tests to cover the new functionality, especially for edge cases (e.g., invalid JSON, malformed agent identifiers, unexpected exceptions).
- The PR does not include new tests for the added functionality (e.g.,
-
Documentation
- While the PR updates some documentation, it does not provide detailed examples of the new JSON output for all commands.
- Recommendation: Add detailed examples of the JSON output for each CLI command in the documentation. This will help users understand the new format and integrate it into their workflows.
-
Code Duplication
- There is some code duplication in the error-handling blocks for different commands (e.g., JSON error responses).
- Recommendation: Refactor the error-handling logic into a reusable utility function to reduce duplication and improve maintainability.
Final Assessment
- The PR addresses important security and usability concerns, but there are critical issues that need to be resolved before merging.
- Ensure that error messages are fully sanitized, input validation is robust, and potential sandbox escape vectors are mitigated.
- Address potential breaking changes by clearly documenting them and considering a deprecation period.
- Add comprehensive tests and improve logging for better maintainability and debugging.
Let me know if you need further clarification or assistance!
…revent information leakage
There was a problem hiding this comment.
🤖 AI Agent: code-reviewer
Pull Request Review for feat(cli): standardize JSON output and sanitize error responses across toolkit packages
🔴 CRITICAL: Security Issues
-
Potential Information Disclosure in Error Messages
- Issue: While the PR introduces sanitized error messages for JSON output, there are still instances where raw exceptions (
e) are printed directly tostderrin non-JSON mode. For example:print(f"Error: {e}", file=sys.stderr)
- Risk: This could lead to sensitive information (e.g., stack traces, internal paths) being exposed in logs, violating CWE-209 (Information Exposure Through an Error Message).
- Recommendation: Ensure that all error messages, regardless of output format, are sanitized. Replace
f"Error: {e}"with a generic error message like"An unexpected error occurred. Please check logs for details."and log the actual exception details for debugging purposes.
- Issue: While the PR introduces sanitized error messages for JSON output, there are still instances where raw exceptions (
-
Insufficient Input Validation for Agent Identifiers
- Issue: While the PR introduces regex-based validation for agent identifiers in the
auditcommand, the regex used (r"^(?:agent-[a-zA-Z0-9-]+|did:agentmesh:[a-zA-Z0-9-]+)$") may not be strict enough to prevent all forms of injection attacks. - Risk: An attacker could potentially craft a malicious agent identifier that bypasses the regex and injects harmful payloads into the system.
- Recommendation: Use stricter validation rules for agent identifiers, such as explicitly defining the allowed character set and length. Additionally, consider using a dedicated library for validating DIDs if applicable.
- Issue: While the PR introduces regex-based validation for agent identifiers in the
-
Lack of Validation for JSON Output
- Issue: While the PR introduces JSON output for various commands, there is no validation to ensure that the JSON structure adheres to a predefined schema.
- Risk: Inconsistent or malformed JSON output could lead to issues in downstream systems that consume this output.
- Recommendation: Use a library like
pydanticto define and validate the JSON schema for all outputs. This ensures that the output is consistent and adheres to the expected format.
-
Unvalidated User Input in
auditCommand- Issue: The
auditcommand filters entries based on theagentparameter. While some validation is performed, the filtering logic (entries = [e for e in entries if e["agent"] == agent]) does not account for potential case sensitivity or encoding issues. - Risk: This could lead to unexpected behavior or bypasses in the filtering logic.
- Recommendation: Normalize and sanitize the
agentinput before performing the filtering. For example, convert both the input and theagentfield inentriesto lowercase before comparison.
- Issue: The
🟡 WARNING: Potential Breaking Changes
-
Behavior Change in CLI Commands
- Issue: The introduction of the
--jsonflag and changes to error handling may alter the behavior of existing CLI commands, potentially breaking backward compatibility for users relying on the previous output format or error messages. - Recommendation: Clearly document these changes in the release notes and consider providing a migration guide for users to adapt their workflows.
- Issue: The introduction of the
-
Removed Bare
except:Clauses- Issue: While replacing bare
except:clauses with specific exception handling is a good practice, it may inadvertently introduce breaking changes if certain edge cases are no longer caught. - Recommendation: Ensure comprehensive testing to verify that all expected exceptions are still handled correctly.
- Issue: While replacing bare
💡 Suggestions for Improvement
-
Use Centralized Error Handling
- Observation: Error handling logic is repeated across multiple commands (e.g.,
cmd_verify,cmd_integrity,cmd_lint_policy, etc.). - Suggestion: Implement a centralized error-handling utility function that takes the exception and output format (
--jsonor not) as arguments. This will reduce code duplication and ensure consistent error handling across all commands.
- Observation: Error handling logic is repeated across multiple commands (e.g.,
-
Thread Safety in Concurrent Execution
- Observation: The PR does not address thread safety explicitly. While the changes seem to focus on CLI improvements, it's important to ensure that any shared resources (e.g., log files, configuration files) are accessed in a thread-safe manner.
- Suggestion: Review the codebase for any shared resources and ensure proper synchronization mechanisms (e.g., threading locks) are in place.
-
Unit Tests for JSON Output
- Observation: The PR does not include new tests for the JSON output functionality.
- Suggestion: Add unit tests to validate the JSON output for all commands, ensuring that the output adheres to the expected schema and is properly sanitized.
-
Documentation Improvements
- Observation: While the PR updates some documentation, it does not provide detailed examples of the new JSON output for all commands.
- Suggestion: Update the documentation to include examples of the JSON output for each command, along with explanations of the fields in the JSON schema.
-
Use of
pydanticfor Validation- Observation: The PR introduces some manual validation logic (e.g., regex for agent identifiers, key whitelisting for audit entries).
- Suggestion: Leverage
pydanticmodels for input validation and output serialization. This will simplify the code and ensure consistency across the toolkit.
Summary
- Critical Issues: 4
- Warnings: 2
- Suggestions: 5
The PR introduces important security improvements, such as sanitizing error messages and standardizing JSON output. However, there are critical issues related to information disclosure, input validation, and JSON schema validation that need to be addressed. Additionally, the changes may introduce breaking changes for existing users, which should be clearly communicated. Finally, there are opportunities to improve code maintainability and robustness through centralized error handling, thread safety considerations, and the use of pydantic for validation.
…re sanitized, secure CLI output
There was a problem hiding this comment.
🤖 AI Agent: code-reviewer
Review Summary
This PR introduces significant changes to the CLI tools across multiple packages in the microsoft/agent-governance-toolkit repository. The changes aim to standardize JSON output, sanitize error responses, and improve the security posture of the CLI tools. While the PR addresses several critical security concerns and improves usability, there are areas that require attention to ensure correctness, security, and maintainability.
🔴 CRITICAL
-
Potential Information Disclosure in Error Handling:
- While the
handle_errorfunction attempts to sanitize error messages, it still logs the full exception details (exc_info=True) in the logger. If the logs are accessible to unauthorized users, this could lead to information disclosure. - Recommendation: Ensure that sensitive information (e.g., file paths, stack traces, or internal system details) is not logged in production environments. Use environment-based logging levels to control the verbosity of logs.
- While the
-
Insufficient Validation for Agent Identifiers:
- The regex for validating agent identifiers in the
auditcommand is overly permissive. It allows potentially dangerous characters like:and-without sufficient context. This could lead to injection attacks or unexpected behavior. - Recommendation: Define a stricter regex pattern for agent identifiers, ensuring it aligns with the expected format (e.g., DIDs or alphanumeric names). Consider using a library like
pydanticfor validation.
- The regex for validating agent identifiers in the
-
Sandbox Escape Risk in
_init_claude_integration:- The
_init_claude_integrationfunction dynamically determines the configuration path based on the operating system. If theconfig_pathis user-supplied, it could lead to a sandbox escape or unauthorized file access. - Recommendation: Validate and sanitize the
config_pathinput to ensure it does not point to sensitive or unauthorized locations. Useos.path.realpathand restrict the path to a specific directory.
- The
🟡 WARNING
-
Breaking Changes in CLI Behavior:
- The introduction of the
--jsonflag and changes to error handling may break existing workflows that rely on the previous CLI output format. - Recommendation: Clearly document these changes in the release notes and provide migration guidance for users.
- The introduction of the
-
Backward Compatibility for CLI Commands:
- The
init,register,status, andpolicycommands now include the--jsonflag, which alters their output format. This could impact scripts or CI/CD pipelines that parse the CLI output. - Recommendation: Consider adding a deprecation warning for a release cycle before fully transitioning to the new behavior.
- The
💡 SUGGESTIONS
-
Test Coverage:
- The PR does not include new tests for the updated error handling and JSON output. This is critical to ensure the correctness and security of the changes.
- Recommendation: Add unit tests for the
handle_errorfunction and CLI commands to verify:- Correct JSON output for both success and error cases.
- Proper sanitization of error messages.
- Validation of agent identifiers and other inputs.
-
Centralized Logging Configuration:
- The
handle_errorfunction logs errors using thelogger.errormethod. However, there is no indication of a centralized logging configuration in the PR. - Recommendation: Configure a centralized logging system with appropriate log levels and handlers (e.g., file, console, or remote logging). Ensure sensitive information is excluded from logs in production environments.
- The
-
Use of Pydantic for Validation:
- The CLI commands manually validate inputs (e.g., agent identifiers, file paths). This approach is error-prone and inconsistent.
- Recommendation: Use
pydanticmodels for input validation and schema enforcement. This will improve type safety and reduce the risk of validation errors.
-
Error Type Consistency:
- The
handle_errorfunction uses generic error types (ValidationError,InternalError) in the JSON output. This may not provide sufficient context for debugging or automated error handling. - Recommendation: Define a set of specific error types (e.g.,
FileNotFoundError,PermissionError) and include them in the JSON output.
- The
-
Documentation Updates:
- While the PR includes updates to the documentation, it does not provide detailed examples of the new JSON output for all CLI commands.
- Recommendation: Update the documentation to include comprehensive examples of the JSON output for each CLI command, including both success and error cases.
-
Thread Safety:
- The PR does not address thread safety concerns, particularly in the context of concurrent CLI command execution.
- Recommendation: Review the CLI commands for potential race conditions or shared state issues. Use thread-safe constructs (e.g., locks) where necessary.
Additional Notes
- The PR introduces significant improvements to the security and usability of the CLI tools. However, the changes also increase the complexity of the codebase, particularly in error handling and input validation. It is essential to ensure that these changes are thoroughly tested and documented.
- The use of
clickfor CLI commands is a good choice, but consider leveraging its advanced features (e.g., custom parameter types, context objects) to simplify the code and improve maintainability.
Action Items
- Address the critical issues related to information disclosure, input validation, and sandbox escape risks.
- Add comprehensive tests for the new functionality, focusing on edge cases and security.
- Update the documentation to reflect the changes in CLI behavior and provide migration guidance.
- Consider the suggestions for improving validation, logging, and thread safety.
Let me know if you need further clarification or assistance!
There was a problem hiding this comment.
🤖 AI Agent: code-reviewer
Review Summary:
This PR introduces standardized JSON output and sanitized error handling across multiple CLI tools in the repository. The changes improve security by preventing internal system details from being exposed in error messages and enhance usability by providing consistent machine-readable output for CI/CD pipelines. However, there are critical areas that require attention to ensure correctness, security, and backward compatibility.
🔴 CRITICAL Issues:
-
Error Sanitization Incompleteness:
- Issue: In
handle_error, the sanitization logic relies on theis_knownflag to determine whether to display specific error messages. However, this approach is prone to inconsistencies, as some exceptions (e.g.,TypeError,AttributeError) are not explicitly handled and may leak sensitive details. - Impact: This could lead to partial exposure of internal stack traces or sensitive information in unexpected scenarios.
- Recommendation: Expand the list of explicitly handled exceptions and ensure that all unhandled exceptions are treated as "unknown" with a generic error message. Consider using a whitelist approach for error types instead of a blacklist.
- Issue: In
-
Regex Validation for Agent Identifiers:
- Issue: The regex used for validating agent identifiers in the
auditcommand (re.fullmatch) is overly permissive and does not account for potential injection attacks (e.g.,did:agentmesh:../../etc/passwd). - Impact: This could allow malicious input to bypass validation and potentially exploit downstream components.
- Recommendation: Tighten the regex to explicitly disallow directory traversal patterns and other potentially harmful characters. Use a strict schema for validation, such as a Pydantic model.
- Issue: The regex used for validating agent identifiers in the
-
Thread Safety in Logging:
- Issue: The
handle_errorfunction logs errors usinglogger.errorwithout ensuring thread safety. If multiple threads invoke this function simultaneously, log messages may interleave or be lost. - Impact: This could complicate debugging and compromise the integrity of log files in multi-threaded environments.
- Recommendation: Use thread-safe logging handlers (e.g.,
QueueHandler) or implement a locking mechanism to serialize log writes.
- Issue: The
-
Audit Log Sanitization:
- Issue: The audit log sanitization logic in the
auditcommand relies on a key whitelist but does not validate the values of these keys. For example, a malicious entry could include a large or malformed string in theactionfield. - Impact: This could lead to denial-of-service attacks or unexpected behavior in downstream consumers of the JSON output.
- Recommendation: Validate the values of all keys in the audit log entries (e.g., enforce length limits, disallow special characters).
- Issue: The audit log sanitization logic in the
🟡 WARNING: Potential Breaking Changes
-
Behavior Change in CLI Error Handling:
- Issue: The new error handling mechanism changes the structure of error messages in JSON mode, which may break existing integrations that rely on the previous format.
- Impact: Downstream systems consuming CLI output may fail to parse or interpret the new error format.
- Recommendation: Clearly document these changes in the release notes and provide a migration guide for users.
-
CLI Output Changes:
- Issue: The addition of the
--jsonflag and changes to the default output format (e.g., sanitized error messages) may affect scripts or tools that parse CLI output. - Impact: Users relying on the previous output format may experience failures or unexpected behavior.
- Recommendation: Consider introducing a deprecation period where both the old and new formats are supported, with warnings about the upcoming changes.
- Issue: The addition of the
💡 Suggestions for Improvement:
-
Unit Tests for Error Handling:
- Add comprehensive unit tests for the
handle_errorfunction to ensure that all error types are correctly sanitized and logged. Include edge cases such as nested exceptions and custom error types.
- Add comprehensive unit tests for the
-
Pydantic Models for Validation:
- Use Pydantic models to validate CLI inputs (e.g., agent identifiers, policy files) and ensure type safety. This will reduce the risk of injection attacks and improve code maintainability.
-
Consistent Logging Levels:
- Ensure that all log messages use consistent logging levels (e.g.,
INFO,ERROR) based on their severity. This will make it easier to filter logs during debugging.
- Ensure that all log messages use consistent logging levels (e.g.,
-
Backward Compatibility Layer:
- Introduce a compatibility layer for CLI commands that allows users to opt into the new JSON output format while retaining the old format as the default. This will ease the transition for existing users.
-
Documentation Enhancements:
- Update the documentation to include examples of the new JSON output format for all CLI commands. Highlight the security improvements and provide guidance on interpreting sanitized error messages.
Final Assessment:
The PR addresses critical security and usability issues but introduces potential risks related to error sanitization, input validation, and backward compatibility. Addressing the identified issues and recommendations will ensure a robust and secure implementation.
Description
This PR standardizes JSON CLI output formatting and sanitizes error responses across all toolkit packages (
agent-mesh,agent-os,iatp, andmcp-kernel-server). It ensures that machine-readable output is consistent for CI/CD integration while preventing the leakage of internal exception details.Critical fixes include:
init-integrationand_init_claude_integrationlogic inagent-meshthat facilitates Claude Desktop setup.except:clauses inmain.pyentry points with specificImportErrorand handling to avoid blocking system signals.messagefield of JSON responses.--jsonflag.Type of Change
Package(s) Affected
Checklist
Related Issues
References #525
Fixes #526