Commit d2c697f

Add sonic-error-report tool for structured error reporting (sonic-net#4037)
* Add sonic-error-report tool for structured error reporting

  - Add error_reporter.py library module to utilities_common/
  - Add sonic-error-report CLI script to scripts/
  - Add comprehensive unit tests to tests/
  - Update setup.py to include new script in installation

  This provides crash-resilient JSON error reporting for SONiC operations, preparing for future integration with reboot and upgrade scripts.

* Fix flake8 errors in sonic_error_report_test.py

  - Remove unused imports (sys, unittest, pytest, MagicMock, mock_open, call)
  - Fix line length violations by breaking long lines
  - Fix test logic for mark_failure_missing_report to properly handle SystemExit
  - Update test assertions to handle expected sanitization behavior

* Fix whitespace issues in error_reporter.py

  - Remove trailing whitespace from blank lines (W293)
  - Ensure file ends with newline (W292)
  - Fix some long lines for readability

  These changes address the pre-commit flake8 failures.

* Fix test_scenario_sanitization to match actual behavior

  The scenario sanitization replaces path separators with underscores but allows dots, so '../../../etc/malicious' becomes '.._.._.._etc_malicious'. Update test to verify the actual sanitization behavior rather than incorrect expectations.

* Fix test_scenario_sanitization to test filename only

  The test was incorrectly checking the full path which contains forward slashes from the temporary directory. Now it properly extracts just the filename to test sanitization behavior.

* Update error reporting to match sonic-metadata improvements

  Key updates to align sonic-utilities error reporter with sonic-metadata version:

  1. Conservative defaults (safer for production):
     - reputation_impact: false (was true)
     - isolate_on_failure: false (was true)
     - retriable: remains true
  2. SONiC logger integration with fallback support:
     - Added sonic_py_common.logger import with graceful fallback
     - Replaced sys.stderr.write with structured logging
     - Compatible with environments that lack sonic_py_common
  3. Enhanced error handling:
     - Logger initialization in CLI and library
     - Proper logging for all operations and errors
  4. Python 2/3 compatibility maintained:
     - Uses .format() syntax throughout
     - No f-string dependencies

  This brings the sonic-utilities error reporter up to parity with the sonic-metadata version while maintaining backward compatibility.

* Fix flake8 whitespace issues

  Remove whitespace from blank lines to pass pre-commit checks:

  - utilities_common/error_reporter.py:43,46,51,56

* Remove trailing whitespace on blank line

* Fix flake8 line length violations

* Fix tests to use logger mocks instead of stderr and update default expectations

* just to rerun pipeline

* Replace specific scenario names with generic placeholders in help text

* Refactor error reporter to use JSON template file

  Address PR review feedback by moving report structure to external JSON template.

  - Add error_report_template.json with default report structure
  - Add error_report_schema.json for documentation and validation
  - Update SonicErrorReportManager to load template from JSON file
  - Add fallback behavior if template file is missing
  - Maintain crash resilience with no external dependencies at runtime

  This provides standardization while keeping the tool reliable and self-contained. Template can be modified without code changes, and schema serves as documentation.

* Add template generation script for JSON schema workflow

  - Add generate_template_from_schema.py to create template from schema
  - Include package_data configuration for JSON template files
  - Provides development workflow to keep template in sync with schema
  - Template generation ensures consistency with schema defaults

* Add comprehensive unit tests for template generation script

  - Test all JSON schema type handling (string, boolean, integer, array, object)
  - Test explicit default values from schema properties
  - Test special handling for errors array with default timeout error
  - Test file operations for schema reading and template writing
  - Test main function success and error paths
  - Achieve full test coverage for generate_template_from_schema.py

* Fix flake8 whitespace issues in template generation files

  - Remove trailing whitespace from blank lines (W293)
  - Add newline at end of files (W292)
  - Remove unused import in test file (F401)
  - Move imports to top of test file (E402)
  - Ready for CI pipeline static analysis validation

* Fix pre-commit flake8 whitespace violations

  - Remove trailing whitespace from error_reporter.py (W291)
  - Remove whitespace from blank lines in both files (W293)
  - Add newline at end of test file (W292)
  - All pre-commit whitespace checks now pass

* Revert "Add template generation script for JSON schema workflow"

  This reverts commit e31101b28cc225c476ecb7fcaaccdd08e15a75ee.

* Fix flake8 whitespace violations in error_reporter.py

  - Remove trailing whitespace from line 73 (W291)
  - Remove whitespace from blank lines 176 and 179 (W293)
  - All pre-commit checks now pass
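
The sanitization rule described in the commit message (path separators become underscores, dots are left alone) can be illustrated with a minimal sketch. The names `sanitize_scenario` and `report_filename` are hypothetical; the real logic lives in utilities_common/error_reporter.py, which is not shown on this page and may differ in detail.

```python
import re


def sanitize_scenario(scenario):
    """Replace path separators with underscores; dots are left untouched."""
    # Hypothetical helper: treats both '/' and '\' as path separators.
    return re.sub(r"[/\\]", "_", scenario)


def report_filename(scenario, event_guid):
    """Build a <scenario>.<eventGuid>.json name from a sanitized scenario."""
    return "{}.{}.json".format(sanitize_scenario(scenario), event_guid)


print(sanitize_scenario("../../../etc/malicious"))  # .._.._.._etc_malicious
print(report_filename("fast-reboot", "uuid-123"))   # fast-reboot.uuid-123.json
```

Because the separators are gone, the resulting name can no longer escape the report directory, even though the leading dots survive.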
1 parent ed5afd8 commit d2c697f
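
The commit message mentions that SonicErrorReportManager loads its report structure from a JSON template and falls back if the file is missing. A rough sketch of that pattern follows. `load_template` and `FALLBACK_TEMPLATE` are hypothetical names, and the fallback structure is an assumption modelled on the minimal report built in scripts/sonic-error-report; the actual template file may contain other fields.

```python
import copy
import json
import os
import tempfile

# Assumed fallback structure, modelled on the minimal report in the CLI script.
FALLBACK_TEMPLATE = {
    "sonic_upgrade_summary": {
        "script_name": "", "fault_code": "", "fault_reason": "", "guid": "",
    },
    "sonic_upgrade_actions": {
        "reputation_impact": False,
        "retriable": True,
        "isolate_on_failure": False,
        "auto_triage": {"status": False, "triage_queue": "", "triage_action": ""},
    },
    "sonic_upgrade_report": {
        "duration": "0", "stages": [], "health_checks": [], "errors": [],
    },
}


def load_template(path):
    """Return the JSON template at `path`, or a built-in copy if it is unusable."""
    try:
        with open(path) as f:
            return json.load(f)
    except (OSError, IOError, ValueError):
        # Missing, unreadable, or malformed template: fall back to defaults.
        return copy.deepcopy(FALLBACK_TEMPLATE)


# A path inside a fresh temporary directory is guaranteed not to exist.
missing = os.path.join(tempfile.mkdtemp(), "error_report_template.json")
report = load_template(missing)
print(report["sonic_upgrade_actions"]["retriable"])  # True (fallback default)
```

Returning a deep copy keeps callers from mutating the shared default, which matters when several reports are created in one process.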

6 files changed, +1014 -1 lines changed

scripts/sonic-error-report

Lines changed: 184 additions & 0 deletions
@@ -0,0 +1,184 @@
+#!/usr/bin/env python3
+"""
+SONiC error report management CLI tool.
+"""
+import sys
+import json
+import argparse
+import os
+from utilities_common.error_reporter import SonicErrorReportManager, get_logger
+
+
+def main():
+    # Initialize SONiC logger
+    get_logger()  # Ensure logger is initialized
+
+    parser = argparse.ArgumentParser(
+        description="SONiC error report management tool",
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog="""
+Examples:
+  sonic-error-report --scenario <scenario-name> -e uuid-123 init <operation-type>
+  sonic-error-report --scenario <scenario-name> -e uuid-456 init <operation-type>
+  sonic-error-report --scenario <scenario-name> -e uuid-123 fail 1 --reason "Operation failed"
+  sonic-error-report --scenario <scenario-name> -e uuid-789 success --duration 60
+
+File naming:
+  Creates: <scenario>.<eventGuid>.json
+  Examples: <scenario-name>.uuid-123.json, <scenario-name>.uuid-456.json
+"""
+    )
+
+    parser.add_argument(
+        '--report-dir',
+        default="/host/sonic-upgrade-reports",
+        help="Directory to store report files (default: /host/sonic-upgrade-reports)"
+    )
+
+    parser.add_argument(
+        '--scenario',
+        required=True,
+        help="Scenario name for file naming (creates <scenario>.<eventGuid>.json format)"
+    )
+
+    parser.add_argument(
+        '-e', '--event-guid',
+        required=True,
+        help="Event GUID for report naming (required)"
+    )
+
+    subparsers = parser.add_subparsers(dest='command', help='Available commands')
+
+    # Init command
+    init_parser = subparsers.add_parser('init', help='Initialize staged report')
+    init_parser.add_argument('operation_type', help='Type of operation (fast-reboot, upgrade, etc.)')
+    init_parser.add_argument('guid', nargs='?', help='Optional GUID (auto-generated if not provided)')
+
+    # Add kwargs support for init command
+    init_parser.add_argument('--package-version', help='SONiC upgrade package version')
+    init_parser.add_argument('--reputation-impact', type=bool, help='Reputation impact (true/false)')
+    init_parser.add_argument('--retriable', type=bool, help='Is retriable (true/false)')
+    init_parser.add_argument('--isolate-on-failure', type=bool, help='Isolate on failure (true/false)')
+    init_parser.add_argument('--triage-status', type=bool, help='Auto triage status (true/false)')
+    init_parser.add_argument('--triage-queue', help='Triage queue name')
+    init_parser.add_argument('--triage-action', help='Triage action/workflow name')
+    init_parser.add_argument('--duration', help='Operation duration in seconds')
+
+    # Fail command
+    fail_parser = subparsers.add_parser('fail', help='Mark report as failed')
+    fail_parser.add_argument('exit_code', type=int, help='Exit code from failed operation')
+    fail_parser.add_argument('-r', '--reason', help='Optional fault reason description')
+
+    # Add identical kwargs support for fail command
+    fail_parser.add_argument('--package-version', help='SONiC upgrade package version')
+    fail_parser.add_argument('--reputation-impact', type=bool, help='Reputation impact (true/false)')
+    fail_parser.add_argument('--retriable', type=bool, help='Is retriable (true/false)')
+    fail_parser.add_argument('--isolate-on-failure', type=bool, help='Isolate on failure (true/false)')
+    fail_parser.add_argument('--triage-status', type=bool, help='Auto triage status (true/false)')
+    fail_parser.add_argument('--triage-queue', help='Triage queue name')
+    fail_parser.add_argument('--triage-action', help='Triage action/workflow name')
+    fail_parser.add_argument('--duration', help='Operation duration in seconds')
+
+    # Success command
+    success_parser = subparsers.add_parser('success', help='Mark report as successful')
+
+    # Add kwargs support for success command (matching init/fail commands)
+    success_parser.add_argument('--package-version', help='SONiC upgrade package version')
+    success_parser.add_argument('--duration', help='Operation duration in seconds')
+    success_parser.add_argument('--triage-status', type=bool, help='Auto triage status (true/false)')
+    success_parser.add_argument('--triage-queue', help='Triage queue name')
+    success_parser.add_argument('--triage-action', help='Triage action/workflow name')
+
+    # Don't catch parse_args() - let argparse handle argument errors gracefully
+    args = parser.parse_args()
+
+    if not args.command:
+        parser.print_help()
+        sys.exit(1)
+
+    # Handle system/environmental errors during manager setup
+    try:
+        manager = SonicErrorReportManager(args.report_dir, args.scenario)
+    except (OSError, IOError) as e:
+        get_logger().log_error("Cannot access report directory '{}': {}".format(args.report_dir, e))
+        # Try to generate a minimal framework error report if possible
+        try:
+            if args.event_guid and os.path.exists(os.path.dirname(args.report_dir)):
+                # Create minimal error report directly
+                report_path = os.path.join(args.report_dir, "{}.{}.json".format(args.scenario, args.event_guid))
+                minimal_report = {
+                    "sonic_upgrade_summary": {
+                        "script_name": args.scenario,
+                        "fault_code": "254",
+                        "fault_reason": "Framework initialization failed: Cannot access report directory: {}".format(str(e)),
+                        "guid": args.event_guid
+                    },
+                    "sonic_upgrade_actions": {"reputation_impact": False, "retriable": True, "isolate_on_failure": False, "auto_triage": {"status": False, "triage_queue": "", "triage_action": ""}},
+                    "sonic_upgrade_report": {"duration": "0", "stages": [], "health_checks": [], "errors": [{"name": "FRAMEWORK_INIT_ERROR", "message": "Cannot access report directory"}]}
+                }
+                with open(report_path, 'w') as f:
+                    json.dump(minimal_report, f, indent=2)
+                sys.stderr.write("Generated minimal error report for GUID: {}\n".format(args.event_guid))
+        except Exception:
+            pass
+        sys.exit(2)
+    except Exception as e:
+        get_logger().log_error("Failed to initialize report manager: {}".format(e))
+        sys.exit(2)
+
+    # Helper function to extract kwargs from args
+    def extract_kwargs(args):
+        kwargs = {}
+        # Map CLI args to kwargs (replace dashes with underscores)
+        if hasattr(args, 'package_version') and args.package_version is not None:
+            kwargs['package_version'] = args.package_version
+        if hasattr(args, 'reputation_impact') and args.reputation_impact is not None:
+            kwargs['reputation_impact'] = args.reputation_impact
+        if hasattr(args, 'retriable') and args.retriable is not None:
+            kwargs['retriable'] = args.retriable
+        if hasattr(args, 'isolate_on_failure') and args.isolate_on_failure is not None:
+            kwargs['isolate_on_failure'] = args.isolate_on_failure
+        if hasattr(args, 'triage_status') and args.triage_status is not None:
+            kwargs['triage_status'] = args.triage_status
+        if hasattr(args, 'triage_queue') and args.triage_queue is not None:
+            kwargs['triage_queue'] = args.triage_queue
+        if hasattr(args, 'triage_action') and args.triage_action is not None:
+            kwargs['triage_action'] = args.triage_action
+        if hasattr(args, 'duration') and args.duration is not None:
+            kwargs['duration'] = args.duration
+        return kwargs
+
+    # Handle business logic errors during command execution
+    try:
+        if args.command == 'init':
+            kwargs = extract_kwargs(args)
+            manager.init_report(args.operation_type, args.event_guid, **kwargs)
+        elif args.command == 'fail':
+            kwargs = extract_kwargs(args)
+            manager.mark_failure(args.event_guid, args.exit_code, args.reason, **kwargs)
+        elif args.command == 'success':
+            kwargs = extract_kwargs(args)
+            manager.mark_success(args.event_guid, **kwargs)
+    except Exception as e:
+        get_logger().log_error("Operation failed: {}".format(e))
+
+        # If we have enough context, try to generate a framework error report
+        try:
+            if args.event_guid and args.scenario:
+                # Create a generic framework error report to ensure JSON always exists
+                framework_manager = SonicErrorReportManager(args.report_dir, args.scenario)
+                framework_manager.mark_failure(
+                    args.event_guid,
+                    255,  # Generic framework error code
+                    "Error reporting framework failure: {}".format(str(e))
+                )
+                sys.stderr.write("Generated framework error report for GUID: {}\n".format(args.event_guid))
+        except Exception:
+            # If we can't even create the error report, just pass
+            pass
+
+        sys.exit(1)
+
+
+if __name__ == "__main__":
+    main()
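
One thing to watch in the script: the boolean options are declared with `type=bool`. argparse passes the raw command-line string to `bool()`, and any non-empty string is truthy, so `--retriable false` yields `True`. A common workaround is a small converter function. The sketch below uses a hypothetical `str2bool` helper and a hypothetical `--retriable-raw` option for comparison; neither is part of this commit.

```python
import argparse


def str2bool(value):
    """Convert common true/false spellings to bool for use as an argparse type."""
    lowered = value.strip().lower()
    if lowered in ("true", "1", "yes", "y"):
        return True
    if lowered in ("false", "0", "no", "n"):
        return False
    raise argparse.ArgumentTypeError("expected true/false, got {!r}".format(value))


parser = argparse.ArgumentParser()
parser.add_argument("--retriable-raw", type=bool)   # same pattern as the script
parser.add_argument("--retriable", type=str2bool)   # converter-based variant

args = parser.parse_args(["--retriable-raw", "false", "--retriable", "false"])
print(args.retriable_raw)  # True, because bool("false") is truthy
print(args.retriable)      # False
```

Raising `argparse.ArgumentTypeError` lets argparse report an unrecognised value as a normal usage error instead of silently accepting it.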

setup.py

Lines changed: 2 additions & 1 deletion
@@ -194,7 +194,8 @@
         'scripts/verify_image_sign_common.sh',
         'scripts/check_db_integrity.py',
         'scripts/sysreadyshow',
-        'scripts/wredstat'
+        'scripts/wredstat',
+        'scripts/sonic-error-report'
     ],
     entry_points={
         'console_scripts': [
