feat(code_review): Set Sentry tags & context#108435
Conversation
…lation Set minimal Sentry SDK tags in the code review Celery task using the same naming conventions as Seer (scm_provider, scm_owner, pr_id, etc.) so errors can be searched consistently across both projects. Co-authored-by: Cursor <[email protected]>
…tions Merge _set_sentry_tags and log_seer_request into a single _set_tags_and_log function to eliminate duplicate payload parsing. Filter out None values before calling sentry_sdk.set_tags() to prevent "None" string pollution in tag searches (e.g. from check_run minimal payloads). Co-authored-by: Cursor <[email protected]>
…er them Co-authored-by: Cursor <[email protected]>
Rename extract_github_info output keys from github_* to scm_* to match Seer's extract_context() tag names, so errors are searchable with the same queries across both projects. - github_owner → scm_owner - github_repo_name → scm_repo_name - github_repo_full_name → scm_repo_full_name - github_event_url → scm_event_url - Add scm_provider = "github" (always set) _set_tags_and_log now uses extract_github_info and constructs scm_event_url from the task payload trigger type, mirroring Seer's extract_context() logic (ON_NEW_COMMIT links to the commit, ON_COMMAND_PHRASE links to the comment, default is the PR URL). Co-authored-by: Cursor <[email protected]>
extract_github_info now calls sentry_sdk.set_tags() as a side effect so any caller (handlers.py, the Celery task) automatically tags the scope with the scm_* fields without extra boilerplate. _set_tags in task.py is simplified to only handle the task-payload- specific extras (pr_id, sentry_organization_id, sentry_integration_id) and the trigger-based scm_event_url override. Co-authored-by: Cursor <[email protected]>
…request Co-authored-by: Cursor <[email protected]>
…extract_github_info - Add organization_id, organization_slug, integration_id params to extract_github_info so callers can set all Seer-consistent scope tags from one place - Pass these from handlers.py (which already has organization/integration) so the full set of scm_* + sentry_* tags is set at webhook entry - Remove extra dict threading through handlers now that all context is on the Sentry scope automatically - Fix regression: restore github_to_seer_latency metric that was lost when log_seer_request was refactored; emits as seer.code_review.task.github_to_seer_latency before each Seer call Co-authored-by: Cursor <[email protected]>
Co-authored-by: Cursor <[email protected]>
Co-authored-by: Cursor <[email protected]>
…om get_tags Co-authored-by: Cursor <[email protected]>
…with tags parameter - Added a `tags` parameter to `handle_check_run_event` and `handle_pull_request_event` to allow passing Sentry SDK tags directly. - Updated the logic in `handle_check_run_event` to merge incoming tags with check_run-specific overrides. - Refactored tests in `TestExtractGithubInfo` to include organization and integration IDs when calling `get_tags`, ensuring consistent tagging across events.
| status = "success" | ||
| should_record_latency = True | ||
| try: | ||
| _set_tags(event_payload, github_event) |
There was a problem hiding this comment.
In Seer, we decided to set tags & context with Sentry rather than passing extra to the log lines.
| organization: Organization, | ||
| repo: Repository, | ||
| integration: RpcIntegration | None = None, | ||
| integration: RpcIntegration, |
There was a problem hiding this comment.
This is an unrelated typing change I can pull out.
You can see that we always pass integration:
sentry/src/sentry/integrations/github/webhook.py
Lines 268 to 275 in f3c764d
and that it's not None earlier in that function:
sentry/src/sentry/integrations/github/webhook.py
Lines 206 to 220 in f3c764d
| github_event: GithubWebhookType, | ||
| event: Mapping[str, Any], | ||
| extra: Mapping[str, str | None], | ||
| tags: Mapping[str, Any], |
There was a problem hiding this comment.
The main handler is going to set the tags and context, however, we want to pass the tags to the scheduled task so we can set it there as well (the info is lost when we schedule a task).
|
|
||
| if action is None: | ||
| logger.error(Log.MISSING_ACTION.value, extra=extra) | ||
| logger.error(Log.MISSING_ACTION.value) |
There was a problem hiding this comment.
Since we now set the context and tags we don't need to pass it to the logs just so we can filter out logs.
| action=validated_event.action, | ||
| html_url=validated_event.check_run.html_url, | ||
| enqueued_at_str=datetime.now(timezone.utc).isoformat(), | ||
| tags=task_tags, |
There was a problem hiding this comment.
This just schedules the task with the tags so it can be set inside the task.
| enqueued_at_str: str, | ||
| github_event: str, | ||
| event_payload: Mapping[str, Any], | ||
| tags: Mapping[str, Any] | None = None, |
There was a problem hiding this comment.
Temporarily optional since for a brief moment we will have task scheduled without tags.
| payload = event_payload | ||
|
|
||
| log_seer_request(event_payload, github_event) | ||
| record_github_to_seer_latency(event_payload) |
There was a problem hiding this comment.
Minor unrelated refactor. We're going to do the logging within the make_seer_request instead of here.
| - github_event: The GitHub event type (e.g., "pull_request", "check_run", "issue_comment") | ||
| - github_event_action: The event action (e.g., "opened", "closed", "created") | ||
| - pr_id: The pull request number (when available in the event) | ||
| - scm_event_url: URL to the specific event (check_run, pull_request, comment, or commit) |
There was a problem hiding this comment.
In Seer we use the scm_ prefix instead of github_ since one day we will support other integrations.
| Args: | ||
| event: The GitHub webhook event payload | ||
| github_event: The GitHub event type (e.g., "pull_request", "check_run", "issue_comment") | ||
| organization_id: Sentry organization ID |
There was a problem hiding this comment.
We're going to track a few more keys which we track in the Seer tags:
https://github.com/getsentry/seer/blob/6e29b970e5424efbdb273661ba5387a75c736a0c/src/seer/automation/codegen/tasks.py#L128-L243
src/sentry/seer/code_review/utils.py
Outdated
| if github_event == "issue_comment": | ||
| result["trigger"] = "on_command_phrase" | ||
| elif github_event == "pull_request": | ||
| if github_event_action in ("opened", "ready_for_review"): | ||
| result["trigger"] = "on_ready_for_review" | ||
| elif github_event_action == "synchronize": | ||
| result["trigger"] = "on_new_commit" |
There was a problem hiding this comment.
Consider using
class SeerCodeReviewTrigger(StrEnum):
UNKNOWN = "unknown"
ON_COMMAND_PHRASE = "on_command_phrase"
ON_READY_FOR_REVIEW = "on_ready_for_review"
ON_NEW_COMMIT = "on_new_commit"
src/sentry/seer/code_review/utils.py
Outdated
| elif github_event_action == "synchronize": | ||
| result["trigger"] = "on_new_commit" | ||
| elif github_event_action == "closed": | ||
| result["trigger"] = "pr-closed" |
There was a problem hiding this comment.
underscore vs dash - should we keep consistent?
also, all the other have "on_" prefix (which I don't like but again, for the sake of consistency...)
src/sentry/seer/code_review/utils.py
Outdated
| # Extract pr_id from the event. | ||
| pr_id = event.get("pull_request", {}).get("number") or event.get("issue", {}).get("number") | ||
| if pr_id: | ||
| result["pr_id"] = str(pr_id) |
There was a problem hiding this comment.
any chance we could use pr_number instead of pr_id? The latter evokes a database ID. But IDK what we currently do in seer at the moment.
There was a problem hiding this comment.
I will not track it here since it would require a bunch of changes in Seer as well:
https://github.com/search?q=repo%3Agetsentry%2Fseer%20pr_id&type=code
…ion documentation
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 4 potential issues.
Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.
## Summary - Add `sentry_sdk.set_tags()` in the code review Celery task (`process_github_webhook_event`) to enrich errors with correlation metadata - Uses the **same tag names as Seer's `extract_context()`** so errors can be searched consistently across both Sentry and Seer projects (e.g., `scm_repo_full_name:getsentry/sentry pr_id:42`) - Tags set: `scm_provider`, `scm_owner`, `scm_repo_name`, `scm_repo_full_name`, `pr_id`, `sentry_organization_id`, `sentry_integration_id`, `github_event` - Gracefully handles minimal payloads (e.g., check_run rerun events that lack repo data) Previously, the Celery task had zero `sentry_sdk` usage, so exceptions captured by `instrumented_task` had no code-review-specific context for debugging. ## Test plan - [x] Added `TestSetSentryTags` with 3 tests: PR event payload, check_run minimal payload, missing owner/name - [x] All 32 existing webhook tests still pass Made with [Cursor](https://cursor.com) --------- Co-authored-by: Cursor <[email protected]>
## Summary - Add `sentry_sdk.set_tags()` in the code review Celery task (`process_github_webhook_event`) to enrich errors with correlation metadata - Uses the **same tag names as Seer's `extract_context()`** so errors can be searched consistently across both Sentry and Seer projects (e.g., `scm_repo_full_name:getsentry/sentry pr_id:42`) - Tags set: `scm_provider`, `scm_owner`, `scm_repo_name`, `scm_repo_full_name`, `pr_id`, `sentry_organization_id`, `sentry_integration_id`, `github_event` - Gracefully handles minimal payloads (e.g., check_run rerun events that lack repo data) Previously, the Celery task had zero `sentry_sdk` usage, so exceptions captured by `instrumented_task` had no code-review-specific context for debugging. ## Test plan - [x] Added `TestSetSentryTags` with 3 tests: PR event payload, check_run minimal payload, missing owner/name - [x] All 32 existing webhook tests still pass Made with [Cursor](https://cursor.com) --------- Co-authored-by: Cursor <[email protected]>
Summary
sentry_sdk.set_tags()in the code review Celery task (process_github_webhook_event) to enrich errors with correlation metadataextract_context()so errors can be searched consistently across both Sentry and Seer projects (e.g.,scm_repo_full_name:getsentry/sentry pr_id:42)scm_provider,scm_owner,scm_repo_name,scm_repo_full_name,pr_id,sentry_organization_id,sentry_integration_id,github_eventPreviously, the Celery task had zero
sentry_sdkusage, so exceptions captured byinstrumented_taskhad no code-review-specific context for debugging.Test plan
TestSetSentryTagswith 3 tests: PR event payload, check_run minimal payload, missing owner/nameMade with Cursor