
Emit OTEL error events when any downstream service it depends on returns a rate-limiting or quota-related failure #256

@nicobistolfi

Description

Summary

Vigilante should emit OTEL error events when any downstream service it depends on returns a rate-limiting or quota-related failure. The codebase already classifies some provider quota/rate-limit failures and has OTEL workflow/event plumbing, but there is not yet a clear, dedicated telemetry signal for service-side rate limiting across the systems Vigilante uses.

Issue Type: feature

Problem

  • Vigilante depends on multiple external services and CLIs, including GitHub and coding-agent providers.
  • When one of those services returns a rate-limit or quota-related failure, Vigilante can block or fail locally, but the rate-limit event itself is not surfaced as a dedicated OTEL error signal.
  • This makes it harder to observe systemic throttling problems, distinguish service-side quota exhaustion from other failures, and build alerting or dashboards around rate-limit pressure.

Context

  • Repository: aliengiraffe/vigilante
  • Existing OTEL and analytics support lives in internal/telemetry/telemetry.go.
  • Existing workflow telemetry already captures bounded product/workflow events via telemetry.CaptureWorkflowEvent(...).
  • The codebase already recognizes quota/rate-limit style failures in some places, for example provider-side classification that maps usage-limit/rate-limit/quota failures into provider_quota.
  • This classification should be extended into telemetry so that rate-limited external dependencies become observable events rather than just local blocked states or log lines; a wiring sketch follows this list.
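
As a rough sketch of how the existing classification could feed such a signal, assuming a CaptureWorkflowEvent-style helper that takes an event name plus a small property map (the real signature in internal/telemetry/telemetry.go may differ, and the class names below are illustrative):

```go
package main

import "fmt"

// FailureClass mirrors the kind of classification the codebase already
// performs, e.g. mapping usage-limit/rate-limit/quota failures into
// provider_quota.
type FailureClass string

const (
	ClassProviderQuota FailureClass = "provider_quota" // quota exhausted, blocking
	ClassRateLimited   FailureClass = "rate_limited"   // transient throttling, retryable
	ClassOther         FailureClass = "other"
)

// captureWorkflowEvent stands in for telemetry.CaptureWorkflowEvent(...).
func captureWorkflowEvent(name string, props map[string]string) {
	fmt.Println("event:", name, props)
}

// emitDownstreamRateLimit converts an already-classified downstream failure
// into a bounded telemetry event; the raw error is deliberately never
// forwarded.
func emitDownstreamRateLimit(service, operation string, class FailureClass) {
	if class != ClassProviderQuota && class != ClassRateLimited {
		return // not a rate-limit/quota failure, nothing to emit
	}
	captureWorkflowEvent("downstream_rate_limited", map[string]string{
		"service":   service,       // normalized category, e.g. "provider"
		"operation": operation,     // high-level operation, e.g. "run_agent"
		"class":     string(class), // retryable vs quota-blocking
	})
}

func main() {
	emitDownstreamRateLimit("provider", "run_agent", ClassProviderQuota)
}
```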

Desired Outcome

  • When any downstream service used by Vigilante returns a rate-limit or quota-related error, Vigilante emits an OTEL error event describing that failure in a bounded, privacy-safe way.
  • The event should make it possible to identify (one possible shape is sketched after this list):
      • which service category was rate limiting Vigilante
      • what high-level operation was being attempted
      • whether the failure was classified as retryable/transient or as quota-related/blocking
  • The emitted telemetry should avoid leaking sensitive request content: prompts, tokens, raw arguments, or other free-form payloads.
  • Non-goals:
      • sending raw API responses or full stderr/stdout bodies to OTEL
      • building a full retry/backoff policy in the same change, unless necessary for instrumentation correctness
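
One possible bounded shape for that event; every field name here is hypothetical rather than the repository's actual schema:

```go
package telemetry

// RateLimitEvent is an illustrative bounded event shape: each field is a
// short, normalized or enumerated string, never a free-form payload.
type RateLimitEvent struct {
	Service   string // normalized category: "github", "provider", "telemetry_export"
	Operation string // high-level operation: "clone", "create_pr", "run_agent"
	Class     string // "retryable_transient" or "quota_blocking"
}
```

Keeping every field to a closed set of values is what makes the dashboarding and alerting goals below achievable.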

Implementation Notes

  • Treat this as a feature request for operational observability.
  • Required behavior:
      • detect rate-limit/quota-style failures from the services Vigilante uses
      • emit an OTEL error event when those failures occur (see the emission sketch after this list)
      • keep emitted properties bounded and privacy-safe
  • Plausible implementation areas:
      • internal/telemetry/telemetry.go for the event helper/schema
      • failure-classification paths in internal/blocking, internal/app, and internal/runner
      • shared command/service execution layers where service-specific errors are normalized
  • Required constraints:
      • do not emit raw prompts, tokens, repo-private payloads, or full free-form error text that may contain sensitive information
      • keep the event taxonomy consistent with existing OTEL/workflow telemetry
      • cover at least the known quota/rate-limit failure shapes already classified in the codebase
  • Flexible details:
      • whether the signal is emitted as an OTEL log record, a workflow analytics event, or both
      • whether service names are normalized into categories such as github, provider, or telemetry_export
  • Tradeoffs to consider:
      • generic runner-level detection provides broad coverage, but service-specific classification may be needed to avoid false positives
      • call-site emission can attach richer context, but makes it easier to miss future rate-limit paths
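
A minimal sketch of the emission side, assuming the event is attached to the active span via the standard go.opentelemetry.io/otel API; whether Vigilante ultimately uses span events, OTEL log records, or its workflow analytics path is one of the flexible details above, and the helper name here is hypothetical:

```go
package telemetry

import (
	"context"

	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/codes"
	"go.opentelemetry.io/otel/trace"
)

// RecordDownstreamRateLimit attaches a bounded error event to the span in
// ctx. All three arguments must already be normalized, enumerated strings;
// raw error text never passes through this function.
func RecordDownstreamRateLimit(ctx context.Context, service, operation, class string) {
	span := trace.SpanFromContext(ctx)
	span.AddEvent("downstream.rate_limited", trace.WithAttributes(
		attribute.String("service.category", service),
		attribute.String("operation", operation),
		attribute.String("failure.class", class),
	))
	// Mark the span as errored so backends surface it as an error signal.
	span.SetStatus(codes.Error, "downstream rate limited")
}
```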

Acceptance Criteria

  • Vigilante emits an OTEL error event when a downstream service returns a rate-limit or quota-related error.
  • The event includes bounded metadata identifying the affected service/category, the high-level operation, and the outcome/classification.
  • Known provider quota/rate-limit failures are covered by the new telemetry signal.
  • Sensitive raw payloads such as prompts, tokens, raw request bodies, and unrestricted stderr/stdout are not emitted.
  • The emitted telemetry is consistent enough to support dashboarding or alerting around downstream throttling.

Testing Expectations

  • Add tests for detection and telemetry emission of provider quota/rate-limit failures.
  • Add tests for at least one non-provider downstream rate-limit scenario when applicable, such as GitHub/API throttling.
  • Add negative tests proving that sensitive raw payloads are not emitted in the telemetry event (a sketch of such a test follows this list).
  • Add coverage for the bounded event fields/schema so future changes do not silently break observability.
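
For the negative tests, one shape that works without a real exporter; buildRateLimitProps is a hypothetical stand-in for however the real helper assembles event properties:

```go
package telemetry

import (
	"strings"
	"testing"
)

// buildRateLimitProps stands in for the real property builder; it must only
// copy enumerated, normalized values into the event.
func buildRateLimitProps(service, operation, class string) map[string]string {
	return map[string]string{
		"service":   service,
		"operation": operation,
		"class":     class,
	}
}

func TestRateLimitEventPropsAreBoundedAndSafe(t *testing.T) {
	props := buildRateLimitProps("github", "create_pr", "retryable")

	// The schema must stay small and enumerable.
	if len(props) > 5 {
		t.Fatalf("event has %d properties; schema should stay bounded", len(props))
	}
	// No property value may contain token-like or free-form sensitive markers.
	for key, val := range props {
		for _, marker := range []string{"ghp_", "sk-", "prompt"} {
			if strings.Contains(val, marker) {
				t.Fatalf("property %q leaked sensitive content: %q", key, val)
			}
		}
	}
}
```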

Operational / UX Considerations

  • These events should make downstream throttling visible without requiring operators to infer the pattern from scattered blocked sessions or logs.
  • Keep the event names and fields stable enough for dashboards and alerts.
  • If multiple services can rate limit Vigilante, normalize service categories so aggregate analysis remains useful; a normalization sketch follows.
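
If normalization is adopted, it helps to keep the mapping in one place so every emitter agrees on categories; a sketch, with all matching rules illustrative:

```go
package telemetry

import "strings"

// NormalizeServiceCategory collapses concrete endpoints and CLIs into the
// small category set used by the rate-limit events.
func NormalizeServiceCategory(raw string) string {
	switch {
	case strings.Contains(raw, "github"):
		return "github"
	case strings.Contains(raw, "agent"), strings.Contains(raw, "provider"):
		return "provider"
	case strings.Contains(raw, "otlp"), strings.Contains(raw, "collector"):
		return "telemetry_export"
	default:
		return "other"
	}
}
```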

Metadata

Labels

vigilante:automerge, vigilante:done (Vigilante completed its work on the issue and no further automation is expected.)
