Phase 10: Fix dedicated ACA provisioning — env vars, retry logic, stuck orgs #48116

@aselkasidekyk

Summary

Fix dedicated ACA provisioning so it actually works:

  1. Missing env vars on sidekyk-admin: AZURE_SUBSCRIPTION_ID, AZURE_RESOURCE_GROUP, AZURE_ACR_LOGIN_SERVER, AZURE_WORKER_IMAGE — provisioner couldn't initialize
  2. Corrupted AZURE_ACA_ENVIRONMENT_ID: Git Bash path conversion mangled /subscriptions/... into C:/Program Files/Git/subscriptions/...
  3. Incomplete baseWorkerEnv: Only passed OPENAI_API_KEY + DATABASE_URL — now passes WhatsApp, Service Bus, Redis, and admin API vars
  4. No retry logic: Provisioning was fire-and-forget; it now retries 3 times with exponential backoff
  5. Stuck org recovery: New POST /api/admin/isolation/retry-stuck endpoint to retry all orgs stuck in error or provisioning state
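
Items 1 and 2 above could both be caught at startup with a small validation step. The following is a minimal sketch, not the actual sidekyk-admin code: checkProvisionerEnv is a hypothetical helper, and the only assumption about Azure is that a resource ID begins with /subscriptions/.

```typescript
// Hypothetical startup check; the env var names come from this issue,
// the function and its return shape are illustrative only.
const REQUIRED_VARS = [
  "AZURE_SUBSCRIPTION_ID",
  "AZURE_RESOURCE_GROUP",
  "AZURE_ACR_LOGIN_SERVER",
  "AZURE_WORKER_IMAGE",
  "AZURE_ACA_ENVIRONMENT_ID",
];

type Env = Record<string, string | undefined>;

// Returns a list of human-readable problems; an empty list means the env looks usable.
function checkProvisionerEnv(env: Env): string[] {
  const problems: string[] = [];

  for (const name of REQUIRED_VARS) {
    if (!env[name]) {
      problems.push(name + " is missing");
    }
  }

  // A value such as "C:/Program Files/Git/subscriptions/..." means the shell
  // rewrote the leading slash into a local filesystem path.
  const envId = env["AZURE_ACA_ENVIRONMENT_ID"];
  if (envId && !envId.startsWith("/subscriptions/")) {
    problems.push(
      "AZURE_ACA_ENVIRONMENT_ID does not start with /subscriptions/ (possibly mangled by shell path conversion)"
    );
  }

  return problems;
}
```

Calling checkProvisionerEnv(process.env) before constructing the provisioner and logging the returned problems would surface both failure modes at boot instead of at first provisioning.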

Changes

  • admin/src/index.ts: Expanded baseWorkerEnv with all required dedicated worker env vars
  • admin/src/routes/users.ts: Added provisionWithRetry() with 3 retries, exponential backoff (5s, 10s, 20s)
  • admin/src/routes/isolation.ts: Added retry loop in PUT handler + POST /api/admin/isolation/retry-stuck endpoint
  • scripts/deploy-azure.sh: Added pass-through env vars for dedicated workers
  • Tests: 4 user tests + 12 isolation tests
  • Smoke tests updated for Phase 10
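
For reference, the retry behaviour described for provisionWithRetry() could look roughly like the sketch below. This is not the code from admin/src/routes/users.ts: the function name provisionWithRetrySketch, its signature, and the injectable sleep parameter are assumptions made for illustration. It also assumes "3 retries" means one initial attempt followed by up to three retries, waiting 5s, 10s, and 20s before each retry.

```typescript
type Sleep = (ms: number) => Promise<void>;

const realSleep: Sleep = (ms) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Delay before retry n (1-based): baseMs * 2^(n-1), i.e. 5s, 10s, 20s for baseMs = 5000.
function backoffDelayMs(retry: number, baseMs: number = 5000): number {
  return baseMs * Math.pow(2, retry - 1);
}

// Hypothetical sketch: runs `provision` once, then retries up to `maxRetries`
// times with exponential backoff. Rethrows the last error if every attempt
// fails, so the caller can set the org's error state.
async function provisionWithRetrySketch<T>(
  provision: () => Promise<T>,
  maxRetries: number = 3,
  baseMs: number = 5000,
  sleep: Sleep = realSleep
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    if (attempt > 0) {
      // Wait before each retry, never before the first attempt.
      await sleep(backoffDelayMs(attempt, baseMs));
    }
    try {
      return await provision();
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}
```

Passing sleep as a parameter keeps unit tests fast: a test can supply a fake that records the requested delays instead of actually waiting.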

Acceptance Criteria

  • AZURE_SUBSCRIPTION_ID set on sidekyk-admin → AzureProvisioner initializes
  • baseWorkerEnv passes all required env vars to dedicated workers
  • Provisioning retries 3x with exponential backoff before setting error
  • POST /api/admin/isolation/retry-stuck recovers stuck orgs
  • All unit tests pass (58 admin + 123 shared)
  • Prod smoke tests pass
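
The stuck-org recovery criterion can be illustrated with a sketch of the selection and retry step behind POST /api/admin/isolation/retry-stuck. Everything here is hypothetical except the two status values named in this issue ("error" and "provisioning"): the OrgRecord shape, the isolationStatus field, the other status values, and the result format are assumptions, and the real handler in admin/src/routes/isolation.ts may differ.

```typescript
// "provisioning" and "error" come from the issue; the other values are placeholders.
type IsolationStatus = "shared" | "provisioning" | "active" | "error";

interface OrgRecord {
  id: string;
  isolationStatus: IsolationStatus;
}

interface RetryStuckResult {
  recovered: string[];
  failed: string[];
}

// An org counts as stuck if it never left "provisioning" or ended in "error".
function selectStuckOrgs(orgs: OrgRecord[]): OrgRecord[] {
  return orgs.filter(
    (org) => org.isolationStatus === "error" || org.isolationStatus === "provisioning"
  );
}

// Retries each stuck org in turn; one failure does not stop the rest.
async function retryStuckOrgs(
  orgs: OrgRecord[],
  provisionOrg: (org: OrgRecord) => Promise<void>
): Promise<RetryStuckResult> {
  const result: RetryStuckResult = { recovered: [], failed: [] };
  for (const org of selectStuckOrgs(orgs)) {
    try {
      await provisionOrg(org);
      result.recovered.push(org.id);
    } catch (err) {
      result.failed.push(org.id);
    }
  }
  return result;
}
```

Processing orgs sequentially is a deliberate choice in this sketch: it avoids launching many Azure provisioning calls at once, at the cost of a slower endpoint when many orgs are stuck.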
