docs: add Azure Container Apps install guide with managed identity an… #52555
kimvaddi wants to merge 4 commits into `openclaw:main`
Greptile Summary
This PR adds a new Azure Container Apps installation guide (…).
Key findings from the review:
Confidence Score: 3/5
Prompt To Fix All With AI
This is a comment left during a code review.
Path: docs/install/azure-containers.md
Line: 181-218
Comment:
**ACR role assignment after first image pull — initial deployment will fail**
Steps 8 and 9 have a chicken-and-egg ordering problem. Step 8 creates the container app with `--registry-identity system`, which immediately schedules a container start and causes Azure to attempt an ACR image pull using the not-yet-authorized managed identity. The `AcrPull` role is only assigned in Step 9 — after the pull has already been attempted and failed.
The container app resource will be created successfully (exit code 0), but the underlying container will enter a failed/crashed state with an authorization error. The user will be left with a non-running gateway and no indication of why.
**Fix**: After the role assignment in Step 9, document that a new revision must be triggered to retry the (now-authorized) image pull:
```bash
# After granting AcrPull in Step 9, force a new revision:
az containerapp update \
-g "${RG}" -n "${ACA_APP}" \
--image "${ACR_NAME}.azurecr.io/openclaw:latest"
```
Alternatively, restructure the steps to use a user-assigned managed identity (pre-created and granted `AcrPull` before the container app is created), which avoids the ordering problem entirely.
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: docs/install/azure-containers.md
Line: 291
Comment:
**Missing `-g` flag on `az keyvault show`**
This command omits the resource group flag, unlike every other `az` command in the guide. While Key Vault names are globally unique across Azure, omitting `-g` can cause unexpected failures when users have multiple subscriptions active or when Azure CLI subscription defaults are not set correctly. Adding `-g "${RG}"` after `az keyvault show` keeps the command consistent with the rest of the guide and avoids ambiguity.
How can I resolve this? If you propose a fix, please make it concise.
Reviews (1): Last reviewed commit: "docs: add Azure Container Apps install g..."
docs/install/azure-containers.md (outdated)
````md
  --registry-identity system \
  --system-assigned \
  --target-port 18789 \
  --ingress external \
  --min-replicas 1 --max-replicas 1 \
  --cpu 0.5 --memory 1Gi \
  --env-vars \
    "OPENCLAW_GATEWAY_PORT=18789" \
    "OPENCLAW_HOME=/data/.openclaw" \
  --args "gateway" "run" "--bind" "all" "--port" "18789"
```

`--registry-identity system` tells Container Apps to pull images using the app's managed identity instead of ACR admin credentials. No passwords to manage or rotate.

<Note>
Set `--min-replicas 1` to keep the Gateway always running. Scaling to 0 stops the Gateway.
OpenClaw is a single-instance gateway — do not scale above 1 replica.
Using 0.5 vCPU / 1 GiB keeps costs low. Scale up to `--cpu 1.0 --memory 2Gi` if you hit OOMs or need more concurrency.
</Note>

</Step>

<Step title="Grant ACR pull permission to the managed identity">
```bash
IDENTITY_PRINCIPAL="$(az containerapp show -g "${RG}" -n "${ACA_APP}" \
  --query identity.principalId -o tsv)"

ACR_ID="$(az acr show -g "${RG}" -n "${ACR_NAME}" --query id -o tsv)"

az role assignment create \
  --assignee "${IDENTITY_PRINCIPAL}" \
  --role AcrPull \
  --scope "${ACR_ID}"
```

</Step>

<Step title="Grant Key Vault access to the managed identity">
````
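The Note in this excerpt pins the Gateway to exactly one replica. A minimal sketch for checking that on a deployed app, assuming the guide's `RG` and `ACA_APP` variables (the helper name `check_single_replica` is hypothetical, not from the guide):

```bash
# Hypothetical helper: verify the app is pinned to exactly one replica.
# Assumes RG and ACA_APP are set as in the guide.
check_single_replica() {
  local min max
  min="$(az containerapp show -g "${RG}" -n "${ACA_APP}" \
    --query properties.template.scale.minReplicas -o tsv)" || return
  max="$(az containerapp show -g "${RG}" -n "${ACA_APP}" \
    --query properties.template.scale.maxReplicas -o tsv)" || return

  if [[ "${min}" == "1" && "${max}" == "1" ]]; then
    echo "ok: pinned to 1 replica"
  else
    # Scaling to 0 stops the Gateway; more than 1 replica is unsupported.
    echo "warning: minReplicas=${min} maxReplicas=${max}, expected 1/1" >&2
    return 1
  fi
}
```

If it warns, `az containerapp update -g "${RG}" -n "${ACA_APP}" --min-replicas 1 --max-replicas 1` should reset the bounds.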
**ACR role assignment after first image pull: initial deployment will fail**
Steps 8 and 9 have a chicken-and-egg ordering problem. Step 8 creates the container app with `--registry-identity system`, which immediately schedules a container start and causes Azure to attempt an ACR image pull using the not-yet-authorized managed identity. The `AcrPull` role is only assigned in Step 9, after the pull has already been attempted and failed.
The container app resource will be created successfully (exit code 0), but the underlying container will enter a failed/crashed state with an authorization error. The user will be left with a non-running gateway and no indication of why.
**Fix**: After the role assignment in Step 9, document that a new revision must be triggered to retry the (now-authorized) image pull:
```bash
# After granting AcrPull in Step 9, force a new revision:
az containerapp update \
  -g "${RG}" -n "${ACA_APP}" \
  --image "${ACR_NAME}.azurecr.io/openclaw:latest"
```
Alternatively, restructure the steps to use a user-assigned managed identity (pre-created and granted `AcrPull` before the container app is created), which avoids the ordering problem entirely.
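The user-assigned alternative could look roughly like this. It is a sketch, not the guide's method: it assumes the guide's `RG`, `ACR_NAME`, and `ACA_APP` variables plus a hypothetical `ACA_ENV` for the Container Apps environment name, and the identity name `${ACA_APP}-pull` and the helper name are illustrative. The guide's other `create` flags (ingress, env vars, args) are left out for brevity.

```bash
# Hypothetical helper: create and authorize a user-assigned identity *before*
# the container app exists, so the very first image pull is already allowed.
# Assumes RG, ACR_NAME, ACA_APP and a hypothetical ACA_ENV (environment name).
create_app_with_user_identity() {
  local uai_name="${ACA_APP}-pull"   # illustrative identity name
  local uai_id uai_principal acr_id

  # 1. Create the identity and capture its resource ID and principal ID.
  uai_id="$(az identity create -g "${RG}" -n "${uai_name}" --query id -o tsv)" || return
  uai_principal="$(az identity show -g "${RG}" -n "${uai_name}" \
    --query principalId -o tsv)" || return

  # 2. Grant AcrPull on the registry before any container is scheduled.
  acr_id="$(az acr show -g "${RG}" -n "${ACR_NAME}" --query id -o tsv)" || return
  az role assignment create \
    --assignee-object-id "${uai_principal}" \
    --assignee-principal-type ServicePrincipal \
    --role AcrPull \
    --scope "${acr_id}" || return

  # 3. Create the app with that identity as its registry credential.
  #    Ingress, env vars, and args are omitted; reuse the guide's create flags.
  az containerapp create \
    -g "${RG}" -n "${ACA_APP}" \
    --environment "${ACA_ENV}" \
    --image "${ACR_NAME}.azurecr.io/openclaw:latest" \
    --user-assigned "${uai_id}" \
    --registry-server "${ACR_NAME}.azurecr.io" \
    --registry-identity "${uai_id}"
}
```

Role assignments can still take a few minutes to propagate, so even with this ordering the first pull may need a short wait or a retry.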
Fixed in c00e084. Renamed the step to "Grant ACR pull permission and restart the container", added an explanatory note that the initial pull will fail, and added `az containerapp update --image ...` after the role assignment to force a new revision that retries the (now-authorized) image pull.
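A sketch of the retry described in this reply, assuming the guide's `RG`, `ACR_NAME`, and `ACA_APP` variables. Two additions are assumptions rather than part of the fix: a wait for RBAC propagation (the hypothetical `PULL_RETRY_DELAY`, default 60 seconds), and a unique `--revision-suffix`, since an update that repeats the identical image string may not create a new revision.

```bash
# Hypothetical helper: retry the image pull once AcrPull has been granted.
# Assumes RG, ACR_NAME and ACA_APP are set as in the guide.
retry_image_pull() {
  # A unique, lowercase suffix forces a new revision even if the image
  # string is unchanged (assumption: an identical update may be a no-op).
  local suffix
  suffix="acr$(date +%s)"

  # Role assignments can take a few minutes to propagate; wait first.
  # PULL_RETRY_DELAY (seconds) is a hypothetical knob, default 60.
  sleep "${PULL_RETRY_DELAY:-60}"

  az containerapp update \
    -g "${RG}" -n "${ACA_APP}" \
    --image "${ACR_NAME}.azurecr.io/openclaw:latest" \
    --revision-suffix "${suffix}" || return

  # Show each revision's running state to confirm the new one came up.
  az containerapp revision list \
    -g "${RG}" -n "${ACA_APP}" \
    --query "[].{name:name, state:properties.runningState}" -o table
}
```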
Fixed in c00e084. Added `-g "${RG}"` to `az keyvault show` for consistency with the rest of the guide.
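For reference, the corrected lookup with the resource group pinned, shown where a vault ID is usually needed (scoping an RBAC grant to the app's identity). `KV_NAME` is a hypothetical variable name, `IDENTITY_PRINCIPAL` comes from the ACR step, and granting the built-in `Key Vault Secrets User` role is an assumption about what the guide's Key Vault step does:

```bash
# Hypothetical helper: look up the vault with the resource group pinned
# (matching the guide's other az calls) and grant the app's identity read
# access to secrets. KV_NAME and the role choice are assumptions.
grant_key_vault_secrets_read() {
  local kv_id
  kv_id="$(az keyvault show -g "${RG}" -n "${KV_NAME}" --query id -o tsv)" || return

  az role assignment create \
    --assignee "${IDENTITY_PRINCIPAL}" \
    --role "Key Vault Secrets User" \
    --scope "${kv_id}"
}
```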
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 209af79141
docs/install/azure-containers.md (outdated)
```bash
  --target-port 18789 \
  --ingress external \
  --min-replicas 1 --max-replicas 1 \
  --cpu 0.5 --memory 1Gi \
  --env-vars \
    "OPENCLAW_GATEWAY_PORT=18789" \
    "OPENCLAW_HOME=/data/.openclaw" \
  --args "gateway" "run" "--bind" "all" "--port" "18789"
```
**Set Control UI allowed origins before enabling external ingress**
Even after fixing the bind value, this revision still won't start a public Control UI. `src/gateway/server-runtime-config.ts:139-146` rejects any non-loopback Control UI unless `gateway.controlUi.allowedOrigins` (or the dangerous host-header fallback) is configured. This guide enables external ingress and then tells readers to open the ACA FQDN, but it never writes that FQDN into the gateway config, so the gateway process exits before the UI becomes reachable.
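One way to produce the value this comment asks for: read the app's ingress FQDN and turn it into an `https://` origin. This is a sketch assuming the guide's `RG` and `ACA_APP` variables; the helper names are hypothetical, and the JSON fragment only mirrors the `gateway.controlUi.allowedOrigins` key path named in the comment. Where OpenClaw actually reads that setting from (config file location, CLI, or environment) is not covered here.

```bash
# Hypothetical helpers: turn the app's ingress FQDN into the origin the
# Control UI check expects. Assumes RG and ACA_APP are set as in the guide.
control_ui_origin() {
  local fqdn
  fqdn="$(az containerapp show -g "${RG}" -n "${ACA_APP}" \
    --query properties.configuration.ingress.fqdn -o tsv)" || return
  # An empty FQDN means there is no ingress hostname to allow.
  [ -n "${fqdn}" ] || return 1
  # External ingress on Container Apps is served over HTTPS.
  printf 'https://%s\n' "${fqdn}"
}

# Print a JSON fragment using the gateway.controlUi.allowedOrigins key path.
# How it is merged into OpenClaw's config is an assumption not covered here.
control_ui_config_fragment() {
  local origin
  origin="$(control_ui_origin)" || return
  printf '{"gateway":{"controlUi":{"allowedOrigins":["%s"]}}}\n' "${origin}"
}
```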
Summary
Describe the problem and fix in 2–5 bullets:
- `docs/install/azure-containers.md` with managed identity ACR pull, Key Vault secrets via RBAC, GHCR skip-ACR alternative, persistent Azure Files storage, and nav/redirect/hub-page wiring in `docs.json`, `vps.md`, and `platforms/index.md`.

Change Type (select all)
Scope (select all touched areas)
Linked Issue/PR
User-visible / Behavior Changes
- `/install/azure-containers`.
- `/azure-containers` and `/platforms/azure-containers` both resolve to `/install/azure-containers`.

Security Impact (required)
- … (No)
- … (No)
- … (No)
- … (No)
- … (No)
- … Yes, explain risk + mitigation: N/A

Repro + Verification
Environment
Steps
1. … `main...docs/azure-container-apps-install-guide`.
2. `docs/install/azure-containers.md` exists with managed identity, Key Vault, and GHCR sections.
3. `docs/docs.json` has a nav entry under Hosting and both redirects (`/azure-containers`, `/platforms/azure-containers`).
4. `docs/vps.md` has an Azure Container Apps card and `docs/platforms/index.md` has the link.
5. `az containerapp exec` runs `openclaw gateway status` successfully.

Expected
- … `/install/azure-containers` and via redirects.

Actual
Evidence
Attach at least one:
Human Verification (required)
What you personally verified (not just CI), and how:
- `main...HEAD` diff and file-level changes.
- `docs/docs.json` redirects and nav entries for Azure Container Apps.
- `oxfmt --check` on all changed files: format is clean.
- … (`--registry-identity system`), Key Vault RBAC, and GHCR fallback flows are documented correctly.
- `/azure-containers` and `/platforms/azure-containers`.
- `vps.md` and `platforms/index.md` resolve correctly.

Review Conversations
If a bot review conversation is addressed by this PR, resolve that conversation yourself. Do not leave bot review conversation cleanup for maintainers.
Compatibility / Migration
- … (Yes)
- … (No)
- … (No)
- … N/A

Failure Recovery (if this breaks)
- … `docs/docs.json`, `docs/vps.md`, `docs/platforms/index.md`, and remove `docs/install/azure-containers.md`.
- … `/install/azure-containers`.

Risks and Mitigations
None: docs-only PR with no runtime code changes.