Azure Blob Storage data access extension #52272
antonslutskyms wants to merge 17 commits into openclaw:main
Conversation
Greptile Summary

This PR adds a new Azure Blob Storage data access extension. The implementation is generally well-written: config resolution handles all supported auth modes (connection string, shared key, sovereign clouds) with proper env-var fallbacks and secret normalisation; error responses distinguish container-not-found / blob-not-found from generic failures; and size caps and result caps are enforced consistently. Two minor issues were found:
Confidence Score: 4/5
Prompt To Fix All With AI

This is a comment left during a code review.
Path: extensions/azure-blob/src/blob-client.ts
Line: 163-178
Comment:
**Stream not destroyed on exact-capacity fill**
When a chunk's length equals exactly the remaining space (`buf.length === space`), the code takes the `if` branch, appends the full chunk, sets `total === maxBytes`, and then breaks via `if (total >= maxBytes)` — without calling `stream.destroy()`. This leaves the underlying Azure SDK stream open until it's garbage-collected, which can hold a live HTTP connection longer than necessary.
The `stream.destroy()` call in the `else` branch handles the over-capacity case correctly, but the exact-fill case is missed. Consider consolidating the check:
```suggestion
const buf = Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk as Uint8Array);
const space = maxBytes - total;
if (buf.length <= space) {
chunks.push(buf);
total += buf.length;
} else {
chunks.push(buf.subarray(0, space));
total = maxBytes;
stream.destroy();
break;
}
if (total >= maxBytes) {
stream.destroy();
break;
}
```
How can I resolve this? If you propose a fix, please make it concise.
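If helpful, here is a hedged sketch of a fully consolidated version: a single capped take per chunk with one exit point, so the exact-fill case destroys the stream too. It mirrors the snippet above but is not the PR's exact code:

```typescript
import { Readable } from "node:stream";

// Sketch: read at most maxBytes from a stream, destroying it as soon as the
// cap is reached -- including the exact-fill case the review comment describes.
async function readStreamToBufferMax(stream: Readable, maxBytes: number): Promise<Buffer> {
  const chunks: Buffer[] = [];
  let total = 0;
  for await (const chunk of stream) {
    const buf = Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk as Uint8Array);
    // Take only as much of this chunk as still fits under the cap.
    const take = Math.min(buf.length, maxBytes - total);
    chunks.push(take === buf.length ? buf : buf.subarray(0, take));
    total += take;
    if (total >= maxBytes) {
      stream.destroy(); // single exit point covers exact-fill and over-capacity
      break;
    }
  }
  return Buffer.concat(chunks, total);
}
```

With this shape there is no separate over-capacity branch to keep in sync with the cap check.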
---
This is a comment left during a code review.
Path: extensions/azure-blob/src/blob-read-tool.ts
Line: 47-54
Comment:
**Empty `blobName` not validated after trim**
`readStringParam(..., { required: true })` guards against `undefined`/`null`, but a caller passing `blobName: ""` or `blobName: " "` would silently reach the Azure SDK with a blank name. The Azure SDK will return a generic 400/404 error rather than the clearer in-tool message you produce elsewhere. A brief guard matches the `containerName` pattern used just above it:
```suggestion
const blobNameTrimmed = blobName.trim();
if (!blobNameTrimmed) {
return jsonResult({ ok: false, error: "blobName must not be empty." });
}
```
Then use `blobNameTrimmed` in place of `blobName.trim()` for the rest of the function. The TypeBox schema enforces the field is present but doesn't set `minLength: 1`, so the runtime guard is the only safety net here.
How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "adding azure blob storage extension" | Re-trigger Greptile
The flagged code in `extensions/azure-blob/src/blob-client.ts`:

```typescript
  }
}

async function readStreamToBufferMax(stream: Readable, maxBytes: number): Promise<Buffer> {
  const chunks: Buffer[] = [];
  let total = 0;
  for await (const chunk of stream) {
    const buf = Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk as Uint8Array);
    const space = maxBytes - total;
    if (buf.length <= space) {
      chunks.push(buf);
      total += buf.length;
    } else {
      chunks.push(buf.subarray(0, space));
      total = maxBytes;
      stream.destroy();
```
The flagged code in `extensions/azure-blob/src/blob-read-tool.ts`:

```typescript
    return jsonResult({
      ok: false,
      error:
        "containerName is required unless defaultContainer is set in plugins.entries.azure-blob.config or AZURE_STORAGE_DEFAULT_CONTAINER.",
    });
  }

  const maxBytes = clampMaxBytes(readNumberParam(rawParams, "maxBytes", { integer: true }));
```
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 175ebc4bf8
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```json
"dependencies": {
  "@azure/storage-blob": "^12.31.0",
  "@sinclair/typebox": "0.34.48"
```
**Update pnpm-lock.yaml for the new workspace dependencies**

Adding `@azure/storage-blob` here without updating `pnpm-lock.yaml` leaves the workspace out of sync. On a clean checkout, `pnpm install --frozen-lockfile` will reject the repo because the new `extensions/azure-blob` importer and direct dependency entries are missing from the lockfile; the current lockfile has no azure-blob or `@azure/storage-blob` entry at all.
```typescript
const iter = container.listBlobsFlat({
  ...(namePrefix ? { prefix: namePrefix } : {}),
});
```
**Honor maxResults at the Azure API page boundary**

For large containers, this still fetches Azure's default first page before stopping locally at `maxResults`. The SDK's list APIs default to pages of up to 5000 items, so a request like `maxResults: 1` can still transfer thousands of blob entries and make the tool slow or hit timeouts. Use `.byPage({ maxPageSize: max })` here (and in `listContainers`) so the server-side work matches the advertised cap.
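The capping behavior this asks for can be sketched generically. The paged iterator below is a stand-in for the SDK's `byPage()` iterator (the real call would be `container.listBlobsFlat({ ... }).byPage({ maxPageSize: max })`); the `{ items }` page shape is an assumption for illustration:

```typescript
// Stand-in page shape; the real Azure SDK pages expose blob items similarly.
interface Page<T> {
  items: T[];
}

// Collect up to maxResults items from a paged iterator, returning early so no
// further pages are fetched once the cap is met -- the point of maxPageSize.
async function collectCapped<T>(pages: AsyncIterable<Page<T>>, maxResults: number): Promise<T[]> {
  const out: T[] = [];
  for await (const page of pages) {
    for (const item of page.items) {
      out.push(item);
      if (out.length >= maxResults) return out; // stop before pulling another page
    }
  }
  return out;
}
```

Pairing this with `maxPageSize` keeps the server-side page size aligned with the tool's advertised `maxResults` cap.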
💡 Codex Review
Reviewed commit: 466f0c51af
```json
"openclaw": {
  "extensions": [
    "./index.ts"
  ]
```
**Stage azure-blob runtime deps in bundled builds**

`@azure/storage-blob` is introduced as a runtime dependency here, but this package never opts into `openclaw.bundle.stageRuntimeDependencies`. In packaged OpenClaw installs, bundled plugins are loaded from `dist-runtime/extensions` or `dist/extensions` (`src/plugins/bundled-dir.ts:15-48`), tsdown keeps dependencies external by default (`tsdown.config.ts:194-202`), and `scripts/stage-bundled-plugin-runtime-deps.mjs:34-68` only installs plugin-local `node_modules` when that flag is true. Because the root package does not ship `@azure/storage-blob` (`package.json:676-706`), the Azure Blob plugin will fail to import with `MODULE_NOT_FOUND` anywhere outside a source checkout.
```typescript
    "Download and return the contents of a blob from Azure Blob Storage. Requires connection string or account name/key (see plugin config / env vars). Opt-in tool.",
  parameters: AzureBlobReadToolSchema,
  execute: async (_toolCallId: string, rawParams: Record<string, unknown>) => {
    const blobName = readStringParam(rawParams, "blobName", { required: true });
```
**Preserve significant whitespace in blob names**

`blobName` is parsed with the default `readStringParam` behavior and then trimmed again before the read call. Azure blob names are allowed to contain arbitrary characters, so leading/trailing spaces are valid identifiers; in that case `azure_blob_list_blobs` can surface the exact name from Azure, but `azure_blob_read` will strip the whitespace and either read the wrong blob or return `Blob not found`. This makes any container that uses space-padded blob keys unreadable through the new tool.
💡 Codex Review
Reviewed commit: 305a5295d4
```typescript
const connectionString = resolveAzureBlobConnectionString(cfg);
if (connectionString) {
  return BlobServiceClient.fromConnectionString(connectionString);
```
**Handle malformed Azure connection strings before client creation**

If `AZURE_STORAGE_CONNECTION_STRING` or `plugins.entries.azure-blob.config.connectionString` contains a typo, `BlobServiceClient.fromConnectionString()` throws synchronously. Because this call happens before the `try` blocks in `listBlobContainers`, `listBlobsInContainer`, and `readBlobBytes`, the exception escapes the normal `{ ok: false, error }` path and turns every Azure tool call into an unhandled failure instead of a user-facing config error. Wrapping client construction in the same error normalization used for request failures would keep bad secrets from crashing the tool.
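A hedged sketch of that normalization; `createClientSafe` is a hypothetical wrapper, and `makeClient` stands in for `BlobServiceClient.fromConnectionString` or the shared-key constructor:

```typescript
// Sketch: normalize synchronous client-construction failures into the same
// { ok: false, error } shape the tools use for request failures.
type ClientResult<T> = { ok: true; client: T } | { ok: false; error: string };

function createClientSafe<T>(makeClient: () => T): ClientResult<T> {
  try {
    return { ok: true, client: makeClient() };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { ok: false, error: `Invalid Azure Blob Storage configuration: ${message}` };
  }
}
```

Each tool would then surface the config error through its normal result path instead of crashing.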
```typescript
const max = clampListMaxResults(params.maxResults);
const namePrefix = typeof params.prefix === "string" ? params.prefix.trim() : "";
const container = client.getContainerClient(params.containerName.trim());
```
**Preserve significant whitespace in blob-prefix filters**

Blob prefixes are matched against blob names verbatim, and Azure blob names can legally contain leading or trailing spaces. Trimming `params.prefix` here means `azure_blob_list_blobs` cannot accurately filter blobs like `" reports/q1.json"` or `"foo "`; it will query for a different prefix and either miss the target blob or return the wrong set. This is a separate regression from the `azure_blob_read` name trimming because it breaks discovery as well as reads.
💡 Codex Review
Reviewed commit: 0b510daa81
```typescript
if (status === "404" || code === "BlobNotFound") {
  return {
    ok: false,
    message: `Blob not found: ${params.containerName}/${params.blobName}`,
  };
```
**Distinguish missing containers from missing blobs in reads**

When `azure_blob_read` is pointed at a container that does not exist, Azure returns `code: "ContainerNotFound"` with HTTP 404. This branch matches any 404 before checking the error code, so the tool reports `Blob not found: <container>/<blob>` even though the container itself is wrong. That sends users toward the wrong fix and makes bad `defaultContainer` / `containerName` config look like a missing blob instead of a missing container.
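One way to order the checks, as a sketch; the parameter and error shapes are assumptions rather than the PR's exact types:

```typescript
// Sketch: map an Azure storage error to a user-facing message, checking the
// service error code before the bare HTTP status so ContainerNotFound is not
// misreported as a missing blob.
function describeReadError(
  code: string | undefined,
  status: string | undefined,
  containerName: string,
  blobName: string,
): string {
  if (code === "ContainerNotFound") {
    return `Container not found: ${containerName}`;
  }
  if (code === "BlobNotFound" || status === "404") {
    return `Blob not found: ${containerName}/${blobName}`;
  }
  return `Azure Blob Storage request failed (status ${status ?? "unknown"}, code ${code ?? "unknown"}).`;
}
```

Checking the service code first keeps the generic 404 branch as a fallback only.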
💡 Codex Review
Reviewed commit: d3104e8d08
```json
"name": "@openclaw/azure-blob",
"version": "2026.3.14",
"private": true,
```
**Add `.github/labeler.yml` coverage for azure-blob**

AGENTS.md requires new extensions to update `.github/labeler.yml` and create a matching label. This commit adds `extensions/azure-blob/**`, but `.github/labeler.yml` has no `extensions: azure-blob` rule, so future PRs that only touch this plugin will miss the repo's extension-label triage/review routing that existing plugins already get.
💡 Codex Review
Reviewed commit: edcdb0a47d
```typescript
const connectionString = resolveAzureBlobConnectionString(cfg);
if (connectionString) {
  return BlobServiceClient.fromConnectionString(connectionString);
```
**Honor explicit accountName/accountKey over env connection strings**

If the operator configures `plugins.entries.azure-blob.config.accountName`/`accountKey` as documented, but the gateway process also inherits `AZURE_STORAGE_CONNECTION_STRING` for some other storage account, this branch always picks the env connection string and never reaches shared-key auth. In that multi-Azure-host scenario the plugin will query the wrong account or fail authentication, so the advertised account-name/key setup is effectively unusable unless the user unsets the global env var first.
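A sketch of one possible precedence order; the config shape and helper name are assumptions, not the PR's actual code:

```typescript
// Sketch: prefer explicit plugin config over ambient environment variables so a
// configured accountName/accountKey pair is not shadowed by an unrelated
// AZURE_STORAGE_CONNECTION_STRING inherited from the process environment.
interface AzureBlobConfig {
  connectionString?: string;
  accountName?: string;
  accountKey?: string;
}

type Auth =
  | { kind: "connectionString"; connectionString: string }
  | { kind: "sharedKey"; accountName: string; accountKey: string }
  | { kind: "none" };

function resolveAuth(cfg: AzureBlobConfig, env: Record<string, string | undefined>): Auth {
  // 1. Explicit config wins.
  if (cfg.connectionString) {
    return { kind: "connectionString", connectionString: cfg.connectionString };
  }
  if (cfg.accountName && cfg.accountKey) {
    return { kind: "sharedKey", accountName: cfg.accountName, accountKey: cfg.accountKey };
  }
  // 2. Only then fall back to the process environment.
  const envConn = env.AZURE_STORAGE_CONNECTION_STRING;
  if (envConn) {
    return { kind: "connectionString", connectionString: envConn };
  }
  return { kind: "none" };
}
```

With this ordering, a configured shared-key pair is used even when an unrelated env connection string is present.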
```markdown
1. **Load** this extension from disk (`plugins.load.paths`).
2. **Allow** the plugin id if you use a global plugin allowlist (`plugins.allow`).
3. **Enable** it and pass **credentials** (`plugins.entries.azure-blob`).
4. **Expose** the optional tools via `tools.alsoAllow` (and `tools.sandbox.tools.allow` when sessions run sandboxed).

Use the **absolute path** to **this directory** (the folder that contains `openclaw.plugin.json`) on the **same machine** as the Gateway process. Do **not** point at `index.ts` — use the directory path to avoid plugin id mismatch warnings.
```
**Remove the mandatory load-path step from bundled setup docs**

This plugin is bundled and auto-discovered from the stock bundled extensions root, so requiring `plugins.load.paths` here sends normal npm/mac-app users to a repo path they do not have. On those installs, following these instructions makes Azure Blob setup look impossible even though enabling `plugins.entries.azure-blob` is sufficient, which is a user-facing setup regression for the new feature.
💡 Codex Review
Reviewed commit: 5284bede5f
```typescript
if (typeof contentLength === "number" && Number.isFinite(contentLength) && contentLength >= 0) {
  const n = Math.min(maxBytes, Math.floor(contentLength));
  data = n === 0 ? Buffer.alloc(0) : await blob.downloadToBuffer(0, n);
} else {
```
**Stop using stale blob properties to size the download**

`readBlobBytes()` uses the `contentLength` fetched by `getProperties()` a few lines earlier to decide how many bytes to request here. If the blob is appended to or overwritten between those two requests (a common case for log/append blobs), `n` is stale: a blob that grew will be under-read and reported as `truncated: false`, while a blob that shrank can fail with a spurious range error. This makes reads from actively changing blobs return incomplete or incorrect data even when `maxBytes` would allow the current contents.
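One way to avoid the stale size, sketched with a hypothetical `downloadRange` callback standing in for the SDK's ranged download: request one byte past the cap and derive `truncated` from what actually arrives:

```typescript
// Sketch: instead of sizing the read from an earlier getProperties() call,
// request up to maxBytes + 1 bytes from the blob itself and derive truncation
// from the length that actually comes back.
async function readWithFreshSize(
  downloadRange: (offset: number, count: number) => Promise<Buffer>,
  maxBytes: number,
): Promise<{ data: Buffer; truncated: boolean }> {
  // Ask for one extra byte: if it arrives, the blob is larger than maxBytes.
  const raw = await downloadRange(0, maxBytes + 1);
  const truncated = raw.length > maxBytes;
  return { data: truncated ? raw.subarray(0, maxBytes) : raw, truncated };
}
```

This keeps a single request on the hot path and makes `truncated` reflect the blob's size at read time rather than at properties-fetch time.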
Summary
Describe the problem and fix in 2–5 bullets:
Change Type (select all)
Scope (select all touched areas)
Linked Issue/PR
User-visible / Behavior Changes
List user-visible changes (including defaults/config).
If none, write None.

Security Impact (required)
- Yes: access to Azure Blob Storage. Risk is data access; mitigation through Azure RBAC.
- Yes: Openclaw needs network access to reach the Azure Blob Storage endpoint. Risk is open access to a network resource; mitigation through openshell or other network restrictions to specific ports/IP ranges.
- Yes: Openclaw can access a network storage resource. Risk is access to sensitive business data; mitigation through RBAC controls.

Repro + Verification
Environment
Steps
Expected
Here are the hotel prices from your Azure Blob Storage:
Actual
Here are the hotel prices from your Azure Blob Storage:
Evidence
Attach at least one:
Human Verification (required)
What you personally verified (not just CI), and how:
Verified result was:
Here are the hotel prices from your Azure Blob Storage:
Budget Inn — $75 per night
Midrange Suites — $120 per night
Luxury Penthouse — $250 per night
Boutique Hotel — $180 per night
Edge cases checked:
No data matches the query, such as "find the airline ticket prices in my blob storage". Result is
"
I checked your blob storage container (container1). Right now it only contains one file:
hotel_prices.json
There aren’t any files related to airline ticket prices, so I can’t find that data in the storage at the moment. If you upload a file with the airline prices (for example airline_prices.json), I can read it and list them for you.
"
Review Conversations
If a bot review conversation is addressed by this PR, resolve that conversation yourself. Do not leave bot review conversation cleanup for maintainers.
Compatibility / Migration
Yes / Yes / No. Please see README.md in extensions/azure-blob.
Failure Recovery (if this breaks)
Risks and Mitigations
List only real risks for this PR. Add/remove entries as needed. If none, write
None.