Skip to content

Conversation

@joshcoughlan
Copy link

@joshcoughlan joshcoughlan commented Nov 17, 2025

This PR migrates the Azure Blob sink to the latest official Azure SDK crates and aligns our internal auth/HTTP plumbing with the new pipeline model.

Highlights:

  • Replace legacy crates with official SDK:
    • azure_storage_blob → 0.7.x
    • azure_core → 0.30.x
    • Remove azure_storage_blobs/azure_storage/azure_core_for_storage
  • Preserve existing auth modes:
    • SAS via URL stays supported
    • Access key via a custom Shared Key Policy integrated into the azure_core pipeline
  • Add a minimal connection string parser to support SAS and Shared Key without pulling in legacy crates
  • Keep the rest of Vector on reqwest 0.11.26 while the Azure SDK uses reqwest 0.12.24:
    • Introduce an aliased reqwest_0_12_24 dependency and inject it as the Azure transport
    • Avoid rippling reqwest upgrades through non-Azure code paths

Follow-ups planned:

Vector configuration

Examples used during testing.

SAS-based connection string:

[sinks.azure_blob]
type = "azure_blob"
inputs = ["in"]
container_name = "vector-logs"
# Example SAS (redacted); must include the leading '?'
connection_string = "BlobEndpoint=https://<account>.blob.core.windows.net;SharedAccessSignature=?sv=...&ss=b&srt=...&sp=...&se=...&st=...&spr=https&sig=...;"
compression = "gzip"
encoding.codec = "json"

Shared Key (access key) connection string:

[sinks.azure_blob]
type = "azure_blob"
inputs = ["in"]
container_name = "vector-logs"
connection_string = "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<base64key>;EndpointSuffix=core.windows.net"
compression = "gzip"
encoding.codec = "json"
# Uses the custom SharedKeyAuthorizationPolicy added in this PR

How did you test this PR?

  • Local builds for all supported targets, including cross builds in a containerized environment
  • Tested in a live environment to confirm functionality

Change Type

  • Bug fix
  • [ X] New feature
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • [X ] No

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the no-changelog label to this PR.

References

Notes

@github-actions github-actions bot added the domain: sinks Anything related to the Vector's sinks label Nov 17, 2025
@joshcoughlan joshcoughlan marked this pull request as ready for review November 17, 2025 18:18
@joshcoughlan joshcoughlan requested a review from a team as a code owner November 17, 2025 18:18
Copy link
Contributor

@thomasqueirozb thomasqueirozb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this contribution! I haven't had the time to look closely at everything yet, but I'm leaving some comments now because there is a lot to discuss.

As per your PR description it seems like we're now supporting new auth methods with this upgrade too, so it would be nice to include a changelog in this PR. Let me know if I misunderstood this.

I do have some concerns about using these new azure crates though. The azure_storage_blob crate has this warning:

Note: The azure_storage_blob crate is currently under active development and not all features may be implemented or work as intended. This crate is in beta and not suitable for Production environments. For any general feedback or usage issues, please open a GitHub issue

I see that we are using just some parts of that crate. Namely only BlobContainerClient/BlobContainerClientOptions, clients::BlobClient and models::BlockBlobClientUploadOptions.

Have you checked if they are just a port from what was being used in the old azure_storage_blobs crate or is it a full rewrite? If it is a port I'd have more confidence in using it. Either way I think our hands are tied here because MS doesn't seem to put enough effort into releasing good and stable Azure crates so anything as good as or better than the old azure_storage_blobs will do.

@joshcoughlan
Copy link
Author

Have you checked if they are just a port from what was being used in the old azure_storage_blobs crate or is it a full rewrite? If it is a port I'd have more confidence in using it. Either way I think our hands are tied here because MS doesn't seem to put enough effort into releasing good and stable Azure crates so anything as good as or better than the old azure_storage_blobs will do.

I haven't, but I suspect it is a full rewrite since I think it is just adding support for auth from a url to the SDK. I wrote a fix for the SDK and submitted it to Microsoft, but was informed that they had fixed what I was trying to fix in a release of azure_storage_blob the day before.

Basically, clients now have a from_url method that allows you to pass a full URL with a SAS. This design closely matches our other SDKs and so it what we wanted to go with for Rust as well. Please give it a try and let us know if you have any feedback.
Azure/azure-sdk-for-rust#3329 (comment)

If I'm remembering properly, the only thing keeping SAS from working before is that the client constructor required the AD auth method. I think now it is making that credential optional, which then opened the door for auth from url.
Azure/azure-sdk-for-rust#3176

Anyway, I went about refactoring my original Vector PR to use the newly released azure_storage_blob SDK and broke it up into two PRs. I'm mainly doing all of this because we need per-sink proxy support and the legacy crates just didn't really support that so much. I have another PR that builds on this one to add proxy support: #24256

As per your PR description it seems like we're now supporting new auth methods with this upgrade too, so it would be nice to include a changelog in this PR. Let me know if I misunderstood this.

Same auth methods, SAS and Access key, are supported, but I think the implementation is different. I think this PR should help address #24167 though, so it can lead to additional auth methods.

@pront pront added provider: azure Anything `azure` service provider related sink: azure_blob Anything `azure_blob` sink related labels Nov 17, 2025
@thomasqueirozb thomasqueirozb added the no-changelog Changes in this PR do not need user-facing explanations in the release changelog label Nov 18, 2025
@thomasqueirozb
Copy link
Contributor

thomasqueirozb commented Nov 21, 2025

Same auth methods

Changelog not needed then

There are a few failing checks (and merge conflicts), I'll circle back to this once there are new commits on this branch.

@fplantinga-guida
Copy link

Any updates on this? @thomasqueirozb @joshcoughlan

@joshcoughlan
Copy link
Author

Any updates on this? @thomasqueirozb @joshcoughlan

I'm working through this today getting the clippy whatnot passing and resolving merge conflicts. I didn't really realize there were other PRs waiting on this.

Also, full disclosure here, I'd be happy if more eyeballs were on this PR as a good amount of it is faith-based coding with AI guidance. I'm none too familiar with rust and having more knowledgeable humans involved wouldn't be a bad thing.

@github-actions
Copy link

github-actions bot commented Jan 7, 2026

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@github-actions github-actions bot removed the meta: awaiting author Pull requests that are awaiting their author. label Jan 7, 2026
@joshcoughlan joshcoughlan changed the title feat(azure_blob): Update crates and migrate to the new SDK feat(azure_blob sink): Update crates and migrate to the new SDK Jan 8, 2026
@joshcoughlan
Copy link
Author

I have read the CLA Document and I hereby sign the CLA

@joshcoughlan
Copy link
Author

recheck

# Azure
azure_core = { version = "0.25", default-features = false, features = ["reqwest", "hmac_openssl"], optional = true }
azure_identity = { version = "0.25", default-features = false, features = ["reqwest"], optional = true }
azure_core = { version = "0.30", default-features = false, features = ["reqwest", "hmac_openssl"] }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
azure_core = { version = "0.30", default-features = false, features = ["reqwest", "hmac_openssl"] }
azure_core = { version = "0.30", default-features = false, features = ["reqwest", "hmac_openssl"], optional = true }

This should remain optional as we don't want this to be compiled in when Vector is built with minimal features

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, looking at https://docs.rs/crate/azure_core/latest/features I'm worried that without reqwest_native_tls and maybe also without reqwest_deflate we can run into some issues...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sinks-aws_sns = ["aws-core", "dep:aws-sdk-sns"]
sinks-axiom = ["sinks-http"]
sinks-azure_blob = ["dep:azure_core", "dep:azure_identity", "dep:azure_storage", "dep:azure_storage_blobs"]
sinks-azure_blob = ["dep:azure_storage_blob"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
sinks-azure_blob = ["dep:azure_storage_blob"]
sinks-azure_blob = ["dep:azure_core", "dep:azure_storage_blob"]

Refer to earlier comment

assert!(blobs[0].clone().ends_with(".log"));
let (blob, blob_lines) = config.get_blob(blobs[0].clone()).await;
assert_eq!(blob.properties.content_type, String::from("text/plain"));
let (content_type, _content_encoding, blob_lines) = config.get_blob(blobs[0].clone()).await;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd assume that the encoding here is None right?

Suggested change
let (content_type, _content_encoding, blob_lines) = config.get_blob(blobs[0].clone()).await;
let (content_type, content_encoding, blob_lines) = config.get_blob(blobs[0].clone()).await;
assert_eq!(content_encoding, None);

We can copy what is done in the following test

#[derive(Debug)]
pub struct AzureBlobResponse {
pub inner: PutBlockBlobResponse,
pub inner: (),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pub inner: (),

We should be able to remove this

Comment on lines +167 to +172
// Force Azure SDK to use reqwest_0_12_24 transport to avoid affecting global reqwest
options.client_options.transport = Some(azure_core::http::Transport::new(std::sync::Arc::new(
reqwest_0_12_24::ClientBuilder::new()
.build()
.map_err(|e| format!("Failed to build reqwest_0_12_24 client: {e}"))?,
)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we using another version of reqwest here? I'd assume that this would work just fine if we used reqwest::ClientBuilder instead - or is there a version issue?

Comment on lines +209 to +210
// Sort values (optional but deterministic)
vals.sort();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this required? (wouldn't surprise me if MS made it so...)

Comment on lines +4 to +6
This module intentionally avoids relying on the legacy Azure Storage SDK crates.
It extracts only the fields we need and composes container/blob URLs suitable
for the newer `azure_storage_blob` crate (>= 0.7).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't there any helper functions present in the legacy crates that can help here? While I don't see anything necessarily wrong with this file it'd be best if we avoid adding custom parsers and delegate this to external library code if possible. If not possible it's fine too :)

Comment on lines +63 to +64
result.map(|_resp| AzureBlobResponse {
inner: (),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't look correct... OTOH I cannot find anywhere where inner is used (even in the current code base)...

@thomasqueirozb thomasqueirozb added the meta: awaiting author Pull requests that are awaiting their author. label Jan 9, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

domain: sinks Anything related to the Vector's sinks meta: awaiting author Pull requests that are awaiting their author. no-changelog Changes in this PR do not need user-facing explanations in the release changelog provider: azure Anything `azure` service provider related sink: azure_blob Anything `azure_blob` sink related

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants