-
Notifications
You must be signed in to change notification settings - Fork 2k
feat(azure_blob sink): Update crates and migrate to the new SDK #24255
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
thomasqueirozb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this contribution! I haven't had the time to look closely at everything yet, but I'm leaving some comments now because there is a lot to discuss.
As per your PR description it seems like we're now supporting new auth methods with this upgrade too, so it would be nice to include a changelog in this PR. Let me know if I misunderstood this.
I do have some concerns about using these new azure crates though. The azure_storage_blob crate has this warning:
Note: The azure_storage_blob crate is currently under active development and not all features may be implemented or work as intended. This crate is in beta and not suitable for Production environments. For any general feedback or usage issues, please open a GitHub issue
I see that we are using just some parts of that crate. Namely only BlobContainerClient/BlobContainerClientOptions, clients::BlobClient and models::BlockBlobClientUploadOptions.
Have you checked if they are just a port from what was being used in the old azure_storage_blobs crate or is it a full rewrite? If it is a port I'd have more confidence in using it. Either way I think our hands are tied here because MS doesn't seem to put enough effort into releasing good and stable Azure crates so anything as good as or better than the old azure_storage_blobs will do.
I haven't, but I suspect it is a full rewrite since I think it is just adding support for auth from a url to the SDK. I wrote a fix for the SDK and submitted it to Microsoft, but was informed that they had fixed what I was trying to fix in a release of azure_storage_blob the day before.
If I'm remembering properly, the only thing keeping SAS from working before is that the client constructor required the AD auth method. I think now it is making that credential optional, which then opened the door for auth from url. Anyway, I went about refactoring my original Vector PR to use the newly released azure_storage_blob SDK and broke it up into two PRs. I'm mainly doing all of this because we need per-sink proxy support and the legacy crates just didn't really support that so much. I have another PR that builds on this one to add proxy support: #24256
Same auth methods, SAS and Access key, are supported, but I think the implementation is different. I think this PR should help address #24167 though, so it can lead to additional auth methods. |
Changelog not needed then There are a few failing checks (and merge conflicts), I'll circle back to this once there are new commits on this branch. |
|
Any updates on this? @thomasqueirozb @joshcoughlan |
I'm working through this today getting the clippy whatnot passing and resolving merge conflicts. I didn't really realize there were other PRs waiting on this. Also, full disclosure here, I'd be happy if more eyeballs were on this PR as a good amount of it is faith-based coding with AI guidance. I'm none too familiar with rust and having more knowledgeable humans involved wouldn't be a bad thing. |
|
All contributors have signed the CLA ✍️ ✅ |
3cd1ced to
7785983
Compare
|
I have read the CLA Document and I hereby sign the CLA |
|
recheck |
| # Azure | ||
| azure_core = { version = "0.25", default-features = false, features = ["reqwest", "hmac_openssl"], optional = true } | ||
| azure_identity = { version = "0.25", default-features = false, features = ["reqwest"], optional = true } | ||
| azure_core = { version = "0.30", default-features = false, features = ["reqwest", "hmac_openssl"] } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| azure_core = { version = "0.30", default-features = false, features = ["reqwest", "hmac_openssl"] } | |
| azure_core = { version = "0.30", default-features = false, features = ["reqwest", "hmac_openssl"], optional = true } |
This should remain optional as we don't want this to be compiled in when Vector is built with minimal features
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, looking at https://docs.rs/crate/azure_core/latest/features I'm worried that without reqwest_native_tls and maybe also without reqwest_deflate we can run into some issues...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW reqwest now sets its default TLS feature to use rustls, instead of native-tls.
| sinks-aws_sns = ["aws-core", "dep:aws-sdk-sns"] | ||
| sinks-axiom = ["sinks-http"] | ||
| sinks-azure_blob = ["dep:azure_core", "dep:azure_identity", "dep:azure_storage", "dep:azure_storage_blobs"] | ||
| sinks-azure_blob = ["dep:azure_storage_blob"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| sinks-azure_blob = ["dep:azure_storage_blob"] | |
| sinks-azure_blob = ["dep:azure_core", "dep:azure_storage_blob"] |
Refer to earlier comment
| assert!(blobs[0].clone().ends_with(".log")); | ||
| let (blob, blob_lines) = config.get_blob(blobs[0].clone()).await; | ||
| assert_eq!(blob.properties.content_type, String::from("text/plain")); | ||
| let (content_type, _content_encoding, blob_lines) = config.get_blob(blobs[0].clone()).await; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd assume that the encoding here is None right?
| let (content_type, _content_encoding, blob_lines) = config.get_blob(blobs[0].clone()).await; | |
| let (content_type, content_encoding, blob_lines) = config.get_blob(blobs[0].clone()).await; | |
| assert_eq!(content_encoding, None); |
We can copy what is done in the following test
| #[derive(Debug)] | ||
| pub struct AzureBlobResponse { | ||
| pub inner: PutBlockBlobResponse, | ||
| pub inner: (), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| pub inner: (), |
We should be able to remove this
| // Force Azure SDK to use reqwest_0_12_24 transport to avoid affecting global reqwest | ||
| options.client_options.transport = Some(azure_core::http::Transport::new(std::sync::Arc::new( | ||
| reqwest_0_12_24::ClientBuilder::new() | ||
| .build() | ||
| .map_err(|e| format!("Failed to build reqwest_0_12_24 client: {e}"))?, | ||
| ))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we using another version of reqwest here? I'd assume that this would work just fine if we used reqwest::ClientBuilder instead - or is there a version issue?
| // Sort values (optional but deterministic) | ||
| vals.sort(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this required? (wouldn't surprise me if MS made it so...)
| This module intentionally avoids relying on the legacy Azure Storage SDK crates. | ||
| It extracts only the fields we need and composes container/blob URLs suitable | ||
| for the newer `azure_storage_blob` crate (>= 0.7). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aren't there any helper functions present in the legacy crates that can help here? While I don't see anything necessarily wrong with this file it'd be best if we avoid adding custom parsers and delegate this to external library code if possible. If not possible it's fine too :)
| result.map(|_resp| AzureBlobResponse { | ||
| inner: (), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't look correct... OTOH I cannot find anywhere where inner is used (even in the current code base)...
This PR migrates the Azure Blob sink to the latest official Azure SDK crates and aligns our internal auth/HTTP plumbing with the new pipeline model.
Highlights:
Follow-ups planned:
azure_blobproxy support #21314)Vector configuration
Examples used during testing.
SAS-based connection string:
Shared Key (access key) connection string:
How did you test this PR?
Change Type
Is this a breaking change?
Does this PR include user facing changes?
no-changeloglabel to this PR.References
Notes
ERROR vector::topology::builder: Configuration error. error=Sink "testing-azure_blob": Account name missing in connection string