[object_store] support azure managed and workload identities#3581
[object_store] support azure managed and workload identities#3581tustvold merged 9 commits intoapache:masterfrom
Conversation
|
I intend to review this first thing tomorrow |
| assert_eq!(req.uri().path(), format!("/{}/oauth2/v2.0/token", tenant)); | ||
| assert_eq!(req.method(), &Method::POST); |
There was a problem hiding this comment.
One thing I really wanted to check here is if the right content from the federated token file is included in the client_assertion. However getting the body bytes is an async operation. We could probably get a tokio runtime internally, but I was wondering if there is a better way to achieve this.
There was a problem hiding this comment.
You could use futures::block_on
tustvold
left a comment
There was a problem hiding this comment.
Mostly just some minor nits, looks good thank you 👍
| client_id: String, | ||
| client_secret: String, | ||
| tenant_id: String, | ||
| tenant_id: impl AsRef<str>, |
object_store/src/azure/credential.rs
Outdated
| /// Fetch a token | ||
| pub async fn fetch_token( | ||
| async fn fetch_token(&self, client: &Client, retry: &RetryConfig) -> Result<String> { | ||
| self.cache |
There was a problem hiding this comment.
What do you think about moving the TokenCache onto CredentialProvider::TokenProvider?
There was a problem hiding this comment.
makes absolute sense - done.
object_store/src/azure/credential.rs
Outdated
| #[derive(serde::Deserialize, Debug)] | ||
| #[async_trait::async_trait] | ||
| pub trait TokenCredential: std::fmt::Debug + Send + Sync + 'static { | ||
| async fn fetch_token(&self, client: &Client, retry: &RetryConfig) -> Result<String>; |
There was a problem hiding this comment.
I think the client should be passed at credential construction, this avoids issues with clients not permitting https
There was a problem hiding this comment.
I passed in a client for the managed identity, but I was unsure about the other credentials as they always use ssl. So we just ignore the client for the managed identity. If however the "savings" of not having a redundant instance of the client are not relevant, I'm happy to move it for all credentials.
object_store/src/azure/credential.rs
Outdated
| (None, None, Some(msi_res_id)) => { | ||
| query_items.push(("msi_res_id", msi_res_id)) | ||
| } | ||
| _ => (), |
There was a problem hiding this comment.
I think this silently ignores parameters if multiple are specified, I can't see why this would be the case. If this is intentional I think we should return an error for this case
There was a problem hiding this comment.
True - I went for just having a hierarchy. Since object id and msi_res_id are more specific (only used by managed identity) and less likely to just "end up" in the environment, I thought they would get precedence.
| .header("metadata", "true") | ||
| .query(&query_items); | ||
|
|
||
| if let Ok(val) = std::env::var(MSI_SECRET_ENV_KEY) { |
There was a problem hiding this comment.
I think it would be better to provide this in the constructor, and delegate to the config key system
There was a problem hiding this comment.
I was not sure about this value. From what I understood, its rotated at runtime.
There was a problem hiding this comment.
Unless I'm mistaken, environment variables can't reliably be altered at runtime by an external process. So I think it should be fine to move it outside
object_store/src/azure/credential.rs
Outdated
| let fs = LocalFileSystem::new_with_prefix(root.path()).unwrap(); | ||
| let tenant = "tenant"; | ||
| let tokenfile = root.path().join("tokenfile"); | ||
| fs.put( | ||
| &crate::path::Path::from("tokenfile"), | ||
| bytes::Bytes::from("federated-token"), | ||
| ) | ||
| .await | ||
| .unwrap(); |
There was a problem hiding this comment.
Why not just use NamedTempFile here?
| assert_eq!(req.uri().path(), format!("/{}/oauth2/v2.0/token", tenant)); | ||
| assert_eq!(req.method(), &Method::POST); |
There was a problem hiding this comment.
You could use futures::block_on
| msi_endpoint: Option<String>, | ||
| ) -> Self { | ||
| let msi_endpoint = msi_endpoint.unwrap_or_else(|| { | ||
| "http://169.254.169.254/metadata/identity/oauth2/token".to_owned() |
There was a problem hiding this comment.
A problem we had to workaround with the AWS provider, is ensuring that the client used for this is not https only. In practice this meant creating a separate client for it
There was a problem hiding this comment.
completely missed that - using a dedicated client here as well now.
object_store/src/azure/mod.rs
Outdated
| Ok(credential::CredentialProvider::AccessKey(bearer_token)) | ||
| } else if let Some(access_key) = self.access_key { | ||
| Ok(credential::CredentialProvider::AccessKey(access_key)) | ||
| } else if self.use_managed_identity { |
There was a problem hiding this comment.
I'm not familiar with Azure SDKs, but the AWS and Google SDKs default to using managed identity if nothing else is specified. I'm not sure if this is something we want to do here?
There was a problem hiding this comment.
The default azure credential does a lot of opinionated things that I think we do not want to replicate here :). That said, it makes sense to use the metadata as default since it may work without any configuration.
|
I don't have the means to test this, so I am going to take it on faith that this works as advertised 😄 |
|
Benchmark runs are scheduled for baseline = f0be9da and contender = 73ce076. 73ce076 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
|
@tustvold - I also have to set up a real-world test and am in the process of doing this right now. Will open up a follow up PR if I see that changes need to be made. Also wanted to move that last variable :). Sorry for not mentioning that sooner! |
…3581) * feat: add azure managed identity credential * test: azure managed identity credential * feat: add azure federated token credential * test: add workload identity test * refactor: PR feedback * Update object_store/src/azure/mod.rs Co-authored-by: Raphael Taylor-Davies <[email protected]> * refactor: id priorities * refactor: use managed identity as default credential * chore: remove usused parameter Co-authored-by: Raphael Taylor-Davies <[email protected]>
Which issue does this PR close?
Closes apache/arrow-rs-object-store#198
Rationale for this change
see linked issue
What changes are included in this PR?
added implementations for a managed identity and federated workload credential for azure along with corresponding configuration on the azure builder.
Are there any user-facing changes?
users can now use managed identities and workload credentials to authenticate object store requests.