feat(services/azdls): Support parsing Azure Storage configs from connection strings#6212
feat(services/azdls): Support parsing Azure Storage configs from connection strings#6212Xuanwo merged 21 commits intoapache:mainfrom
Conversation
Signed-off-by: Jannik Steinmann <[email protected]>
Signed-off-by: Jannik Steinmann <[email protected]>
Signed-off-by: Jannik Steinmann <[email protected]>
Signed-off-by: Jannik Steinmann <[email protected]>
core/src/services/azdls/config.rs
Outdated
| /// config.client_secret = Some("myClientSecret".to_string()); | ||
| /// config.tenant_id = Some("myTenantId".to_string()); | ||
| /// ``` | ||
| pub fn try_from_connection_string(conn_str: &str) -> Result<Self> { |
There was a problem hiding this comment.
I think this would be very useful for the next steps of #5482, this is how I would like to see custom Configurator::from_uri
opendal/core/src/types/builder.rs
Line 152 in 7309b4b
There was a problem hiding this comment.
Thanks for the reference Jorge!
I'd like to rename the AzdlsConfig::try_from_connection_string() to AzdlsConfig::from_uri() to make it more generic, but I'm a bit cautious about treating the Azure connection string format as a URI.
Azure Storage connection strings are ;-separated key-value pairs. An example looks like this:
AccountName=example;DefaultEndpointsProtocol=https;EndpointSuffix=core.windows.net
When I parse such a string using http::uri::Uri, it interprets it as the host name:
let uri = conn_str.parse::<http::uri::Uri>().unwrap();
// Prints: host: AccountName=example;DefaultEndpointsProtocol=https;EndpointSuffix=core.windows.net
println!("host: {}", uri.host().unwrap());Technically, RFC#3986 (the RFC that defines the URI syntax) allows = and ; as part of the host name, but protocols like http further restrict allowed names via the DNS name syntax (e.g. RFC#1035). As I suspect the http crate to focus on URI parsing logic for HTTP protocol family, I'm not certain that this behavior won't change in the future.
In short: I'd rather refrain from treating Azure Storage connection strings as URIs (and parse them using that logic). So I think it makes sense to keep both APIs separate.
core/src/services/azdls/config.rs
Outdated
| fn parse_connection_string(conn_str: &str) -> Result<Vec<(&str, &str)>> { | ||
| conn_str | ||
| .split(';') | ||
| .map(|field| { |
There was a problem hiding this comment.
In #5482 we are going to introduce a query_pairs http util that does almost the same thing (but silently ignoring error instead of failing)
opendal/core/src/raw/http_util/uri.rs
Line 71 in 7309b4b
There was a problem hiding this comment.
Thanks, this is great! I'll change it (and add a separator: &str parameter to accommodate the ;) if your PR gets in first 👍
|
Thank you, @DerGut, for initiating this, and thank you, @jorgehermo9, for joining the review. The approach to how OpenDAL implements Azblob has a |
|
Oh wow, I don't know why it didn't occur to me to check 🙈 |
Signed-off-by: Jannik Steinmann <[email protected]>
Signed-off-by: Jannik Steinmann <[email protected]>
Signed-off-by: Jannik Steinmann <[email protected]>
Signed-off-by: Jannik Steinmann <[email protected]>
Signed-off-by: Jannik Steinmann <[email protected]>
Signed-off-by: Jannik Steinmann <[email protected]>
Signed-off-by: Jannik Steinmann <[email protected]>
|
I've refactored the connection string parsing logic into a shared |
Signed-off-by: Jannik Steinmann <[email protected]>
Signed-off-by: Jannik Steinmann <[email protected]>
Signed-off-by: Jannik Steinmann <[email protected]>
|
The test failures seem unrelated 🤔 |
Signed-off-by: DerGut <[email protected]>
Signed-off-by: DerGut <[email protected]>
Xuanwo
left a comment
There was a problem hiding this comment.
Most changes look good to me, perfect!
core/src/raw/mod.rs
Outdated
|
|
||
| #[cfg(any(feature = "services-azblob", feature = "services-azdls"))] | ||
| mod azure; | ||
| #[cfg(any(feature = "services-azblob", feature = "services-azdls"))] |
There was a problem hiding this comment.
Seems we need to add services-azfile here?
There was a problem hiding this comment.
You are right, I missed this! Added it with 818a2ce
Signed-off-by: DerGut <[email protected]>
Signed-off-by: DerGut <[email protected]>
Signed-off-by: DerGut <[email protected]>
Which issue does this PR close?
Closes #6211.
Rationale for this change
As mentioned in the issue description. Azure SDKs tend to allow users to pass a connection string. This PR parsing functionality to populate an
AzdlsConfigstruct with fields from a connection string.Additionally, I'd like to add this for the Iceberg ADLS support, as mentioned in the issue.
What changes are included in this PR?
The primary change is the user-facing one to support parsing Azure connection strings into
Azdlsstructs, see user-facing changes below.There are internal refactorings that come along with it too:
services-azblob,services-azdlsandservices-azfilenow share some abstractions inraw/azure.rsraw::azure_account_name_from_endpoint()raw::azure_config_from_connection_string()services-azdlsandservices-azfileare now included in thetestsfeature flag setAre there any user-facing changes?
Yes!
🟢 Additions
Azdls::from_connection_string(): users are now able to parse Azure connection strings to get anAzdlsBuilderAzblobConfigandAzdlsConfignow implementFrom<reqsign::AzureStorageConfig>Azblob::from_connection_string()now supports Azurite parametersUseDevelopmentStorageandDevelopmentStorageURIAzblob::from_connection_string()now always extracts the account name (it previously only used it when shared key authentication was used)Azfile::from_connection_string()while I was at it🟡 Changes
Azblob::from_connection_string()now errors whenDefaultEndpointsProtocolis invalid (i.e. nothttporhttps)Azblob::from_connection_string()now errors when fields are invalid (i.e. don't have the format ofkey=value)Azblob::from_connection_string()is more tolerant to multi-line connection strings (i.e. trims white space of individual key-value pairs too)🔴 Removals
None