Skip to content

Comments

feat(services/azdls): Support parsing Azure Storage configs from connection strings#6212

Merged
Xuanwo merged 21 commits intoapache:mainfrom
DerGut:main
May 29, 2025
Merged

feat(services/azdls): Support parsing Azure Storage configs from connection strings#6212
Xuanwo merged 21 commits intoapache:mainfrom
DerGut:main

Conversation

@DerGut
Copy link
Contributor

@DerGut DerGut commented May 21, 2025

Which issue does this PR close?

Closes #6211.

Rationale for this change

As mentioned in the issue description. Azure SDKs tend to allow users to pass a connection string. This PR parsing functionality to populate an AzdlsConfig struct with fields from a connection string.

Additionally, I'd like to add this for the Iceberg ADLS support, as mentioned in the issue.

What changes are included in this PR?

The primary change is the user-facing one to support parsing Azure connection strings into Azdls structs, see user-facing changes below.

There are internal refactorings that come along with it too:

  • services-azblob, services-azdls and services-azfile now share some abstractions in raw/azure.rs
    • a shared raw::azure_account_name_from_endpoint()
    • a shared raw::azure_config_from_connection_string()
  • services-azdls and services-azfile are now included in the tests feature flag set

Are there any user-facing changes?

Yes!

🟢 Additions

  1. Azdls::from_connection_string(): users are now able to parse Azure connection strings to get an AzdlsBuilder
  2. Both AzblobConfig and AzdlsConfig now implement From<reqsign::AzureStorageConfig>
  3. Azblob::from_connection_string() now supports Azurite parameters UseDevelopmentStorage and DevelopmentStorageURI
  4. Azblob::from_connection_string() now always extracts the account name (it previously only used it when shared key authentication was used)
  5. Also added Azfile::from_connection_string() while I was at it

🟡 Changes

  • Azblob::from_connection_string() now errors when DefaultEndpointsProtocol is invalid (i.e. not http or https)
  • Azblob::from_connection_string() now errors when fields are invalid (i.e. don't have the format of key=value)
  • Azblob::from_connection_string() is more tolerant to multi-line connection strings (i.e. trims white space of individual key-value pairs too)

🔴 Removals

None

DerGut added 3 commits May 21, 2025 22:21
Signed-off-by: Jannik Steinmann <[email protected]>
Signed-off-by: Jannik Steinmann <[email protected]>
@DerGut DerGut requested a review from Xuanwo as a code owner May 21, 2025 21:00
@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. releases-note/feat The PR implements a new feature or has a title that begins with "feat" labels May 21, 2025
@DerGut DerGut changed the title Support parsing AzdlConfig from Azure Storage connection strings feat(services/azdls) Support parsing AzdlConfig from Azure Storage connection strings May 21, 2025
Signed-off-by: Jannik Steinmann <[email protected]>
/// config.client_secret = Some("myClientSecret".to_string());
/// config.tenant_id = Some("myTenantId".to_string());
/// ```
pub fn try_from_connection_string(conn_str: &str) -> Result<Self> {
Copy link
Contributor

@jorgehermo9 jorgehermo9 May 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would be very useful for the next steps of #5482, this is how I would like to see custom Configurator::from_uri

fn from_uri(uri: &Uri, options: impl IntoIterator<Item = (String, String)>) -> Result<Self> {
@Xuanwo

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the reference Jorge!

I'd like to rename the AzdlsConfig::try_from_connection_string() to AzdlsConfig::from_uri() to make it more generic, but I'm a bit cautious about treating the Azure connection string format as a URI.

Azure Storage connection strings are ;-separated key-value pairs. An example looks like this:

AccountName=example;DefaultEndpointsProtocol=https;EndpointSuffix=core.windows.net

When I parse such a string using http::uri::Uri, it interprets it as the host name:

let uri = conn_str.parse::<http::uri::Uri>().unwrap();

// Prints: host: AccountName=example;DefaultEndpointsProtocol=https;EndpointSuffix=core.windows.net
println!("host: {}", uri.host().unwrap());

Technically, RFC#3986 (the RFC that defines the URI syntax) allows = and ; as part of the host name, but protocols like http further restrict allowed names via the DNS name syntax (e.g. RFC#1035). As I suspect the http crate to focus on URI parsing logic for HTTP protocol family, I'm not certain that this behavior won't change in the future.

In short: I'd rather refrain from treating Azure Storage connection strings as URIs (and parse them using that logic). So I think it makes sense to keep both APIs separate.

fn parse_connection_string(conn_str: &str) -> Result<Vec<(&str, &str)>> {
conn_str
.split(';')
.map(|field| {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In #5482 we are going to introduce a query_pairs http util that does almost the same thing (but silently ignoring error instead of failing)

pub fn query_pairs(query: &str) -> Vec<(String, String)> {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this is great! I'll change it (and add a separator: &str parameter to accommodate the ;) if your PR gets in first 👍

@Xuanwo
Copy link
Member

Xuanwo commented May 23, 2025

Thank you, @DerGut, for initiating this, and thank you, @jorgehermo9, for joining the review. The approach to how OpenDAL implements from_uri is still unclear at this point. I suggest we exclude the design of from_uri from our current considerations.

Azblob has a from_connection_string API already. Perhaps we can extract this functionality to opendal::raw::azure_utils so that we can share the same implementation across all Azure-related services.

@DerGut
Copy link
Contributor Author

DerGut commented May 23, 2025

Oh wow, I don't know why it didn't occur to me to check 🙈
I'll take a stab at it later today. Thanks for bringing it up!

@DerGut
Copy link
Contributor Author

DerGut commented May 26, 2025

I've refactored the connection string parsing logic into a shared raw/azure.rs. This is much better! I also noticed that we may even be able to push some of this down to reqsign in a later PR.

Signed-off-by: Jannik Steinmann <[email protected]>
@DerGut DerGut changed the title feat(services/azdls) Support parsing AzdlConfig from Azure Storage connection strings feat(services/azdls) Support parsing Azure Storage configs from connection strings May 26, 2025
DerGut added 2 commits May 26, 2025 18:50
Signed-off-by: Jannik Steinmann <[email protected]>
Signed-off-by: Jannik Steinmann <[email protected]>
@DerGut
Copy link
Contributor Author

DerGut commented May 26, 2025

The test failures seem unrelated 🤔

Copy link
Member

@Xuanwo Xuanwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most changes look good to me, perfect!


#[cfg(any(feature = "services-azblob", feature = "services-azdls"))]
mod azure;
#[cfg(any(feature = "services-azblob", feature = "services-azdls"))]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems we need to add services-azfile here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, I missed this! Added it with 818a2ce

@Xuanwo Xuanwo changed the title feat(services/azdls) Support parsing Azure Storage configs from connection strings feat(services/azdls): Support parsing Azure Storage configs from connection strings May 29, 2025
Copy link
Member

@Xuanwo Xuanwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @DerGut a lot for building this!

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label May 29, 2025
@Xuanwo Xuanwo merged commit 60f4a0d into apache:main May 29, 2025
275 of 277 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

lgtm This PR has been approved by a maintainer releases-note/feat The PR implements a new feature or has a title that begins with "feat" size:XL This PR changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

new feature: Parsing AzdlsConfig from Azure Storage connection strings

3 participants