
Add Support for Azure Data Lake Storage for cloud storage #47

@zcking

Description

Is your feature request related to a problem? Please describe.

Currently, server.properties and the TemporaryTableCredentialsService only support local files and S3 for storage. It would be nice to add support for ADLS Gen2 (and probably Google Cloud Storage as well) to accommodate the majority of cloud users and reach parity with the Databricks Unity Catalog (pre-open-source).

Describe the solution you would like

In server.properties I think we should avoid assumptions and ambiguity by not relying on the s3. prefix. Ideally the structure would mirror Databricks' External Locations more closely. Here is one idea:

server.env=dev

# External Locations:
# 
# S3:
externalLocation.bucketPath.0=s3://unitycatalog-220412-us-east-1/
externalLocation.accessKey.0=XXXX
externalLocation.secretKey.0=XXXX
externalLocation.sessionToken.0=XXXX

# Local File System:
externalLocation.bucketPath.1=file:///data/unity-catalog/
externalLocation.accessKey.1=
externalLocation.secretKey.1=
externalLocation.sessionToken.1=

# ADLS Gen2
externalLocation.bucketPath.2=abfs://data@unitycatalog220412.dfs.core.windows.net/catalog
externalLocation.accessKey.2=XXXX
externalLocation.secretKey.2=XXXX
externalLocation.sessionToken.2=XXXX

# GCS
externalLocation.bucketPath.3=gs://unitycatalog-220412/
externalLocation.accessKey.3=XXXX
externalLocation.secretKey.3=XXXX
externalLocation.sessionToken.3=XXXX
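To illustrate how the server might consume such a layout, here is a minimal sketch (in Python, for brevity; the actual server is Java) of parsing the indexed externalLocation.* properties and resolving credentials for a table path by longest-prefix match on bucketPath. The property names follow the proposal above; the function names and matching strategy are my own assumptions, not the actual Unity Catalog implementation.

```python
# Hypothetical sketch only: parse externalLocation.<field>.<index> properties
# and pick credentials by longest-prefix match. Not the real implementation.

def parse_external_locations(props):
    """Group externalLocation.<field>.<index> entries into one dict per index."""
    locations = {}
    for key, value in props.items():
        parts = key.split(".")
        if len(parts) == 3 and parts[0] == "externalLocation":
            _, field, index = parts
            locations.setdefault(index, {})[field] = value
    return list(locations.values())

def resolve_credentials(table_path, locations):
    """Return the location whose bucketPath is the longest prefix of table_path."""
    matches = [loc for loc in locations
               if table_path.startswith(loc["bucketPath"])]
    if not matches:
        raise ValueError(f"no external location configured for {table_path}")
    return max(matches, key=lambda loc: len(loc["bucketPath"]))

props = {
    "externalLocation.bucketPath.0": "s3://unitycatalog-220412-us-east-1/",
    "externalLocation.accessKey.0": "XXXX",
    "externalLocation.bucketPath.2":
        "abfs://data@unitycatalog220412.dfs.core.windows.net/catalog",
    "externalLocation.accessKey.2": "XXXX",
}
locs = parse_external_locations(props)
loc = resolve_credentials(
    "abfs://data@unitycatalog220412.dfs.core.windows.net/catalog/mytable", locs)
print(loc["bucketPath"])
```

A longest-prefix match (rather than a scheme match) would also let two external locations share a bucket but cover different path subtrees, which mirrors how Databricks External Locations behave.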

Additional context

A slightly related issue is support for a configurable endpoint URL. That would help support more object stores (e.g. MinIO and others compatible with the common S3 API): #43
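If the indexed layout above were adopted, an endpoint override could slot in as one more per-location field. The endpointUrl property name here is purely hypothetical, shown only to suggest how #43 could fit the same scheme:

```properties
# Hypothetical: per-location endpoint override for an S3-compatible store
externalLocation.bucketPath.4=s3://local-bucket/
externalLocation.endpointUrl.4=http://localhost:9000
externalLocation.accessKey.4=XXXX
externalLocation.secretKey.4=XXXX
```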

Labels: enhancement (New feature or request)
