Skip to content

[Feature] Support Models in dbutils.fs operations#750

Merged
mgyucht merged 1 commit intodatabricks:mainfrom
shichengzhou-db:support_Models
Sep 12, 2024
Merged

[Feature] Support Models in dbutils.fs operations#750
mgyucht merged 1 commit intodatabricks:mainfrom
shichengzhou-db:support_Models

Conversation

@shichengzhou-db
Copy link
Copy Markdown
Contributor

@shichengzhou-db shichengzhou-db commented Sep 5, 2024

Changes

  • Support files operations in WorkspaceClient.Files for Databricks UC Model artifacts so that user can use databricks sdk to download UC model artifacts.
  • This PR is part of the work to migrate mlflow client towards using databricks sdk for model artifacts download/upload operations for better security.

Tests

  • Existing tests in test_dbfs_mixins.py, similar to how _VolumesPath is tested
  • The following code works
from databricks.sdk import WorkspaceClient
w = WorkspaceClient()
resp = w.files.download("/Models/system/ai/dbrx_instruct/3/MLmodel")
  • make test run locally
  • make fmt applied
  • relevant integration tests applied

@shichengzhou-db shichengzhou-db changed the title support Models in files operations [Internal] support Models in files operations Sep 5, 2024
Copy link
Copy Markdown
Contributor

@mgyucht mgyucht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these basically duplicate _VolumesPath and _VolumesIO. Let's reuse those instead of defining duplicate implementations. We can rename those to _FilesPath and _FilesIO to indicate that they can be used generally for resources backed by the Files API.

return f'<_VolumesPath {self._path}>'


class _ModelsPath(_Path):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a new Path implementation? It seems like this just uses the Files API under the hood. From what I can tell, the only difference is open(), which returns ModelsIO now. That is in turn almost identical to _FilesIO.

If we changed https://github.com/databricks/databricks-sdk-py/pull/750/files#diff-469b96ba8e49ef7cbe8d3b4577d2a0ddf3a3552638b76d87bfccb936fda65d18R759 to check whether the path started with either /Volumes or /Models, I think the existing implementation would work. Is this correct? If so, we should just rename that to _FilesPath (as it uses the Files API) and reuse it.

Ultimately, that _path method should probably use some kind of prefix allowlist to determine whether the _FilesPath should be used, and we can add /Volumes and /Models as the initial prefixes.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mgyucht that sounds better. thanks for the suggestion, updated.

(I am not in the osp team but offered to do this work so we can speed up mlflow adoption of the databricks sdk, but I think since it's all Files API / Presigned URLs behind this sdk, we should be able to reuse the same path and io class for all securables)

@mgyucht mgyucht changed the title [Internal] support Models in files operations [Feature] Support Models in dbutils.fs operations Sep 9, 2024
@mgyucht mgyucht enabled auto-merge September 9, 2024 07:49
Copy link
Copy Markdown
Contributor

@mgyucht mgyucht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for this contribution!

@shichengzhou-db
Copy link
Copy Markdown
Contributor Author

jenkins merge

@shichengzhou-db
Copy link
Copy Markdown
Contributor Author

hey @mgyucht how you know how to merge this pr? not seeing anything on my UI for merging

auto-merge was automatically disabled September 10, 2024 22:55

Head branch was pushed to by a user without write access

@shichengzhou-db shichengzhou-db force-pushed the support_Models branch 3 times, most recently from a26ecae to c7367a5 Compare September 11, 2024 22:56
@mgyucht mgyucht added this pull request to the merge queue Sep 12, 2024
Merged via the queue into databricks:main with commit 3162545 Sep 12, 2024
tanmay-db added a commit that referenced this pull request Sep 16, 2024
### New Features and Improvements

 * Support Models in `dbutils.fs` operations ([#750](#750)).

### Bug Fixes

 * Do not specify --tenant flag when fetching managed identity access token from the CLI ([#748](#748)).
 * Fix deserialization of 401/403 errors ([#758](#758)).
 * Use correct optional typing in `WorkspaceClient` for `mypy` ([#760](#760)).
github-merge-queue bot pushed a commit that referenced this pull request Sep 17, 2024
### New Features and Improvements

* Support Models in `dbutils.fs` operations
([#750](#750)).


### Bug Fixes

* Do not specify --tenant flag when fetching managed identity access
token from the CLI
([#748](#748)).
* Fix deserialization of 401/403 errors
([#758](#758)).
* Use correct optional typing in `WorkspaceClient` for `mypy`
([#760](#760)).
aravind-segu pushed a commit to aravind-segu/databricks-sdk-py that referenced this pull request Sep 18, 2024
### New Features and Improvements

* Support Models in `dbutils.fs` operations
([databricks#750](databricks#750)).


### Bug Fixes

* Do not specify --tenant flag when fetching managed identity access
token from the CLI
([databricks#748](databricks#748)).
* Fix deserialization of 401/403 errors
([databricks#758](databricks#758)).
* Use correct optional typing in `WorkspaceClient` for `mypy`
([databricks#760](databricks#760)).
github-merge-queue bot pushed a commit that referenced this pull request Oct 7, 2024
)

This reverts commit 3162545.
Verified that /Models download still work correctly.

## Changes
<!-- Summary of your changes that are easy to understand -->

## Tests
<!-- 
How is this tested? Please see the checklist below and also describe any
other relevant tests
-->

- [ ] `make test` run locally
- [ ] `make fmt` applied
- [ ] relevant integration tests applied
parthban-db added a commit that referenced this pull request Oct 7, 2024
### Bug Fixes

 * Fix Model Serving Test ([#781](#781)).
 * Include package name for external types when deserializing responses ([#786](#786)).

### Internal Changes

 * Refactor ApiClient into `_BaseClient` and `ApiClient` ([#785](#785)).
 * Update to latest OpenAPI spec ([#787](#787)).
 * revert Support Models in `dbutils.fs` operations ([#750](#750)) ([#778](#778)).

### API Changes:

 * Added [w.disable_legacy_dbfs](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/settings/disable_legacy_dbfs.html) workspace-level service.
 * Added `default_source_code_path` and `resources` fields for `databricks.sdk.service.apps.App`.
 * Added `resources` field for `databricks.sdk.service.apps.CreateAppRequest`.
 * Added `resources` field for `databricks.sdk.service.apps.UpdateAppRequest`.

OpenAPI SHA: bc17b474818138f19b78a7bea0675707dead2b87, Date: 2024-10-07
github-merge-queue bot pushed a commit that referenced this pull request Oct 7, 2024
### Bug Fixes

* Fix Model Serving Test
([#781](#781)).
* Include package name for external types when deserializing responses
([#786](#786)).


### Internal Changes

* Refactor ApiClient into `_BaseClient` and `ApiClient`
([#785](#785)).
* Update to latest OpenAPI spec
([#787](#787)).
* revert Support Models in `dbutils.fs` operations
([#750](#750))
([#778](#778)).


### API Changes:

* Added
[w.disable_legacy_dbfs](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/settings/disable_legacy_dbfs.html)
workspace-level service.
* Added `default_source_code_path` and `resources` fields for
`databricks.sdk.service.apps.App`.
* Added `resources` field for
`databricks.sdk.service.apps.CreateAppRequest`.
* Added `resources` field for
`databricks.sdk.service.apps.UpdateAppRequest`.

OpenAPI SHA: bc17b474818138f19b78a7bea0675707dead2b87, Date: 2024-10-07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants