Add parallel fetching for registered model ID #688
Closed
Labels
- enhancement: New feature or request
- good first issue: Good for newcomers
- migrate/groups: Corresponds to Migrate Groups Step of go/uc/upgrade
- migrate/ml: go/uc/upgrade Upgrade ML Assets
- step/assessment: go/uc/upgrade - Assessment Step
Description
At the moment, we're fetching registered model IDs sequentially in the MainThread:

```
crawl_permissions.log.2023-12-06_02-40:02:47:20 DEBUG [databricks.sdk] {MainThread} GET /api/2.0/mlflow/registered-models/list?page_token=...
crawl_permissions.log.2023-12-06_02-40:02:47:20 DEBUG [databricks.sdk] {MainThread} GET /api/2.0/mlflow/databricks/registered-models/get?name=...
crawl_permissions.log.2023-12-06_02-40:02:47:20 DEBUG [databricks.sdk] {MainThread} GET /api/2.0/mlflow/databricks/registered-models/get?name=...
```
This results in overly long runtimes for assessment tasks, going beyond 18 hours. I strongly believe this can be parallelised: add a `Threads.parallel` call to speed it up.
```python
from functools import partial


def models_listing(ws: WorkspaceClient):
    def inner() -> Iterator[ml.ModelDatabricks]:
        # one blocking GET per model; hand them all to Threads.parallel
        # as callables instead of resolving them one by one
        def get_details(model):
            return ws.model_registry.get_model(model.name).registered_model_databricks

        tasks = [partial(get_details, model) for model in ws.model_registry.list_models()]
        return Threads.parallel('fetching models with ID', tasks)

    return inner
```
but it needs to be tested.
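Independent of ucx's `Threads` helper, the fan-out itself can be sketched with the standard library's `ThreadPoolExecutor`. Everything below is illustrative: `fetch_models` and `get_model` are hypothetical names, with `get_model` standing in for the blocking `ws.model_registry.get_model` round-trip:

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_models(names, get_model, max_workers=8):
    """Run one blocking get_model(name) call per name on a thread pool.

    The per-model GETs are I/O-bound, so threads overlap the network
    round-trips instead of paying for them sequentially on the MainThread.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, keeping results deterministic
        return list(pool.map(get_model, names))
```

For I/O-bound calls like these, wall-clock time drops roughly in proportion to the pool size, assuming the API's rate limits allow that much concurrency.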