
Add parallel fetching for registered model ID #688

@nfx

Description

At the moment, we're fetching registered model IDs sequentially in the MainThread:

crawl_permissions.log.2023-12-06_02-40:02:47:20 DEBUG [databricks.sdk] {MainThread} GET /api/2.0/mlflow/registered-models/list?page_token=...
crawl_permissions.log.2023-12-06_02-40:02:47:20 DEBUG [databricks.sdk] {MainThread} GET /api/2.0/mlflow/databricks/registered-models/get?name=...
crawl_permissions.log.2023-12-06_02-40:02:47:20 DEBUG [databricks.sdk] {MainThread} GET /api/2.0/mlflow/databricks/registered-models/get?name=....

This results in overly long runtimes for assessment tasks, which can go beyond 18 hours. Each registered-models/get call is independent of the others, so I strongly believe this can be parallelised. Add a Threads.parallel call to speed this up.
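To illustrate why running the per-model GET calls concurrently helps, here is a minimal, self-contained sketch. It uses the standard library's concurrent.futures.ThreadPoolExecutor in place of UCX's Threads helper, and fetch_model is a hypothetical stand-in for ws.model_registry.get_model(...) that only sleeps to simulate one HTTP round-trip. The model names are made up.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_model(name: str) -> str:
    """Hypothetical stand-in for ws.model_registry.get_model(name):
    sleeps to simulate one HTTP round-trip, then returns a fake model ID."""
    time.sleep(0.05)
    return f"id-{name}"


def fetch_sequential(names: list[str]) -> list[str]:
    # One blocking call after another, like the MainThread loop in the logs.
    return [fetch_model(name) for name in names]


def fetch_parallel(names: list[str], num_threads: int = 8) -> list[str]:
    # Executor.map runs fetch_model on worker threads, but still yields
    # the results in the same order as the input names.
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(fetch_model, names))


if __name__ == "__main__":
    names = [f"model-{i}" for i in range(16)]

    start = time.perf_counter()
    sequential = fetch_sequential(names)
    sequential_seconds = time.perf_counter() - start

    start = time.perf_counter()
    parallel = fetch_parallel(names)
    parallel_seconds = time.perf_counter() - start

    assert sequential == parallel
    print(f"sequential: {sequential_seconds:.2f}s, parallel: {parallel_seconds:.2f}s")
```

For latency-bound calls like these, wall-clock time drops roughly in proportion to the number of threads; against a real workspace the achievable speed-up may also be capped by API rate limits.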

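A potential fix in https://github.com/databrickslabs/ucx/blob/main/src/databricks/labs/ucx/workspace_access/generic.py#L327-L333 is below. It assumes that Threads.parallel takes a list of zero-argument callables and returns their results, so each get_model() call is wrapped in functools.partial (add "from functools import partial" to the imports). Passing map(lambda ...) instead would hand over already-computed results, which means the get_model() calls would still run one by one in the calling thread.

def models_listing(ws: WorkspaceClient):
    def inner() -> Iterator[ml.ModelDatabricks]:
        # one zero-argument task per model, so get_model() runs on the worker threads
        tasks = [partial(ws.model_registry.get_model, m.name) for m in ws.model_registry.list_models()]
        # assumes Threads.parallel returns the collected get_model() responses
        responses = Threads.parallel('fetching models with ID', tasks)
        return (r.registered_model_databricks for r in responses)
    return inner

This still needs to be tested.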
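One way to test the change without a live workspace is to mock the WorkspaceClient. In this self-contained sketch, models_listing_sketch is a hypothetical re-implementation of the proposal that uses ThreadPoolExecutor in place of Threads.parallel, and MagicMock/SimpleNamespace objects stand in for the SDK's WorkspaceClient, Model and GetModelResponse types. It checks that every model comes back in order and that get_model() is no longer called on the MainThread.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from types import SimpleNamespace
from unittest.mock import MagicMock


def models_listing_sketch(ws, num_threads: int = 4):
    """Hypothetical stand-in for the proposed models_listing():
    ThreadPoolExecutor replaces Threads.parallel, the rest mirrors the proposal."""
    def inner():
        # One zero-argument task per model, so get_model() runs on worker threads.
        tasks = [partial(ws.model_registry.get_model, m.name) for m in ws.model_registry.list_models()]
        with ThreadPoolExecutor(max_workers=num_threads) as pool:
            futures = [pool.submit(task) for task in tasks]
            responses = [future.result() for future in futures]  # keeps input order
        return (r.registered_model_databricks for r in responses)
    return inner


def test_models_listing_fetches_ids_off_main_thread():
    calling_threads = []

    def fake_get_model(name):
        # Record which thread performed the simulated registered-models/get call.
        calling_threads.append(threading.current_thread().name)
        return SimpleNamespace(registered_model_databricks=SimpleNamespace(id=f"id-{name}", name=name))

    ws = MagicMock()
    ws.model_registry.list_models.return_value = [SimpleNamespace(name=n) for n in ("a", "b", "c")]
    ws.model_registry.get_model.side_effect = fake_get_model

    models = list(models_listing_sketch(ws)())

    assert [m.id for m in models] == ["id-a", "id-b", "id-c"]
    assert len(calling_threads) == 3
    assert "MainThread" not in calling_threads


if __name__ == "__main__":
    test_models_listing_fetches_ids_off_main_thread()
    print("ok")
```

A test against the real models_listing would look the same, with the mocked ws passed to it and the UCX Threads helper left in place.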

Metadata

Labels

enhancement: New feature or request
good first issue: Good for newcomers
migrate/groups: Corresponds to Migrate Groups Step of go/uc/upgrade
migrate/ml: go/uc/upgrade Upgrade ML Assets
step/assessment: go/uc/upgrade - Assessment Step