pip reinstall is executed in every single task #8

@jmeidam

I am trying to convert a current dbx project to bundles.
I have some tasks of type python_wheel_task.

One such task looks like this (they're all similar):

        - task_key: "data_raw"
          depends_on:
            - task_key: "process_init"
          job_cluster_key: "somejobcluster"
          python_wheel_task:
            package_name: "myproject"
            entry_point: "data_raw"
          libraries:
            - whl: ./dist/myproject-*.whl

and I have defined the following artifact:

    artifacts:
      the_wheel:
        type: whl
        path: .
        build: poetry build

In dbx, the wheel would be installed once on the job-cluster.
Now I noticed that every task is converted to a notebook that contains the following code:

    %pip install --force-reinstall /Workspace/Shared/dbx/projects/myproject/.internal/.../myproject-0.0.0-py3-none-any.whl

This seems rather wasteful of run time when many small tasks run on the same cluster, since each one repeats the same install.
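For illustration only, here is a minimal sketch of what an "install once" guard could look like. It is not what the generated notebook does, and `needs_install` is a hypothetical helper rather than part of bundles or dbx. It uses only the standard library to check whether the package is already present at the wanted version.

```python
from importlib import metadata


def needs_install(package_name: str, wanted_version: str) -> bool:
    """Return True if the package is missing or installed at a different version.

    Hypothetical helper for illustration; not part of any Databricks tooling.
    """
    try:
        installed = metadata.version(package_name)
    except metadata.PackageNotFoundError:
        # Nothing installed under this name yet, so an install is required.
        return True
    # Assumption: comparing version strings is enough to detect a stale install.
    return installed != wanted_version
```

A guard like this only helps if the wheel version changes between builds. With a fixed version such as `0.0.0`, changed code under the same version would go undetected, which may be one reason a forced reinstall is used.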

Am I missing a setting, or is this done by design?
