[FEATURE]: Support local dependencies #1360

@ericvergnaud

Description

Is there an existing issue for this?

  • I have searched the existing issues

Problem statement

Our implementation of DependencyLoader is designed to work with workspaces, but there would be no workspace when a file is executed from a developer's local file system. We need to support that.

After a more thorough analysis of this PR, I see that there have to be two (more) loader interfaces:

In fact, we need to have a NotebookLoader interface with WorkspaceNotebookLoader & LocalNotebookLoader implementations.
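A minimal sketch of what that interface and its two implementations could look like. The method name `load_notebook` and the injected `export_source` callable are assumptions made for illustration; the issue does not define them, and `export_source` merely stands in for whatever workspace client call returns notebook source.

```python
import pathlib
from abc import ABC, abstractmethod
from typing import Callable


class NotebookLoader(ABC):
    """Returns the source code of a notebook identified by its path."""

    @abstractmethod
    def load_notebook(self, path: str) -> str:
        ...


class LocalNotebookLoader(NotebookLoader):
    """Reads notebooks from the developer's local file system."""

    def load_notebook(self, path: str) -> str:
        return pathlib.Path(path).read_text()


class WorkspaceNotebookLoader(NotebookLoader):
    """Reads notebooks from a workspace through an injected callable.

    `export_source` is a hypothetical placeholder, not a real SDK function.
    """

    def __init__(self, export_source: Callable[[str], str]):
        self._export_source = export_source

    def load_notebook(self, path: str) -> str:
        return self._export_source(path)
```

Injecting the workspace call keeps the local implementation free of any workspace dependency, which is the point of this ticket.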

Then we also need a FileLoader, which is sys.path aware. DependencyLoader has to take both loaders as arguments, e.g. load_dependency(self, dependency: Dependency, file_loader: FileLoader, notebook_loader: NotebookLoader), if we inject them via the method. Remember: a workspace file is still a file when we're executing code from a Databricks Workflow (or notebook).
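Method injection could be sketched roughly as follows. The fields of `Dependency` (`name`, `is_notebook`), the `load_notebook` method, and the return type are assumptions for illustration only; the signature of `load_dependency` is the one proposed above.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Dependency:
    # Hypothetical shape: the issue does not define Dependency's fields.
    name: str
    is_notebook: bool


class SourceModule(Protocol):
    def load(self) -> str: ...


class FileLoader(Protocol):
    def resolve_module_source_file(self, module_to_import: str) -> SourceModule: ...


class NotebookLoader(Protocol):
    # Assumed method name, not taken from the issue.
    def load_notebook(self, path: str) -> str: ...


class DependencyLoader:
    def load_dependency(
        self,
        dependency: Dependency,
        file_loader: FileLoader,
        notebook_loader: NotebookLoader,
    ) -> str:
        # Notebooks go through the notebook loader; everything else,
        # including workspace files, is resolved as a plain file.
        if dependency.is_notebook:
            return notebook_loader.load_notebook(dependency.name)
        return file_loader.resolve_module_source_file(dependency.name).load()
```

With this shape, DependencyLoader itself never needs to know whether it runs against a workspace or a local folder; that choice is made by whoever constructs the two loaders.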

Remember: we have to support running this for both local FS (when running databricks labs ucx migrate-local-files) and remote notebooks (e.g. when running databricks labs ucx migrate-job-code --job-id ...).

Full illustrative example:

import importlib
import pathlib
import sys
from dataclasses import dataclass

import b # imports site-packages/b.py
__import__('b') # imports site-packages/b.py
importlib.import_module('b') # imports site-packages/b.py

SYS_PATH: list[pathlib.Path] = [pathlib.Path(x) for x in sys.path]

# def __import__(module_to_import: str):
#     x = module_to_import.replace('.', '/')
#     candidates = [f'{x}.py', f'{x}/__init__.py']
#     for folder in SYS_PATH:
#         for candidate in candidates:
#             if not (folder / candidate).exists():
#                 continue
#             with open(folder / candidate) as f:
#                 return f.read()
#     raise ImportError(f'No module named {module_to_import}')

version_data = {}
version_file = pathlib.Path(__file__).parent / 'databricks/sdk/version.py'
with version_file.open('r') as f:
    exec(f.read(), version_data)

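def import_module_from_absolute_path(x: pathlib.Path) -> dict:
    # exec() always returns None, so collect the module's globals in a dict instead
    namespace = {}
    with x.open() as f:
        exec(f.read(), namespace)
    return namespace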

class FileLoader:
    def __init__(self, current_folder: pathlib.Path, sys_path: list[pathlib.Path]):
        self._current_folder = current_folder
        self._sys_path = sys_path

    def append_sys_path(self, path: pathlib.Path):
        self._sys_path.append(path)

    def resolve_module_source_file(self, module_to_import: str) -> 'OurModule':
        # a.b.c maps to a/b/c.py (plain module) or a/b/c/__init__.py (package)
        x = module_to_import.replace('.', '/')
        candidates = [f'{x}.py', f'{x}/__init__.py']
        for folder in self._sys_path:
            for candidate in candidates:
                candidate_file = folder / candidate
                if not candidate_file.exists():
                    continue
                return OurModule(
                    name=module_to_import,
                    path=candidate_file.relative_to(folder),
                    site=folder,
                )
        raise ImportError(f'No module named {module_to_import}')


@dataclass
class OurModule:
    name: str
    path: pathlib.Path
    site: pathlib.Path
    
    @property
    def absolute(self) -> pathlib.Path:
        return self.site / self.path

    def load(self):
        with open(self.absolute) as f:
            return f.read()
    
    

def do_stuff():
    current_notebook_name = '/Workspace/Users/[email protected]/Untitled.py'

    current_notebook_path = pathlib.Path(current_notebook_name)

    file_loader = FileLoader(current_notebook_path.parent, SYS_PATH)
    
    # sys.path.append('..')
    file_loader.append_sys_path(pathlib.Path('/Users/alex/Downloads'))

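    # parse_imports is not defined in this sketch: it stands for whatever yields imported module names
    for parsed_import in parse_imports(current_notebook_path.read_text()):
        module = file_loader.resolve_module_source_file(parsed_import)

        # imports made by the resolved module are relative to that module's own folder
        file_loader_of_module = FileLoader(module.absolute.parent, SYS_PATH)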



from . import b # imports site-packages/y/b.py
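The relative import on the last line is the case that sys.path lookup alone does not cover: it has to be resolved against the folder of the importing file. A rough sketch, assuming a hypothetical helper `resolve_relative_import` (not part of the proposal above) that mirrors the `.py` / `__init__.py` candidate logic of FileLoader:

```python
import pathlib


def resolve_relative_import(current_folder: pathlib.Path, level: int, name: str) -> pathlib.Path:
    """Resolve `from <level dots> import name` against the importing file's folder."""
    if level < 1:
        raise ValueError('level must be at least 1 for a relative import')
    # One dot means the importing file's own folder; each extra dot goes one folder up.
    base = current_folder
    for _ in range(level - 1):
        base = base.parent
    parts = name.split('.')
    candidates = [
        base.joinpath(*parts[:-1], f'{parts[-1]}.py'),  # plain module
        base.joinpath(*parts, '__init__.py'),  # package
    ]
    for candidate in candidates:
        if candidate.exists():
            return candidate
    raise ImportError(f'No module named {name} relative to {base}')
```

For a file located in site-packages/y, `resolve_relative_import(folder_of_that_file, 1, 'b')` would look for site-packages/y/b.py first, matching the comment on the import above.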

Proposed Solution

Refactor DependencyLoader such that it can operate on local files, in the absence of a workspace.

Additional Context

Sub-ticket of #1202

Labels: migrate/code, migrate/python
