[FEATURE]: Support local dependencies #1360
Description
Is there an existing issue for this?
- I have searched the existing issues
Problem statement
Our implementation of DependencyLoader is designed to work with workspaces, but there would be no workspace when a file is executed from a developer's local file system. We need to support that.
After a more thorough analysis of this PR, I see that there have to be two more loader interfaces:
We need a NotebookLoader interface with WorkspaceNotebookLoader and LocalNotebookLoader implementations.
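To make the split concrete, here is a minimal sketch of what such an interface could look like. The method name `load_source` and the injected `export` callable are assumptions for illustration only; the issue does not define them, and the workspace variant deliberately avoids naming any real SDK call.

```python
import abc
import pathlib
from typing import Callable


class NotebookLoader(abc.ABC):
    """Hypothetical interface: return the source code of a notebook at a path."""

    @abc.abstractmethod
    def load_source(self, path: pathlib.PurePath) -> str:
        ...


class LocalNotebookLoader(NotebookLoader):
    """Reads notebook source from the developer's local file system."""

    def load_source(self, path: pathlib.PurePath) -> str:
        return pathlib.Path(path).read_text()


class WorkspaceNotebookLoader(NotebookLoader):
    """Reads notebook source through an injected callable (an assumption:
    something that maps a workspace path to the exported source text)."""

    def __init__(self, export: Callable[[str], str]):
        self._export = export

    def load_source(self, path: pathlib.PurePath) -> str:
        return self._export(path.as_posix())
```

Keeping the workspace access behind a callable means the local implementation can be exercised without any workspace at all.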
We also need a FileLoader that is sys.path aware. DependencyLoader has to take both loaders as arguments, e.g. load_dependency(self, dependency: Dependency, file_loader: FileLoader, notebook_loader: NotebookLoader), if we inject them via the method. Remember: a workspace file is still a file when we're executing code from a Databricks Workflow (or notebook).
Remember: we have to support running this for both local FS (when running databricks labs ucx migrate-local-files) and remote notebooks (e.g. when running databricks labs ucx migrate-job-code --job-id ...).
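A rough sketch of the method-injection idea follows, so that the same DependencyLoader code can serve both the local and the remote case by swapping loaders. The fields of `Dependency`, the `load_source` method, and the `InMemoryLoader` stand-in are all hypothetical; only the `load_dependency` parameter list comes from the text above.

```python
from dataclasses import dataclass
from typing import Protocol


class FileLoader(Protocol):
    def load_source(self, path: str) -> str: ...


class NotebookLoader(Protocol):
    def load_source(self, path: str) -> str: ...


@dataclass(frozen=True)
class Dependency:
    # Hypothetical shape: the issue does not say what a Dependency carries.
    path: str
    is_notebook: bool


class DependencyLoader:
    def load_dependency(
        self,
        dependency: Dependency,
        file_loader: FileLoader,
        notebook_loader: NotebookLoader,
    ) -> str:
        # Pick the loader by dependency kind; DependencyLoader itself never
        # needs to know whether a workspace exists.
        loader = notebook_loader if dependency.is_notebook else file_loader
        return loader.load_source(dependency.path)


class InMemoryLoader:
    """Stand-in loader for illustration; satisfies both protocols."""

    def __init__(self, sources: dict[str, str]):
        self._sources = sources

    def load_source(self, path: str) -> str:
        return self._sources[path]


files = InMemoryLoader({"utils.py": "def helper(): ..."})
notebooks = InMemoryLoader({"/Workspace/nb": "# Databricks notebook source"})
loaded_file = DependencyLoader().load_dependency(Dependency("utils.py", False), files, notebooks)
loaded_notebook = DependencyLoader().load_dependency(Dependency("/Workspace/nb", True), files, notebooks)
```

Constructor injection would work just as well; method injection is shown only because that is the variant mentioned above.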
import importlib
import pathlib
import sys
from dataclasses import dataclass

import b  # imports site-packages/b.py
__import__('b')  # imports site-packages/b.py
importlib.import_module('b')  # imports site-packages/b.py

SYS_PATH: list[pathlib.Path] = [pathlib.Path(x) for x in sys.path]
# def __import__(module_to_import: str):
#     x = module_to_import.replace('.', '/')
#     candidates = [f'{x}.py', f'{x}/__init__.py']
#     for folder in SYS_PATH:
#         for candidate in candidates:
#             if not (folder / candidate).exists():
#                 continue
#             with open(folder / candidate) as f:
#                 return f.read()
#     raise ImportError(f'No module named {module_to_import}')
version_data = {}
version_file = pathlib.Path(__file__).parent / 'databricks/sdk/version.py'
with version_file.open('r') as f:
    exec(f.read(), version_data)

def import_module_from_absolute_path(x: pathlib.Path):
    # exec() always returns None, so run the source in a fresh namespace and return that
    namespace = {}
    with x.open() as f:
        exec(f.read(), namespace)
    return namespace
class FileLoader:
    def __init__(self, current_folder: pathlib.Path, sys_path: list[pathlib.Path]):
        self._current_folder = current_folder
        self._sys_path = sys_path

    def append_sys_path(self, path: pathlib.Path):
        self._sys_path.append(path)

    # OurModule is defined further down, so the annotation is quoted
    def resolve_module_source_file(self, module_to_import: str) -> 'OurModule':
        x = module_to_import.replace('.', '/')
        candidates = [f'{x}.py', f'{x}/__init__.py']
        for folder in self._sys_path:
            for candidate in candidates:
                candidate_file = folder / candidate
                if not candidate_file.exists():
                    continue
                return OurModule(
                    name=module_to_import,
                    path=candidate_file.relative_to(folder),
                    site=folder,
                )
        raise ImportError(f'No module named {module_to_import}')
@dataclass
class OurModule:
    name: str
    path: pathlib.Path
    site: pathlib.Path

    @property
    def absolute(self) -> pathlib.Path:
        return self.site / self.path

    def load(self):
        with open(self.absolute) as f:
            return f.read()
def do_stuff():
    current_notebook_name = '/Workspace/Users/[email protected]/Untitled.py'
    current_notebook_path = pathlib.Path(current_notebook_name)
    file_loader = FileLoader(current_notebook_path.parent, SYS_PATH)
    # sys.path.append('..')
    file_loader.append_sys_path(pathlib.Path('/Users/alex/Downloads'))
    # parse_imports is not defined in this snippet; it stands for whatever yields the imported module names
    for parsed_import in parse_imports(current_notebook_path.read_text()):
        module = file_loader.resolve_module_source_file(parsed_import)
        file_loader_of_module = FileLoader(module.absolute.parent, SYS_PATH)

from . import b  # imports site-packages/y/b.py
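As a quick sanity check of the lookup rule used above (try `<module>.py`, then `<module>/__init__.py`, in each sys.path folder), here is a small self-contained sketch that runs it against a temporary directory. The standalone function mirrors `resolve_module_source_file` but returns a plain path; the folder layout is made up for the example. Python's real import machinery also handles cases this rule ignores, such as namespace packages and extension modules.

```python
import pathlib
import tempfile


def resolve_module_source_file(module_to_import: str, sys_path: list[pathlib.Path]) -> pathlib.Path:
    """Return the first matching source file for a dotted module name."""
    relative = module_to_import.replace('.', '/')
    candidates = [f'{relative}.py', f'{relative}/__init__.py']
    for folder in sys_path:
        for candidate in candidates:
            candidate_file = folder / candidate
            if candidate_file.exists():
                return candidate_file
    raise ImportError(f'No module named {module_to_import}')


with tempfile.TemporaryDirectory() as tmp:
    site = pathlib.Path(tmp)
    # Made-up layout: a top-level module b and a package y containing its own b
    (site / 'b.py').write_text('VALUE = 1\n')
    (site / 'y').mkdir()
    (site / 'y' / '__init__.py').write_text('')
    (site / 'y' / 'b.py').write_text('VALUE = 2\n')

    resolved = {
        name: resolve_module_source_file(name, [site]).relative_to(site).as_posix()
        for name in ('b', 'y', 'y.b')
    }
    try:
        resolve_module_source_file('missing', [site])
        missing_raises = False
    except ImportError:
        missing_raises = True
```

Here `resolved` maps `b` to `b.py`, `y` to `y/__init__.py`, and `y.b` to `y/b.py`, which is the distinction the `import b` and `from . import b` comments point at.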
Proposed Solution
Refactor DependencyLoader such that it can operate on local files, in the absence of a workspace.
Additional Context
Sub-ticket of #1202
