
Monorepo support using groups and path dependencies #6850

@adriangb

Following up on https://discord.com/channels/487711540787675139/1032534656777715732/1032549486230253580

Summary: I think we can make some relatively simple and universally beneficial (as in not only for monorepos) changes to the existing path dependencies that will make them much easier to use for monorepos. There are also some more complex feature requests to handle, but I think those can be tabled for future feature work.

I had to migrate a Python monorepo at work from an ad-hoc structure to something with a more explicit separation of applications and components. I explored several options, including monorepo build tools (Bazel, Pants, etc.) and Python package managers (Hatch, PDM and Poetry). I won't get into the details, but the TLDR is that the build tools are too complex for a simple Python-only monorepo and the other package managers are missing features. There are several ways to structure monorepos with Poetry, but after surveying a lot of blog posts, issues and plugins I came across this one: https://github.com/martinxsliu/poetry_workspace_plugin (@martinxsliu if you're out there, thank you and feel free to chime in). While I don't necessarily plan on using that plugin (I don't need most of the functionality it provides), it had some interesting ideas. The main thing I changed was using Poetry's existing group feature to organize the path dependencies on projects, so that even without a plugin you get basic functionality.

The good

So the final pattern I landed on is this: https://github.com/adriangb/python-monorepo/tree/main/poetry

TLDR is:

# /pyproject.toml
[tool.poetry.group.app.dependencies]
namespace-app = { path = "workspaces/app", develop = true }

[tool.poetry.group.lib.dependencies]
namespace-lib = { path = "workspaces/lib", develop = true }

[tool.poetry.group.lib-test.dependencies]
namespace-lib = { path = "workspaces/lib", develop = true, extras = ["test"] }

# /workspaces/app/pyproject.toml
[tool.poetry.dependencies]
namespace-lib = { path = "../lib" }

So you can then do:

poetry install --only app

Which will install app, lib and any 3rd party dependencies!

The bad

poetry lock --no-update doesn't work

This is alluded to in the plugin linked to above: https://github.com/martinxsliu/poetry_workspace_plugin/blob/master/poetry_workspace/plugin.py#L83-L107

Even worse: if you delete a path dependency (delete the folder and remove it from pyproject.toml), poetry lock --no-update straight up crashes. DirectoryDependency is used to load the lock file (which still has a reference to the deleted dependency) and, unlike VCS and other dependencies, it validates that the path exists in __init__. This means your options are to (a) run poetry lock and update all 3rd party dependencies, (b) delete it from pyproject.toml first, run poetry lock --no-update, delete the folder and run poetry lock --no-update again, or (c) edit poetry.lock by hand before running poetry lock --no-update. Especially with --no-update becoming the default at some point, this is problematic.

I've submitted a couple of PRs to fix this:

No caching in Docker builds because source is required to install 3rd party dependencies

When you're building a Docker image that uses Poetry, it's best to use --no-root to install 3rd party dependencies first (which only requires pyproject.toml and poetry.lock) so that the layer which installs 3rd party dependencies gets cached across builds (it's often both the slowest and the least frequently changed step). This breaks down if you have any path dependencies, because you'll need to copy them over in order to install 3rd party dependencies (even using groups, there's no way to skip the path dependencies but install their transitive 3rd party dependencies).

I opened #6845 as a way to address this. I'm not sure if --no-path is the best way; I'm open to other options, but the basic idea is there.

No way to build sdists/wheels for projects with path dependencies

In my example repo you can't run poetry build on cli because it depends on lib. One might want to do this if, e.g., you have a backend and a client that you publish on PyPI and that share a common component/library. There are several open issues around this, which I won't go into in detail. My thought is that it would be nice to have a feature like namespace-lib = { path = "../lib", wheel = { version = "~={}.{}.{}"} }, where the {} in version means "insert the current version", or namespace-lib = { path = "../lib", wheel = { target = "path/inside/wheel" } }. I think this is quite a complex topic that needs more thought. Even Cargo hasn't fully figured this out, since they don't support inheriting the version from workspace dependencies (their overrides feature may be of interest but is really beyond the scope of this issue). I think this can be solved, but I would table it for now since it needs more thought and will likely be more invasive to the codebase.
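To illustrate one possible reading of the {} placeholders, here is a small sketch. Nothing like this exists in Poetry today; expand_version_template is a hypothetical name, and it assumes each {} is filled, in order, with the next dot-separated component of the path dependency's current version.

```python
def expand_version_template(template: str, version: str) -> str:
    """Fill each {} in template with successive components of version."""
    parts = version.split(".")
    slots = template.count("{}")
    if slots > len(parts):
        raise ValueError(
            f"template {template!r} needs {slots} components, "
            f"but version {version!r} has {len(parts)}"
        )
    return template.format(*parts[:slots])
```

Under that reading, a lib at version 1.4.2 with the template "~={}.{}.{}" would be written into the built wheel's metadata as the constraint ~=1.4.2, and "~={}.{}" would give the looser ~=1.4.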

#1168 also is related to this particular use case and might be an alternative path forward.

No support for dependency groups in path dependencies

It would be interesting to have some special handling for Poetry path dependencies so that we can "propagate" groups from path dependencies to workspace/root projects (for example, choosing to install the test group of a path dependency). I think this can be done, but I don't think it should be a blocker for other work, so I'm also going to table this for now.

The ugly

Boilerplate

There's a lot of boilerplate involved in the whole groups and path deps thing, especially if you get into listing extras. I think this is where a plugin might shine: it could completely replace these sections with a [tool.workspaces] section, or just serve to keep them in sync with the filesystem (to avoid manually writing the boilerplate).

Labels: kind/feature (Feature requests/implementations), status/triage (This issue needs to be triaged)
