feat: support for external executor plugins#2305
feat: support for external executor plugins#2305johanneskoester merged 52 commits intosnakemake:mainfrom
Conversation
johanneskoester
left a comment
There was a problem hiding this comment.
@vsoch this is so great, thanks a lot! And a very nice and simple approach.
Topic 1:
Regarding the args, I have the following thought:
Since Snakemake is not only a CLI app but also is from time to time used as a python library (people calling the snakemake function directly), I think it is probably better to not configure executor plugins via args. Instead I think it makes sense to use a dataclass for the executor plugin settings. Each plugin could then provide its own dataclass, by convention named e.g. ExecutorParameters. An object of this dataclass can then be expected as argument in the snakemake function. Further, the dataclass itself can be used to extend the argument parser of Snakemake, e.g. using this package which allows to annotate data classes with help texts for the argument parser.
Maybe there is also a better way, but this would be my first idea.
Topic 2:
In addition, I think it makes sense to decide about the namespacing of executor arguments while designing this. I think there are two types of arguments relevant for an executor.
- Those that are executor specific (like
--k8s-cpu-scalar). - Those that are relevant for multiple executors (like --preemptible).
For the executor specific ones, we should define a prefix that comes from the plugin name (e.g. just the name of the executor). For the others, the name is kind of free to choose. In order to let the executor access all those general args, it makes probably sense to put everything that is currently passed as individual parameters to the snakemake function into a dataclass as well. Then, the executor plugin can just get two dataclass objects (the global args and the executor specific args).
Topic 3:
I think the plugin implementation should require a plugin to provide the minimum compatible Snakemake version. A natural value for the beginning would be the first release with plugin functionality. But later, some plugins might require more recent versions of Snakemake and the user should get a proper error message instead of just some cryptic exceptions.
Topic 4:
And finally: yes, every executor should eventually be migrated into a plugin.
f75dced to
fb52306
Compare
|
@johanneskoester an update! I completely agree with your sentiment - I was not happy with the args design either, and I think using a dataclass is a great strategy. Specifically what I decided to try for this first go is:
Note that the above approach doesn't require an extra dependency in snakemake, but just in the plugin, which I like. I also provide the "parse your args back into the dataclass" as a
Oh that's a fantastic idea - I could follow up with this PR, unless you think it's absolutely essential here (but would be a lot of moving pieces for one PR). And I'm not sure where those other commits came from - I hope I didn't bork the git history. I can try a rebase, although now I'm nervous to screw the entire thing up. Perhaps when it is definitely done I will just redo the whole thing over freshly.
I didn't add this yet, but I will after posting this!
Agree! This init file and passing forward arguments has seriously been giving me nightmares for years... I'm glad we have a plan to clean things up a bit! For details on the current state of the plugin setup, we have an early writeup here: https://github.com/snakemake/snakemake-executor-flux#how-does-it-work. I'll update it soon for the snakemake version (and will validate it's defined). |
|
I think the other commits are just coming from merging in upstream. That should be fine. In order to get the plugin interface right, it seems to me that the dataclass approach for the other arguments should be done first. But indeed that can happen in a separate PR. It would be awesome if you really would like to do that! |
Most of the functionality within those functions can be provided by snakemake, but I expose them on the level of the plugin to give the plugins control to dictate logic. But actually, we could remove them from validation and then do a check (and call the default provided in snakemake) if they aren't present! What did you have in mind for porting current executors to plugins? Having the same structure but adding them to snakemake dependencies so they are always installed? |
4894fec to
df5f7a4
Compare
|
Just rebased - I'm not sure why the local test is reporting there is no module packaging, because it's definitely in test-environment.yaml! |
|
@johanneskoester it is reporting that packaging isn't installed, however I've added it to the snakemake install requires, the test step, and the container. I'm not familiar with this setup - can you take a look and advise where it needs to be added? And I'd comment if this is going to be a pain point, we should fall back to being more lenient to explicit version checking. Maybe there is a simpler way, to try importing what we need for the specific plugin and then reporting something is missing/wrong. |
Yes, exactly that. |
johanneskoester
left a comment
There was a problem hiding this comment.
Since Snakemake pickles parameters into scripts executed within singularity, any imports of external dependencies have to be done at the function/method level, not the module level. Otherwise, pickling will require them to be present in the container.
snakemake/plugins.py
Outdated
| import importlib | ||
| import os | ||
| import pkgutil | ||
| import packaging.version |
There was a problem hiding this comment.
| import packaging.version |
Actually, I would prefer if plugins would not access the argparser themselves at all. Otherwise, it will become very hard to change it in any way. Further, since Snakemake can be used as an API, there will be cases where one wants to use a plugin without even having an argparser (you just directly call the |
|
@johanneskoester how would we get custom arguments then? |
|
The executor provides information about its required input via the dataclass that it defines. Like this: For the two use cases:
I was wondering whether we should have a way to specify and access non executor specific arguments, but I think this is not necessary. At least not beyond those parameters that are already passed to the current exectuors by the scheduler. This also means that we don't have to modify the entire Snakemake towards dataclass usage for now. We can simply start with the executor plugins. |
So you are OK with the plugin dynamically adding arguments (meaning snakemake adding them based on the dataclass, the opposite direction that we have now), you just don't want to explicitly hand over args? What about cases where we need further customization of the args for the plugin to work? That could work, although if more than one plugin is registered, we can't enforce any of the arguments to be required, and if we can't pass the args forward to validate, it will be more of a "try to run and hope it works." Let me know if that is what you have in mind and I can make changes. |
|
Taking a look again - the case for just having the arguments from the dataclass added, as is, is already easy - we can just tweak this function: To add the plugin prefix to the ExecutorParameters before we provide to that function. But what about needing to tweak logic within snakemake apriori? We have a function for that too: But I'm thinking about cases in the current init where we have big if statements like "if this executor, change the filesystem setting to X." I'm guessing your thinking is that we will eventually have some kind of shared executor parameters that somehow handle that? Should we not consider that for now, and just move these existing functions to (instead of being called by the plugin) to be called by snakemake directly with the context of the plugin? |
I would handle that by requiring that the plugin has a dict called This queries should happen inside of the |
a548130 to
f28f050
Compare
|
@johanneskoester ignoring the black failure for now (I somehow have the same version 23.3.0 installed but it is saying there are no changes so we can look at that detail later) I've refactored the plugin structure so that the plugin itself does no additional parsing of either the parser, or the args. The plugin now can define whatever fields they like, eg.., And then snakemake iterates through each plugin's fields, and adds them with the plugin prefix. E.g., the above becomes: And then to go the other way around, we take that dataclass, remove the prefix, and parse back into the original. An edge case I'm thinking abiout More detail updated here: https://github.com/snakemake/snakemake-executor-flux. I think this is probably closer to what you had in mind - let me know what we should work on next (and what might be scoped to this PR vs done in a follow up). |
There was a problem hiding this comment.
I have thought further about this (all morning basically).
Instead of the minimum version requirerement, we can do something else: The execuctor plugin and Snakemake itself can depend on a third package, called snakemake-executor-plugin-interface. This package implements the logic for interacting with the plugin (so mainly what you now have in snakemake/plugins.py). This way, we achieve the following:
The versioning of Snakemake is decoupled from the plugin interface. This allows the plugin to e.g. depend on the major version of the interface package, and thereby easily avoid future breakages. Further, we don't need custom attributes and functions for checking compatibility. The package dependency management solves the problem for us. Also, a user choosing a specific plugin that has not been updated for a while will automatically get the latest compatible Snakemake version suggested by the package manager.
I have prepared a skeleton for this here: https://github.com/snakemake/snakemake-executor-plugin-interface
While doing that, I have also redesigned the interface another time. It is now more object oriented, which has the advantage to assert compliance with the interface via the types.
The basic implementation is already in place, and I was wondering if you would like to migrate your plugin code from here into the missing pieces of the new interface via a PR on that repository. Mainly, we now need
- The code that translates the dataclass into CLI args here
- The code that translates args into the dataclass here
- The code that collects all plugins here
I have added an example how the executor plugin would implement this to the README.
What do you think?
snakemake/plugins.py
Outdated
|
|
||
| # Required executor plugin attributes | ||
| executor_plugin_attributes = [ | ||
| "local_executor", |
There was a problem hiding this comment.
I don't think this has to be decided in the executor plugin. The local executor should always be the CPUExecutor shouldn't it?
There was a problem hiding this comment.
I noticed there is logic in scheduler.py to have a different local executor, but I think we could fairly safely assume the CPU executor for now, and think about how to handle it if that isn't the case.
If that's the direction you like, it works for me! One suggestion - should we have a different prefix name for this library so it doesn't detect itself as a plugin? And also - if you have vision for other components of snakemake to be plugins, would you want to manage another interface (or go off of this one?) |
This is a demo of how snakemake could allow external executor plugins. The basic design uses pkgutil to discover snakemake_executor_* plugins, and then provides them to the client (to add arguments) and scheduler to select using one with --executor. Signed-off-by: vsoch <[email protected]>
Signed-off-by: vsoch <[email protected]>
Signed-off-by: vsoch <[email protected]>
Signed-off-by: vsoch <[email protected]>
Signed-off-by: vsoch <[email protected]>
Signed-off-by: vsoch <[email protected]>
Signed-off-by: vsoch <[email protected]>
|
Kudos, SonarCloud Quality Gate passed!
|
|
Woohoo!! What would you like to do next @johanneskoester ? I can start refactoring some the current executors into plugins, or since we already have one for flux, remove it here. Let me know your preferences! |
🤖 I have created a release *beep* *boop* --- ## [8.0.0](v7.32.2...v8.0.0) (2023-12-20) ### ⚠ BREAKING CHANGES Snakemake 8 marks the beginning of decomposing Snakemake into a framework of plugins. This enables the democratization of method development within the Snakemake ecosystem. We start with plugins for storage and execution backends. In the future, there will be plugins for the scheduling, metadata, software deployment, reporting, and many more. This way, it will be possible to easily launch and explore new developments in workflow management and reproducible data analysis without the need to get your work merged into the main codebase of Snakemake and also without the need to develop a new workflow management system as a proof of concept. In detail, Snakemake 8 introduces the following changes. Unfortunately it was unavoidable to break some usages (we apologize). Nevertheless, we tried to ensure that every removed or modified feature has been replaced with an equivalent reimplementation, as outlined in our [migration docs](https://snakemake.readthedocs.io/en/latest/getting_started/migration.html#migrating-to-snakemake-8). While Snakemake 8 has an even more thorough testing framework than any release before, and while it has been quite heavily tested in practice by us, you might initially experience bugs and glitches for which we want to apologize beforehand. We think that the massive codebase improvements are worth it in the long run, and hope that everything goes well. As always, any pull requests with test cases and pointers to bugs are more than welcome. #### Detailed breaking changes * removed the long time ago deprecated support for dynamic, version, and subworkflow (see [the migration docs](https://snakemake.readthedocs.io/en/latest/getting_started/migration.html#migrating-to-snakemake-8)) * migrated old remote providers into storage plugins (see [the migration docs](https://snakemake.readthedocs.io/en/latest/getting_started/migration.html#migrating-to-snakemake-8)) * migrated execution backends into plugins, including a change in the respective command line interfaces (see [the migration docs](https://snakemake.readthedocs.io/en/latest/getting_started/migration.html#migrating-to-snakemake-8)) * deprecates `--use-conda` and `--use-singularity` in favor of `--software-deployment-method conda` or `--software-deployment-method apptainer` and `--software-deployment-method conda apptainer` (see [the migration docs](https://snakemake.readthedocs.io/en/latest/getting_started/migration.html#migrating-to-snakemake-8)) * profile support is now versioned, such that different profiles can be written for different minimum Snakemake versions (see [the migration docs](https://snakemake.readthedocs.io/en/latest/getting_started/migration.html#migrating-to-snakemake-8)) * redesigned Snakemake API. It now uses a modern, dataclass based approach (see [the migration docs](https://snakemake.readthedocs.io/en/latest/getting_started/migration.html#migrating-to-snakemake-8)) ### Features * add ability to inject conda environments into running Snakefile ([#2479](#2479)) ([6140e29](6140e29)) * add functionality for deploying sources if no shared FS is assumed ([#2486](#2486)) ([76eac3c](76eac3c)) * add option to control software deployment mode (shared or non shared FS) ([#2525](#2525)) ([04ec2c0](04ec2c0)) * allow detailed configuration of shared FS usage ([#2528](#2528)) ([0d34be9](0d34be9)) * allow environment variables in string values of profile (e.g. paths may now contain elements like $USER). ([58dc70c](58dc70c)) * allow python expressions in --set-resources ([#2521](#2521)) ([022a31e](022a31e)) * allow to set latency_wait in executor test suite ([c0bca0b](c0bca0b)) * automatically upload workflow sources to default storage provider if no shared FS is used ([a450c49](a450c49)) * Faster ci test setup ([#2489](#2489)) ([4798e8a](4798e8a)) * implement precommand ([#2482](#2482)) ([ff0f979](ff0f979)) * redesigned Snakemake API. It now uses a modern, dataclass based approach ([#2403](#2403)) ([2be3bfa](2be3bfa)) * support for external executor plugins ([#2305](#2305)) ([c9eaa4e](c9eaa4e)) * version specific profile config files (profile/config.v8+.yaml with profile/config.yaml as fallback that matches any version) ([#2498](#2498)) ([47e5811](47e5811)) ### Bug Fixes * adapt to changes in snakemake-interface-executor-plugins ([635c68a](635c68a)) * add storage provider args to deploy sources command ([67178e3](67178e3)) * add testcase for script directive to work with Python 3.7 and corresponding fix. ([0b4ae2e](0b4ae2e)) * allow pepfile and pepschema to take pathlib ([#2546](#2546)) ([ca91661](ca91661)) * also inherit rule proxies if there is no rulename modifier specified in a use rule statement ([#2440](#2440)) ([1570289](1570289)) * assume at most 8GB memory for default resources. This way, we avoid exploding memory requirements for large input files that are very unlikely to be put entirely into memory by any tool. ([11c2ecc](11c2ecc)) * comparison to float in scheduler ([ef44d84](ef44d84)) * detect job paths that leave and then enter a group. Such paths are invalid because then the group depends on itself. ([#2527](#2527)) ([5383a4d](5383a4d)) * ensure that auto deployment of default storage provider works in containers with read-only root home. ([1a347ff](1a347ff)) * ensure that log and benchmark files are uploaded to storage as well ([#2545](#2545)) ([6aabb5d](6aabb5d)) * ensure that targetjob is always forced. This fixes a bug causing run-directive rules to not being executed even when enforced via e.g. -R. ([#2448](#2448)) ([b2a60d5](b2a60d5)) * fix cache handling and unlock handling ([2f4d5e1](2f4d5e1)) * fix nargs definition for --deploy-sources ([fc252c8](fc252c8)) * fix path handling when detective profiles ([fe63881](fe63881)) * fix storage handling on windows by converting all paths to posix paths ([#2519](#2519)) ([7864a76](7864a76)) * handle different f-string tokens in py3.12 ([#2485](#2485)) ([f2c7613](f2c7613)) * handle storage for local jobs; add test case ([6d978ef](6d978ef)) * handling of group jobs when obtaining temp input files ([71be1de](71be1de)) * import ([#2402](#2402)) ([2c831f1](2c831f1)) * improved error handling for storage upload; fixed bugs caused by outdated calls to IOFile.exists(). ([720bb84](720bb84)) * improved error messages in case of invalid storage queries ([9671fd0](9671fd0)) * in addition to localrules statement, infer that job is local if it has any input or output file that is marked as local ([#2541](#2541)) ([e8b682b](e8b682b)) * only deactivate conda inject envs upon workflow tear down ([#2503](#2503)) ([e6dfdd4](e6dfdd4)) * Panoptes --wms-monitor-arg ([#2444](#2444)) ([98d2bdf](98d2bdf)) * proper reuse of rule proxies when importing several times from the same module ([#2404](#2404)) ([e867dda](e867dda)) * Restore backward compatibility for Google Life Sciences executor ([#2461](#2461)) ([5e3a464](5e3a464)) * shadow "full" mode ignore symlinks ([#2516](#2516)) ([1d58120](1d58120)) * show failed logs in executor testcases ([92f7bf4](92f7bf4)) * Slack log service ([#2537](#2537)) ([26eb4ba](26eb4ba)) * sort report (sub-)categories in lexicographical order ([#2449](#2449)) ([d0705ad](d0705ad)) * update minimum snakemake-interface-storage-plugins version ([0ef7226](0ef7226)) * use temporary directory (faster, more likely local, always writable) for persistence and source cache in case of remote execution without shared fs ([#2502](#2502)) ([c8fa7ba](c8fa7ba)) * wait for logs before showing them on error ([a4ff328](a4ff328)) ### Documentation * document name directive with example ([#2534](#2534)) ([cce5551](cce5551)) * fix syntax in cluster example ([#2460](#2460)) ([64e9645](64e9645)) * notes on arm based machines in tutorial docs ([0586f04](0586f04)) * **rust:** Fix typo on rust-script version ([#2488](#2488)) ([a79dd94](a79dd94)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Johannes Köster <[email protected]>








Hi @johanneskoester! 👋 As we chat about in a thread somewhere, I think it would be really powerful to allow for installing (and discovering) external plugins to Snakemake. Specifically for the Flux Operator, I have easily three designs I'm testing, and it's not really appropriate to add them proper to snakemake - but I believe the developer user should be empowered to flexibly add/remove and test them out.
This pull request is first try demo of how snakemake could allow external executor plugins. I say "first try" because it's the first time I've experimented with plugins, and I tried to choose a design that optimizes simplicity and flexibility without requiring external packages, or specific features of setuptools or similar (that are likely to change). The basic design here uses pkgutil to discover snakemake_executor_* plugins, and then provides them to the client (to add arguments) and scheduler to select using one with
--executor.I've written up an entire tutorial and the basic design in this early prototype, which is basically the current Flux integration as a plugin! https://github.com/snakemake/snakemake-executor-flux. The user would basically do:
I've designed it so that plugins are validated only when chosen, and each plugin can add or otherwise customize the parser, and then (after parsing) further tweak the args if chosen. Then in scheduler.py, we simply look if the user selected a plugin, and call the main executor (and local_executor) classes if this is the case.
The one hard piece is having a flexible way to pass forward all those custom arguments. The current snakemake design has basically a custom boolean for every executor hard coded (e.g.,
--fluxor--slurm) and while we don't want to blow that up, I'm worried moving forward passing all these custom namespaced arguments through the init, workflow, scheduler/dag, is going to get very messy. So the approach here is a suggested way to handle the expanding space of additional executors by way of passing forward full args, and then allowing the plugins to customize the parser before or after. If we were to, for example, turn current executors into plugins (something I expect we might want to do for the Google Life Sciences API that is going to be deprecated in favor of batch) we could write out a more hardened spec - some configuration class that is passed from the argument parser through the executor and through execution (instead of all the one off arguments).Anyway - this is just a first shot and I'm hoping to start some discussion! This is a totally separate thing from TBA work with Google Batch - this is something that I've wanted to try for a while as I've wanted to add more executors and have seen the executor space exploding. 😆 I haven't written tests or updated any docs yet pending our discussion!
QC
docs/) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).