Commit e10feef

feat: subsample jobs to speed-up scheduler (#3112)
I am running a workflow with ~700k jobs and, at any given time, around 230k jobs are ready to run. The initial building of the DAG is quite slow (~2 h, but I'll leave that for another PR 😄), but the main issue is that the scheduler takes a long time deciding which jobs to submit next. In my case, all jobs are fast and similar in terms of resources, so the cluster is idle most of the time.

The greedy scheduler is considerably faster, but still too slow. The ILP scheduler should fall back to the greedy one after 10 s, but it sometimes ignores the timeout (coin-or/Cbc#487), and it has been reported to be quite slow instantiating large problems (coin-or/pulp#749). In my case, the ILP runs for 60 s (the pulp file is 100 MB) before switching to greedy. Apart from that, and especially on slow file systems, the scheduler can still be quite slow checking all temp and input files.

Here, I propose subsampling the ready jobs, so that only a subset (instead of all ready jobs) is evaluated by the scheduler. In my tests, this greatly reduces the scheduler time:

|                    | ILP          | greedy       |
|--------------------|--------------|--------------|
| Native             | 15 - 20 mins | 30 s - 1 min |
| Sampling 1000 jobs |              | 1 - 2 s      |

### QC
<!-- Make sure that you can tick the boxes below. -->

* [x] The PR contains a test case for the changes or the changes are already covered by an existing test case.
* [ ] The documentation (`docs/`) is updated to reflect the changes, or this is not necessary (e.g. if the change modifies neither the language nor the behavior or functionalities of Snakemake).

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Summary by CodeRabbit

- **New Features**
  - Introduced a new argument `--scheduler-subsample` to optimize job scheduling by limiting the number of jobs considered for execution.
  - Added a method for inferring resource requirements, enhancing user experience with better error handling.
  - Updated settings to include a new attribute for job subsampling, improving scheduling flexibility.
- **Bug Fixes**
  - Improved error handling and logging for resource evaluation and parsing, providing clearer guidance for users.
  - Enhanced job selection process with a subsampling mechanism to optimize scheduling efficiency.
- **Refactor**
  - Enhanced structure and organization of job scheduling logic for better integration with existing functionality.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
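The core idea above, evaluating only a random subset of the ready jobs, can be sketched as a small standalone function (a minimal sketch; `subsample_ready_jobs` and its parameters are hypothetical names for illustration, not Snakemake's API):

```python
import random
from typing import Optional


def subsample_ready_jobs(ready_jobs: set, subsample: Optional[int]) -> set:
    """Return at most `subsample` randomly chosen jobs from the ready set.

    If subsampling is disabled (None) or no more jobs are ready than the
    limit, the full set is returned unchanged.
    """
    if subsample is None or len(ready_jobs) <= subsample:
        return ready_jobs
    # random.sample needs a sequence, so materialize the set as a tuple first
    return set(random.sample(tuple(ready_jobs), k=subsample))


# Roughly the scale described above: ~230k ready jobs, sampled down to 1000
jobs = {f"job_{i}" for i in range(230_000)}
picked = subsample_ready_jobs(jobs, 1000)
print(len(picked))  # 1000
```

Because only the sampled subset is handed to the ILP/greedy selector, the solver's problem size is bounded by the subsample regardless of how many jobs are ready.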
1 parent 9504bf4 commit e10feef

4 files changed, +33 −6 lines changed

snakemake/cli.py

Lines changed: 11 additions & 0 deletions
```diff
@@ -1414,6 +1414,16 @@ def get_argument_parser(profiles=None):
         "value (1.0) provides the best speed and still acceptable scheduling "
         "quality.",
     )
+    group_behavior.add_argument(
+        "--scheduler-subsample",
+        type=int,
+        default=None,
+        help="Set the number of jobs to be considered for scheduling. If number "
+        "of ready jobs is greater than this value, this number of jobs is randomly "
+        "chosen for scheduling; if number of ready jobs is lower, this option has "
+        "no effect. This can be useful on very large DAGs, where the scheduler can "
+        "take some time selecting which jobs to run.",
+    )
     group_behavior.add_argument(
         "--no-hooks",
         action="store_true",
@@ -2127,6 +2137,7 @@ def args_to_api(args, parser):
             ilp_solver=args.scheduler_ilp_solver,
             solver_path=args.scheduler_solver_path,
             greediness=args.scheduler_greediness,
+            subsample=args.scheduler_subsample,
             max_jobs_per_second=args.max_jobs_per_second,
             max_jobs_per_timespan=args.max_jobs_per_timespan,
         ),
```
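The new flag can be exercised in isolation; here is a minimal sketch using a bare `argparse` parser with the same option definition (not the full Snakemake CLI):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--scheduler-subsample",
    type=int,
    default=None,
    help="Number of ready jobs to randomly consider for scheduling.",
)

# Passing the flag yields an int; argparse converts via type=int
args = parser.parse_args(["--scheduler-subsample", "1000"])
print(args.scheduler_subsample)  # 1000

# Omitting the flag keeps the default of None, i.e. no subsampling
print(parser.parse_args([]).scheduler_subsample)  # None
```

`default=None` means the feature is opt-in: existing workflows see no behavior change unless the flag is given.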

snakemake/resources.py

Lines changed: 0 additions & 1 deletion
```diff
@@ -654,7 +654,6 @@ def infer_resources(name, value, resources: dict):
         raise WorkflowError(
             f"Cannot parse runtime value into minutes for setting runtime resource: {value}"
         )
-    logger.debug(f"Inferred runtime value of {parsed} minutes from {value}")
     resources["runtime"] = parsed


```

snakemake/scheduler.py

Lines changed: 13 additions & 2 deletions
```diff
@@ -62,6 +62,7 @@ def __init__(self, workflow, executor_plugin: ExecutorPlugin):
         self.failed = set()
         self.finished_jobs = 0
         self.greediness = self.workflow.scheduling_settings.greediness
+        self.subsample = self.workflow.scheduling_settings.subsample
         self._tofinish = []
         self._toerror = []
         self.handle_job_success = True
@@ -263,7 +264,18 @@ def schedule(self):
                 job.reset_params_and_resources()

             logger.debug(f"Resources before job selection: {self.resources}")
-            logger.debug(f"Ready jobs: {len(needrun)}")
+
+            # Subsample jobs to be run (to speedup solver)
+            n_total_needrun = len(needrun)
+            if self.subsample and n_total_needrun > self.subsample:
+                import random
+
+                needrun = set(random.sample(tuple(needrun), k=self.subsample))
+                logger.debug(
+                    f"Ready subsampled jobs: {len(needrun)} (out of {n_total_needrun})"
+                )
+            else:
+                logger.debug(f"Ready jobs: {n_total_needrun}")

             if not self._last_job_selection_empty:
                 logger.info("Select jobs to execute...")
@@ -506,7 +518,6 @@ def job_selector_ilp(self, jobs):
         if not self.resources["_cores"]:
             return set()

-        # assert self.resources["_cores"] > 0
         scheduled_jobs = {
             job: pulp.LpVariable(
                 f"job_{idx}", lowBound=0, upBound=1, cat=pulp.LpInteger
```

snakemake/settings/types.py

Lines changed: 9 additions & 3 deletions
```diff
@@ -289,14 +289,17 @@ class SchedulingSettings(SettingsBase):
     ilp_solver:
         Set solver for ilp scheduler.
     greediness:
-        set the greediness of scheduling. This value between 0 and 1 determines how careful jobs are selected for execution. The default value (0.5 if prioritytargets are used, 1.0 else) provides the best speed and still acceptable scheduling quality.
+        Set the greediness of scheduling. This value, between 0 and 1, determines how careful jobs are selected for execution. The default value (0.5 if prioritytargets are used, 1.0 else) provides the best speed and still acceptable scheduling quality.
+    subsample:
+        Set the number of jobs to be considered for scheduling. If number of ready jobs is greater than this value, this number of jobs is randomly chosen for scheduling; if number of ready jobs is lower, this option has no effect. This can be useful on very large DAGs, where the scheduler can take some time selecting which jobs to run."
     """

     prioritytargets: AnySet[str] = frozenset()
     scheduler: str = "ilp"
     ilp_solver: Optional[str] = None
     solver_path: Optional[Path] = None
     greediness: Optional[float] = None
+    subsample: Optional[int] = None
     max_jobs_per_second: Optional[int] = None
     max_jobs_per_timespan: Optional[MaxJobsPerTimespan] = None

@@ -312,8 +315,11 @@ def _get_greediness(self):
         return self.greediness

     def _check(self):
-        if not (0 < self.greedyness <= 1.0):
-            raise ApiError("greediness must be >0 and <=1")
+        if not (0 <= self.greediness <= 1.0):
+            raise ApiError("greediness must be >=0 and <=1")
+        if self.subsample:
+            if not isinstance(self.subsample, int) or self.subsample < 1:
+                raise ApiError("subsample must be a positive integer")


 @dataclass
```
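The validation logic in `_check` (which this diff also fixes: the old code referenced a misspelled `self.greedyness` and rejected a greediness of 0) can be sketched as a standalone function mirroring the same `ApiError` conditions (hypothetical free function for illustration):

```python
class ApiError(Exception):
    pass


def check_scheduling_settings(greediness: float, subsample=None) -> None:
    # greediness of exactly 0 is now accepted; values above 1 are rejected
    if not (0 <= greediness <= 1.0):
        raise ApiError("greediness must be >=0 and <=1")
    # subsample, when set, must be a positive integer
    if subsample and (not isinstance(subsample, int) or subsample < 1):
        raise ApiError("subsample must be a positive integer")


check_scheduling_settings(0.5, 1000)  # passes silently

try:
    check_scheduling_settings(0.5, -3)
except ApiError as e:
    print(e)  # subsample must be a positive integer
```

Note that, as in the diff, a subsample of 0 is falsy and therefore skips the validation branch entirely, behaving like "no subsampling".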
