@r4victor r4victor commented Aug 11, 2025

A part of #1448
Fixes #2221
Fixes #2294

This PR:

  • Refactors process_submitted_jobs so that the master job now chooses a fleet to provision in, instead of choosing instances directly. In particular, this allows choosing empty fleets and implementing smarter fleet choice logic: fleets that can accommodate the run without extra provisioning are currently prioritized (previously only offer price was considered). Fleets with no idle instances (including empty fleets) are chosen only if they are explicitly specified in configuration.fleets. This avoids breaking the current default behavior, where a new fleet is created on run apply in that case.
  • Removes auto deletion of empty fleets that allow 0 nodes.
  • Fixes configuration.replicas type checking errors.
  • Respects nodes.max for fleet configuration. (This change requires everyone to explicitly define elastic fleets, so we'll postpone the enforcement until pre-creating fleets is required.)
  • Removes redundant is_suitable_placement_group checks for backend type.
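
The fleet choice rule from the first bullet can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in process_submitted_jobs: FleetCandidate, choose_fleet, and all field names are hypothetical, and the sketch does not model how configuration.fleets may otherwise restrict the candidate set.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FleetCandidate:
    # Hypothetical, simplified view of a fleet; not dstack's actual model.
    name: str
    idle_instances: int  # instances that could host a job right now
    best_offer_price: float  # cheapest matching offer, in $/hour


def choose_fleet(
    candidates: List[FleetCandidate],
    nodes_required: int,
    explicit_fleets: Optional[List[str]] = None,
) -> Optional[FleetCandidate]:
    """Pick a fleet for the master job, or None to fall back to the default path."""
    explicit = set(explicit_fleets or [])
    # Fleets with no idle instances (including empty ones) are eligible
    # only when they are named explicitly in the run configuration.
    eligible = [
        f for f in candidates if f.idle_instances > 0 or f.name in explicit
    ]
    if not eligible:
        return None
    # Prefer fleets that fit the whole run without extra provisioning
    # (False sorts before True), then break ties by offer price.
    return min(
        eligible,
        key=lambda f: (f.idle_instances < nodes_required, f.best_offer_price),
    )
```

Returning None here stands in for the existing default behavior of creating a new fleet on run apply; the real implementation may signal that case differently.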

TBD (separate PRs):

  • Respect fleet configuration when provisioning new instances on run apply (i.e. implement fleet-run configuration merge).

@r4victor r4victor merged commit 5876f6f into master Aug 15, 2025
26 checks passed
@r4victor r4victor deleted the issue_1448_elastic_fleets branch August 15, 2025 10:38
Comment on lines +652 to +658
class Config(CoreModel.Config):
    @staticmethod
    def schema_extra(schema: Dict[str, Any]):
        add_extra_schema_types(
            schema["properties"]["replicas"],
            extra_types=[{"type": "integer"}, {"type": "string"}],
        )

@r4victor, looks like this Config overrides ProfileParams.Config, so the service configuration JSON schema now includes some unnecessary properties like pool_name

https://dstack-runner-downloads-stgn.s3.eu-west-1.amazonaws.com/5458/schemas/configuration.json

@r4victor r4victor Aug 20, 2025

I fixed this and other Config overrides issues in https://github.com/dstackai/dstack/compare/issue_2994_pydantic_stored_types

It's minor, so shouldn't affect the release.
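
For readers unfamiliar with this kind of override, here is a minimal plain-Python sketch of the pattern discussed in this thread. It deliberately avoids pydantic and dstack code: ParentConfig, OverridingConfig, ChainedConfig, and build_schema are hypothetical stand-ins, and it assumes the parent hook is what removes properties such as pool_name from the schema. Chaining to the parent hook is one possible fix, not necessarily the one used in the linked branch.

```python
from typing import Any, Dict, Type


class ParentConfig:
    """Stand-in for a parent config whose hook hides internal properties."""

    @staticmethod
    def schema_extra(schema: Dict[str, Any]) -> None:
        # Assumed behavior: drop a property that should not be user-facing.
        schema["properties"].pop("pool_name", None)


class OverridingConfig(ParentConfig):
    @staticmethod
    def schema_extra(schema: Dict[str, Any]) -> None:
        # This replaces the parent hook entirely, so pool_name is kept.
        schema["properties"]["replicas"]["x-extra-types"] = ["integer", "string"]


class ChainedConfig(ParentConfig):
    @staticmethod
    def schema_extra(schema: Dict[str, Any]) -> None:
        # Run the parent hook first, then apply the local tweak.
        ParentConfig.schema_extra(schema)
        schema["properties"]["replicas"]["x-extra-types"] = ["integer", "string"]


def build_schema(config: Type[ParentConfig]) -> Dict[str, Any]:
    """Build a toy schema and apply the config's schema_extra hook to it."""
    schema: Dict[str, Any] = {
        "properties": {
            "replicas": {"type": "object"},
            "pool_name": {"type": "string"},
        }
    }
    config.schema_extra(schema)
    return schema
```

With these definitions, build_schema(OverridingConfig) still contains pool_name, while build_schema(ChainedConfig) does not.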

Development

Successfully merging this pull request may close these issues.

[Bug]: Error applying a configuration with replicas: ..2
[Bug]: dstack chooses a fleet with too few instances

3 participants