@r4victor
Collaborator

Closes #3238

Prerequisite for #3059

@peterschmidt85
Contributor

Thank you, @r4victor! I'll review it tomorrow. Is it OK if we merge it after the release, once I've reviewed it?


If a fleet is created automatically, it stays `idle` for 5 minutes by default and can be reused within that time.
If the fleet is not reused within this period, it is automatically terminated.
If a run provisions a new instance, the instance stays `idle` for 5 minutes by default and can be reused within that time.
Contributor

Haven't we dropped the defaults for idle_duration?

Collaborator Author

We dropped the `max_duration` default long ago, but not the `idle_duration` one.

This comment was marked as resolved.

Contributor

Ahh, it's something else. Should I create a new issue about dropping the `idle_duration` defaults? I think it's important.

Collaborator Author

I think with `idle_duration: off` by default, we'll have a much higher chance of forgetting to set it and paying $$$ for idle instances. Otherwise, I support getting rid of random defaults.

Contributor

In Fleets, I would probably make it more visible that the user can set either a fixed number of nodes or a range. Currently we only show a fixed number. A range is going to be an even more popular choice. I would show both and explicitly explain when one or the other should be used.

Let me know if you want me to update it.

Contributor

And the range example should also mention `idle_duration` explicitly.

Collaborator Author

Feel free to push a commit
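
For reference, the two `nodes` variants discussed in this thread might look roughly like the following. This is only a sketch: the fleet names and the specific `nodes` and `idle_duration` values are made up for illustration and are not taken from the PR.

```yaml
# Fixed size: request exactly 2 nodes.
type: fleet
name: fixed-fleet   # hypothetical name
nodes: 2
```

```yaml
# Range: start with no nodes and allow up to 2.
type: fleet
name: elastic-fleet   # hypothetical name
nodes: 0..2
# Set explicitly rather than relying on the default.
idle_duration: 1h
```

Spelling out `idle_duration` in the range variant makes it visible how long an idle instance is kept before it is terminated.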

## Create fleet

Before submitting distributed training runs, make sure to create a fleet with a `placement` set to `cluster`.
Before submitting distributed training runs, make sure to create a fleet with `placement: cluster`. Here's a fleet configuration suitable for this example:
Contributor

In single-node-training, you don't add a Create fleet section. Why?

Contributor

Also, I'd probably make this section collapsed by default, as it repeats everywhere.

Collaborator Author

I assumed users may already have a suitable fleet, since a cluster is not required – for the same reason, the Tasks, Services, and Dev environments pages don't have a Create fleet section. But we can add a Create fleet section everywhere if you'd like.

```yaml
type: fleet
name: default-fleet
nodes: 0..
```
Contributor

Perhaps I'd also add `resources` to show that it's possible to limit which GPU types are allowed?

Contributor

See #3249

Collaborator Author

Maybe once #3249 is fixed, then?
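
A sketch of what restricting GPU types via `resources` could look like, assuming the fleet accepts the same `resources` syntax as run configurations. The fleet name and the GPU spec are hypothetical, and whether this behaves as intended may depend on #3249.

```yaml
type: fleet
name: gpu-fleet   # hypothetical name
nodes: 0..
placement: cluster
resources:
  # Illustrative only: restrict the fleet to instances with 8 H100 GPUs.
  gpu: H100:8
```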

@peterschmidt85 merged commit de0ae07 into master on Nov 2, 2025
26 checks passed
@peterschmidt85 deleted the issue_3238_fleet_first_docs branch on November 2, 2025 at 00:31
Development

Successfully merging this pull request may close these issues.

Update docs and examples to fleet-first UX

3 participants