Fleet-first docs #3242
Thank you, @r4victor! I'll review tomorrow. Is it OK if we merge it after the release, once I've reviewed it?
| If a fleet is created automatically, it stays `idle` for 5 minutes by default and can be reused within that time. | ||
| If the fleet is not reused within this period, it is automatically terminated. | ||
| If a run provisions a new instance, the instance stays `idle` for 5 minutes by default and can be reused within that time. |
Haven't we dropped the defaults for idle_duration?
Dropped max_duration default long ago but not idle_duration
This comment was marked as resolved.
Ahh, it's something else. Should I create a new issue about dropping idle_duration defaults? I think it's important.
I think with idle_duration: off by default, we're much more likely to forget to set it and pay $$$ for idle instances. Otherwise, I support getting rid of random defaults.
In Fleets, I would probably make it more visible that the user can set either a fixed number of nodes or a range. Currently we only show a fixed number. A range is going to be an even more popular choice. I would show both and explicitly explain when to use one or the other.
Let me know if you want me to update it.
And the range example should also mention idle_duration explicitly.
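To make the suggestion concrete, here is a rough sketch of how the two variants might sit side by side in the Fleets docs. This is only an illustration, not the text that ended up in the PR: the fleet names and the `1h` value are assumptions, and the comments reflect my understanding of how fixed and ranged `nodes` behave.

```yaml
# Illustrative sketch; names and values are hypothetical, not taken from the PR.

# Variant 1: a fixed number of nodes.
# As I understand it, this many instances are provisioned when the fleet is applied.
type: fleet
name: fixed-fleet        # hypothetical name
nodes: 2
---
# Variant 2: a range of nodes.
# The fleet can start empty and grow up to the upper bound as runs need instances.
type: fleet
name: elastic-fleet      # hypothetical name
nodes: 0..2
# Set explicitly so it is clear how long idle instances are kept (and paid for).
idle_duration: 1h        # assumed value for illustration
```

Showing `idle_duration` in the range variant keeps the cost trade-off visible: a range only saves money if idle instances are eventually terminated.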
Feel free to push a commit
| ## Create fleet | ||
|
|
||
| Before submitting distributed training runs, make sure to create a fleet with a `placement` set to `cluster`. | ||
| Before submitting distributed training runs, make sure to create a fleet with `placement: cluster`. Here's a fleet configuration suitable for this example: |
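The configuration that follows this sentence in the docs is not shown in this excerpt. As a rough sketch only, a fleet with `placement: cluster` could look like the following; the name and node count are assumptions made for illustration.

```yaml
# Illustrative sketch; not the configuration from the PR.
type: fleet
name: my-cluster-fleet   # hypothetical name
nodes: 2                 # assumed node count for a multi-node run
placement: cluster       # asks for interconnected instances, as the docs line requires
```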
In single-node-training, you don't add a Create fleet section. Why?
Also, I'd probably make this section collapsed by default, as it repeats everywhere.
I assumed users may already have a suitable fleet, since a cluster is not required. For the same reason, the Tasks, Services, and Dev environments pages don't have a Create fleet section. But we can add a Create fleet section everywhere if you'd like that.
docs/docs/quickstart.md
Outdated
| ```yaml | ||
| type: fleet | ||
| name: default-fleet | ||
| nodes: 0.. |
Perhaps I'd also add resources to show that it's possible to limit which GPU types are allowed?
See #3249
Maybe once #3249 is fixed, then?
Co-authored-by: Andrey Cheptsov <[email protected]>
Closes #3238
Prerequisite for #3059