
Add --sleep-interval=infinite support to GFD for running as a pod #1603

Merged
rajathagasthya merged 1 commit into main from gfd-oneshot on Feb 3, 2026


@rajathagasthya (Contributor) commented Jan 27, 2026

Reuse the existing --sleep-interval flag with support for 'infinite' as a special value. This causes GFD to label once and sleep indefinitely, which is useful for running as a Kubernetes pod that should not exit.

@rajathagasthya rajathagasthya changed the title Add --oneshot-daemon flag to GFD for running as a pod Add --oneshot-daemon flag to GFD for running as a pod Jan 27, 2026
Comment thread cmd/gpu-feature-discovery/main.go Outdated
Comment thread cmd/gpu-feature-discovery/main.go Outdated
@rajathagasthya rajathagasthya force-pushed the gfd-oneshot branch 2 times, most recently from dcca988 to 97c875f January 27, 2026 22:17

@elezar (Member) left a comment

Instead of adding a new flag, does it not make sense to reuse the sleep interval? We could add support for 'infinite' as a special string that disables the main loop.

See:

```go
case string:
	tmp, err := time.ParseDuration(value)
	if err != nil {
		return err
	}
	*d = Duration(tmp)
	return nil
default:
```

Alternatively, a user COULD already get close to this behaviour by setting this duration to something really long (years?).

@rajathagasthya rajathagasthya changed the title Add --oneshot-daemon flag to GFD for running as a pod Add --sleep-interval=infinite support to GFD for running as a pod Jan 28, 2026
@rajathagasthya rajathagasthya marked this pull request as draft January 28, 2026 23:44
@rajathagasthya rajathagasthya marked this pull request as ready for review January 29, 2026 16:18
Comment thread api/config/v1/duration.go

```go
// IsInfinite returns true if the duration represents an infinite sleep interval.
func (d *Duration) IsInfinite() bool {
	return d != nil && time.Duration(*d) == math.MaxInt64
}
```

Member

Are we concerned with overflow if we set this to MaxInt64? We could also realistically use 0 or -1 as values here.

@rajathagasthya (Contributor Author) commented Jan 30, 2026

I don't think there's an overflow risk here. We always check IsInfinite() before using the duration in time.After(). When infinite, we skip the timer entirely. time.Duration(MaxInt64) is also a valid duration, so I'd prefer to keep it this way rather than use 0 or -1 as sentinel values.

Member

nit: We could introduce a constant for Infinite as well. Not a blocker for this PR.

Contributor Author

Ack. I'll add that if there are other comments I need to address; if not, I'll do it in a follow-up.

Comment thread api/config/v1/duration.go
Comment thread api/config/v1/duration.go Outdated

@elezar (Member) left a comment

Thanks @rajathagasthya. I think this has a much better UX than the other proposal.

I would recommend updating the Duration.MarshalJSON function to also output infinite when we save or log the config.

Reuse the existing --sleep-interval flag with support for 'infinite' as
a special value.  This causes GFD to label once and sleep indefinitely,
which is useful for running as a Kubernetes pod that should not exit.

Signed-off-by: Rajath Agasthya <[email protected]>
@rajathagasthya rajathagasthya merged commit 7b191f4 into main Feb 3, 2026
11 checks passed
@rajathagasthya rajathagasthya deleted the gfd-oneshot branch February 3, 2026 15:46


5 participants