Enable gds, gdrcopy and mofed flags by default #1550

Merged: rahulait merged 1 commit into NVIDIA:main from rahulait:dynamic-detect-gdrcopy on Jan 15, 2026

Conversation

rahulait (Contributor) commented Dec 4, 2025

Description

With this change, we always try to detect dynamically whether the drivers are present.
The existing code already handles the failure cases in a fail-safe way, and we rely on that
for dynamic driver detection.

Testing

  • Manual cluster testing (describe below)

Manually built an image with these changes and then deployed gpu-operator (once with CDI enabled and once with CDI disabled) with the image and gdrcopy driver enabled.

Verified that a gdrcopy-specific spec is generated in /var/run/cdi when CDI and gdrcopy are enabled, and that the sample workload works.
Verified that no spec is generated when CDI is disabled, and that the sample workload fails because the driver is missing.

copy-pr-bot (Bot) commented Dec 4, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Comment thread internal/utils/utils.go Outdated
Comment thread internal/utils/utils.go Outdated
Comment thread cmd/nvidia-device-plugin/plugin-manager.go Outdated
rahulait force-pushed the dynamic-detect-gdrcopy branch 2 times, most recently from eb38ccb to 7a2c95a, December 25, 2025 06:22
Comment thread internal/plugin/server.go
rahulait force-pushed the dynamic-detect-gdrcopy branch from 7a2c95a to 06394f9, January 9, 2026 22:06
@rahulait rahulait marked this pull request as ready for review January 9, 2026 22:06
With this change, we now always try to dynamically detect if the drivers are present or not.
Current code takes care of failure cases to fail-safe and we are leveraging that
for dynamic driver detection.

Signed-off-by: Rahul Sharma <[email protected]>
rahulait force-pushed the dynamic-detect-gdrcopy branch from 06394f9 to 0af9b62, January 9, 2026 22:06
rahulait changed the title from "[WIP] : add ability to dynamically detect gdrcopy module" to "add ability to dynamically detect gdrcopy module" on Jan 9, 2026
@rahulait rahulait requested a review from elezar January 13, 2026 16:35
cdesiniotis (Contributor) left a comment

LGTM. Thanks @rahulait. Let's get another review from @elezar before merging this.

elezar (Member) left a comment

Thanks @rahulait. This looks good. Sorry for the delay.

I have also updated the PR title and description to represent the actual state.

elezar changed the title from "add ability to dynamically detect gdrcopy module" to "Enable gds, gdrcopy and mofed flags by default" on Jan 15, 2026
elezar (Member) commented Jan 15, 2026

/cherry-pick release-0.18

@rahulait rahulait merged commit 08f5400 into NVIDIA:main Jan 15, 2026
1 check passed
@github-actions

🤖 Backport PR created for release-0.18: #1585

elezar (Member) commented Jan 22, 2026

Removing the cherry-pick label. Let's re-add it if we decide that we need this in the maintenance branch.

4 participants