@archlitchi
Contributor

@archlitchi archlitchi commented May 15, 2024

Merge PR 3351: Fix NPU exception #3351
Fix NPU on metrics, which led to a scheduler crash when running on a non-NVIDIA GPU node
Merge PR 3210: Add Score to devices
Add the entrance for volcano-vgpu new location in device-share.md

@volcano-sh-bot volcano-sh-bot added the size/L label (denotes a PR that changes 100-499 lines, ignoring generated files) on May 15, 2024
@@ -0,0 +1,134 @@
/*
Copyright 2019 The Volcano Authors.
Member

2019 -> 2024

addResource(p1.Spec.Containers[0].Resources.Requests, gpunumber, "1")
addResource(p1.Spec.Containers[0].Resources.Requests, gpumemory, "1000")
p1.Spec.Containers[0].Resources.Limits = make(v1.ResourceList)
addResource(p1.Spec.Containers[0].Resources.Limits, gpunumber, "1")
Member

BuildResourceList can add scalar resources directly.

Contributor Author

BuildResourceList doesn't add scalar resources to .containers.resources.limits, so we have to do it here manually
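
For readers following this thread, here is a minimal sketch of the manual pattern described above. It is not the PR's code: `ResourceList` and `Container` are simplified stand-ins for the Kubernetes `v1` types, `addResource` is a hypothetical helper modeled only on the call shape shown in the diff, and the resource names are placeholders. The point it illustrates is that the limits map has to be allocated before scalar resources can be written into it.

```go
package main

import "fmt"

// ResourceList is a simplified stand-in for v1.ResourceList.
// Assumption: the real type maps resource names to quantities; plain
// strings are used here so the sketch has no external dependencies.
type ResourceList map[string]string

// Container carries only the two fields discussed in the review thread.
type Container struct {
	Requests ResourceList
	Limits   ResourceList
}

// addResource sets one scalar resource on a list. Hypothetical helper,
// modeled on the call shape in the diff, not copied from the PR.
func addResource(rl ResourceList, name, quantity string) {
	rl[name] = quantity
}

func main() {
	// Placeholder resource names, not the identifiers used by Volcano.
	const gpuNumber = "example.com/gpu-number"
	const gpuMemory = "example.com/gpu-memory"

	c := Container{Requests: make(ResourceList)}
	addResource(c.Requests, gpuNumber, "1")
	addResource(c.Requests, gpuMemory, "1000")

	// Limits is nil at this point; writing to a nil map panics in Go,
	// so the map is allocated before the scalar resource is added.
	c.Limits = make(ResourceList)
	addResource(c.Limits, gpuNumber, "1")

	fmt.Println("requests:", c.Requests)
	fmt.Println("limits:", c.Limits)
}
```

In this sketch the allocation step mirrors the `make(v1.ResourceList)` call in the diff; whether a builder could populate limits directly depends on the helper's actual behavior, which this example does not model.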

a higher score than those needs to evict a task */

// Use cached stored in filter state in order to avoid recalculating.
klog.V(3).Infof("Scoring pod %s with to node %s with score %f", pod.Name, gs.Name, gs.Score)
Member

ssn.AddNodeOrderFn already prints the score at log level 5, so it's not necessary to print it here; doing so will produce lots of logs.

Memory: uint(devmem),
Type: items[3],
Health: health,
if len(val) > 0 {
Member

Is this check needed?

1. Fix device share plugins NPE
2. Fix vgpu device handshake patch error
3. Update and add deviceshare UT
@archlitchi archlitchi changed the title from "Fix device share plugins npe & update devicescore API" to "Fix device share plugins npe & update devicescore" on May 16, 2024
@Monokaix
Member

/lgtm

@volcano-sh-bot volcano-sh-bot added the lgtm label (indicates that a PR is ready to be merged) on May 16, 2024
Member

@william-wang william-wang left a comment

/approve

@volcano-sh-bot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: william-wang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@volcano-sh-bot volcano-sh-bot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) on May 16, 2024
@volcano-sh-bot volcano-sh-bot merged commit 63bf271 into volcano-sh:master May 16, 2024
@william-wang william-wang added this to the v1.9 milestone Jun 12, 2024

4 participants