Skip to content

Conversation

@penggu
Copy link
Contributor

@penggu penggu commented Dec 24, 2024

add event handler for adding/deleting/updating HyperNode in scheduler cache.

@volcano-sh-bot
Copy link
Contributor

Welcome @penggu!

It looks like this is your first PR to volcano-sh/volcano.

Thank you, and welcome to Volcano. 😃

@volcano-sh-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign wpeng102
You can assign the PR to them by writing /assign @wpeng102 in a comment when ready.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@volcano-sh-bot volcano-sh-bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Dec 24, 2024
hnfb.nodes[hn.Name] = hntn

tierStr := hn.Spec.Tier
tier, err := strconv.Atoi(tierStr)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, in api definition, the tier is a type of string, what if the user wants to specify tier name other than numbers? Is there any purpose in defining the type of tier as string before?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Tier should be an int. If a "tier name" is needed, then a mapping from a "tier name" string to its "tier number" int is needed.

nodeName := member.Selector.ExactMatch.Name
hntn.NodeMatchers.Insert(NewExactMatcher(nodeName))
workerNode := sc.Nodes[nodeName]
if workerNode == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a scenario that if the HyperNode is created first and then handles the event of the HyperNode, the Node has not been created yet and is not in the cache. If we directly return here and there is no change of the HyperNode, there will no longer be new events related to the HyperNode. You can look at the logic in vc-controller. The event of resource change and the worker responsible for processing collaborate through the workqueue. If the event processing fails, the worker can throw it back to the workqueue, and then process it from the workqueue later. So we may need to add worker and workqueue here, and re-enqueue the event if it fails

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for bring up this scenario. I am aware of it. My hope is that the event handler for Node need to update the HyperNode (I did not implement that logic in the current PR. I was planning to add it later).

Your proposed solution (using workqueue) should work for Exact match. I wonder how does it work for Regex match. In case of Regex match, when will we know for sure that no future child node will appear -- hence we stop throwing it back to the workqueue?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a better solution is to update HyperNodeTree when AddNode and DeleteNode, so that it doesn't matter if Node is not created yet, both ExactMatch and RegexMatch can be included. Our original intention of designing RegexMatch is actually to serve Node.

case networktopov1alpha1.ExactMatchMemberSelectorType:
memberName := member.Selector.ExactMatch.Name
hntn.MemberMatchers.Insert(NewExactMatcher(memberName))
member := sc.hypernodeForestbuilder.nodes[memberName]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same problem, the HyperNode events here are out of order, and the Member may not have been recorded in the builder. Then if the HyperNode does not change, there won't be more events coming

if member != nil {
insertHyperNodeMember(hntn, hnfb, memberName, member, sc)
if hntn.Parent != nil {
sc.HyperNodes[hntn.Parent.Name].Union(sc.HyperNodes[memberName])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also has the same problem here, Union is cloned and the return value is not processed, and I didn't get the point here, what is the purpose here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also has the same problem here, Union is cloned and the return value is not processed, and I didn't get the point here, what is the purpose here?

All members of a HyperNode should also be members of its parent

}
}
if hntn.Parent != nil {
sc.HyperNodes[hntn.Parent.Name].Union(sc.HyperNodes[hntn.Name])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also has the same problem here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also has the same problem here

Same answer: All members of a HyperNode should also be members of its parent

if hnfb.roots.Has(hn.Name) {
hnfb.roots.Delete(hn.Name)
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missed update sc.HyperNodeListByTier and sc.HyperNodes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed.

CSINodesStatus map[string]*schedulingapi.CSINodeStatusInfo
HyperNodeListByTier map[int]sets.Set[string] // Maps tier to a set of hypernode names
HyperNodes map[string]sets.Set[string] // Maps hypernode name to a set of node names
HyperNodeTreeRoots HyperNodeSet
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that HyperNodeTreeRoots is not used. Is it still needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. It has been replaced by hypernodeForestbuilder.roots


// UpdateHyperNodeV1alpha1 update hypernode in scheduler cache
func (sc *SchedulerCache) UpdateHyperNodeV1alpha1(oldObj, newObj interface{}) {
sc.DeleteHyperNodeV1alpha1(oldObj)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we directly implement the logic of update member and update hypernode here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Direct implementation involves

  1. Finding diff between the old and new CRD
  2. Update the forest based on the updated fields, and all the parent/child relationship based on ExactMatch and RegexMatch needs to be updated.

Because of the complicated logic involved in Step2, especially RegexMatch for both its children and its parents, that piece of code will be very complicated, and it will be a lot of duplicated logic from exiting AddHyperNodeV1alpha1() and DeleteHyperNodeV1alpha1().

i think the current implementation is easier to understand and to maintain. Please let me know if you think otherwise.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's great

@JesseStutler
Copy link
Member

Please rebase onto the new codes: #3874, and squash your commit into one, so I can review clearly

@volcano-sh-bot volcano-sh-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 26, 2024
}

// insertHyperNodeMember inserts a member into a hypernode and updates the forest
func insertHyperNodeMember(hntn *HyperNodeTreeNode, hnfb *HyperNodeForestBuilder, memberName string, memberNode *HyperNodeTreeNode, sc *SchedulerCache) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no distinction between the nodes in the leaf nodes and the hyperNodes of non-leaf nodes. What we expect is that the key stored in HyperNodes is the hyperNode, and the value is a list of all leaf nodes under the hyperNode.

case networktopov1alpha1.ExactMatchMemberSelectorType:
nodeName := member.Selector.ExactMatch.Name
hntn.NodeMatchers.Insert(NewExactMatcher(nodeName))
workerNode := sc.Nodes[nodeName]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sc.Nodes should be accessed with lock.

Parent *HyperNodeTreeNode
Members sets.Set[*HyperNodeTreeNode]
MemberMatchers sets.Set[*MemberMatcher]
Nodes sets.Set[*schedulingapi.NodeInfo]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We just need nodeName instead of NodeInfo.

HyperNodeListByTier map[int]sets.Set[string] // Maps tier to a set of hypernode names
HyperNodes map[string]sets.Set[string] // Maps hypernode name to a set of node names
HyperNodeTreeRoots HyperNodeSet
hypernodeForestbuilder *HyperNodeForestBuilder
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should also be locked.

@Monokaix
Copy link
Member

We should also expose a method like LCA(Lowest Common Ancestor) for HyperNodeTreeRoots, which will be used by score logic.

klog.V(4).Infof("Adding hypernode %s to the forest, in %d tier", hn.Name, tier)

// Iterate over members and add them to the forest
for _, member := range hn.Spec.Members {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be wrapped as a func, it' the logic to build a hyperNodeTreeNode, and the last thing to do is update the sc.HyperNodeListByTier and sc.HyperNodes.
And can we just use one var to express the tree?

Parent *HyperNodeTreeNode
Members sets.Set[*HyperNodeTreeNode]
MemberMatchers sets.Set[*MemberMatcher]
Nodes sets.Set[*schedulingapi.NodeInfo]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And seems nowhere to assign leaf nodes to sc.HyperNodes.

for _, parentNode := range sc.hypernodeForestbuilder.nodes {
for m := range parentNode.MemberMatchers {
if m.Exact != "" && m.Exact == hn.Name {
insertHyperNodeMember(parentNode, hnfb, hn.Name, hntn, sc)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

insertHyperNodeMember should be used as a method of *HyperNodeForestBuilder instead of passing it in through parameters.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SchedulerCache should also be used as an attribute of HyperNodeForestBuilder. The function parameters must be consistent with the function name and responsibilities.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't memerNode already contain a name attribute? No need to pass again

@penggu penggu force-pushed the network-topo-peng branch from aec453f to 5e3e22e Compare January 6, 2025 05:04
})
}

// TODO (penggu): create informat for HyperNode(v1alpha1) information
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO still need?

Tier int
Parent *HyperNodeTreeNode
Members sets.Set[*HyperNodeTreeNode]
MemberMatchers sets.Set[*MemberMatcher]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we group together Members/NodeMatchers and MemberMatchers/NodeMatchers? Too many unused fields

@volcano-sh-bot volcano-sh-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jan 7, 2025
Add AddFunc for HyperNode informer support to SchedulerCache

Network topology implements of scheduler

Signed-off-by: Monokaix <[email protected]>

Add plugin for networkTopology and score logic

Signed-off-by: Monokaix <[email protected]>

WIP1 HyperNode handling in scheduler cache

Change HyperNodesListByTier to a map of set (instead of string)

Refactor event handling for HyperNode cache

Remove unused updateHyperNodeListByTier function

Add child member handling for HyperNodes based on parent matchers

Move add and delete HyperNode functions to event_handlers.go

Refactor HyperNode handling: unify member updates
@penggu penggu force-pushed the network-topo-peng branch from d453afd to 4d68f2c Compare January 7, 2025 21:10
@volcano-sh-bot volcano-sh-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 14, 2025
@volcano-sh-bot
Copy link
Contributor

@penggu: PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@Monokaix
Copy link
Member

This pr has some code conflict to resolve and update, and #3964 will work based on this: )

@penggu
Copy link
Contributor Author

penggu commented Jan 16, 2025

This feature has been merged via PR #3964. So it is no longer needed. OK to close this PR now.

@penggu penggu closed this Jan 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants