Skip to content

UCS/TOPO: Replace numa_distance with sysfs based implementation#8698

Merged
yosefe merged 1 commit intoopenucx:masterfrom
ofirfarjun7:topic/topo-numa-distance
Apr 5, 2023
Merged

UCS/TOPO: Replace numa_distance with sysfs based implementation#8698
yosefe merged 1 commit intoopenucx:masterfrom
ofirfarjun7:topic/topo-numa-distance

Conversation

@ofirfarjun7
Copy link
Copy Markdown
Contributor

@ofirfarjun7 ofirfarjun7 commented Nov 13, 2022

What

Implement numa_distance function to calculate the distance between IB devices and CPU's NUMA node.

Why ?

  • Remove UCX libnuma dependency

How ?

  • Extend Topo providers API to include method for calculating NUMA distance between IB devices and CPU's NUMA node.
  • Remove/Replace libnuma dependent code.
  • Remove libnuma from UCX

@ofirfarjun7 ofirfarjun7 added the WIP-DNM Work in progress / Do not review label Nov 13, 2022
@ofirfarjun7 ofirfarjun7 force-pushed the topic/topo-numa-distance branch from 80a3b9a to 7bb6d56 Compare November 13, 2022 17:07
@ofirfarjun7 ofirfarjun7 force-pushed the topic/topo-numa-distance branch 8 times, most recently from f5b41df to 479b0d4 Compare November 29, 2022 08:51
@ofirfarjun7 ofirfarjun7 force-pushed the topic/topo-numa-distance branch 3 times, most recently from 9ff60ba to cd5254a Compare December 13, 2022 09:19
@ofirfarjun7 ofirfarjun7 marked this pull request as ready for review December 13, 2022 09:46
@ofirfarjun7 ofirfarjun7 removed the WIP-DNM Work in progress / Do not review label Dec 13, 2022

UCX_STATIC_LDFLAGS = -static $(shell pkg-config --libs --static $(EXTRA_MODULES) ucx) -lnuma
UCT_STATIC_LDFLAGS = -static $(shell pkg-config --libs --static $(EXTRA_MODULES) ucx-uct) -lnuma
UCX_STATIC_LDFLAGS = -static $(shell pkg-config --libs --static $(EXTRA_MODULES) ucx) $(NUMA_LIBS)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to revert in the following PR and remove linking with libnuma from all makefiles

@gleon99
Copy link
Copy Markdown
Contributor

gleon99 commented Jan 10, 2023

PR description:

  • "numa distance" -> numa_distance
  • numa -> NUMA
  • libnuma -> libnuma

return ucs_topo_get_avg_distance_default(device, distance);
}

ret = ucs_sys_getaffinity(&process_affinity);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to use thread affinity instead (pthread_getaffinity_np).
Please add a func in ucs_sys, reuse in uct_ib_md_handle_mr_list_multithreaded.

Copy link
Copy Markdown
Contributor

@gleon99 gleon99 Mar 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ofirfarjun7 unresolved?

@ofirfarjun7 ofirfarjun7 requested a review from gleon99 February 19, 2023 11:20

#include <stdint.h>

#define UCS_NUMA_MIN_DISTANCE 10
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does it need to be in H file?

Copy link
Copy Markdown
Contributor Author

@ofirfarjun7 ofirfarjun7 Mar 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have references to this macro in other files (topo.c)
Need to remove from numa.h (leftovers...)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect topo.c to use ucs_topo_default_distance

@ofirfarjun7 ofirfarjun7 requested a review from gleon99 March 26, 2023 14:59

static ucs_status_t
ucs_sys_enum_threads_cb(const struct dirent *entry, void *_ctx)
ucs_sys_enum_threads_cb(const struct dirent *entry, void *arg)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename it also in H file

yosefe
yosefe previously approved these changes Apr 3, 2023
Copy link
Copy Markdown
Contributor

@gleon99 gleon99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If already committing some minor fixes, please go over the docstrings - I think there are several places with "excessive spacing" (args, return, etc).

if (ret != 0) {
ucs_error("pthread_getaffinity_np() failed: %m");
return UCS_ERR_INVALID_PARAM;
return ret;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"int ret" -> ucs_status_t status
!= 0 -> != UCS_OK

gleon99
gleon99 previously approved these changes Apr 4, 2023
@gleon99
Copy link
Copy Markdown
Contributor

gleon99 commented Apr 4, 2023

@ofirfarjun7 please squash.

@ofirfarjun7 ofirfarjun7 force-pushed the topic/topo-numa-distance branch from 66098b2 to ef439b0 Compare April 4, 2023 12:59
@ofirfarjun7 ofirfarjun7 added the WIP-DNM Work in progress / Do not review label Apr 4, 2023
@ofirfarjun7 ofirfarjun7 force-pushed the topic/topo-numa-distance branch from ef439b0 to c21b5bb Compare April 4, 2023 13:15
@ofirfarjun7 ofirfarjun7 removed the WIP-DNM Work in progress / Do not review label Apr 4, 2023
@ofirfarjun7 ofirfarjun7 requested a review from gleon99 April 4, 2023 13:16
@yosefe yosefe enabled auto-merge April 5, 2023 08:19
@yosefe yosefe merged commit 80c0edb into openucx:master Apr 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants