Skip to content

Conversation

@kjin
Copy link
Contributor

@kjin kjin commented Jun 4, 2019

This PR adds support for arbitrary mount locations and hierarchies for cgroups, as well as memory limits from cgroups v2. Fixes #2315

It follows this procedure for resolving { cgroups_version, path } for the memory subsystem:

  1. Read /proc/self/mountinfo to get three fields:
    • the mount point for the memory subsystem (usually /sys/fs/cgroup/memory for cgroups v1, and /sys/fs/cgroup for cgroups v2)
    • the root path of the device being mounted at the location above (usually /, but could be /docker/${CONTAINER_PID} for docker containers, or any other value)
    • whether the memory subsystem is controlled by cgroups v1 or v2 (-> cgroups_version)
  2. Read /proc/self/cgroup to get a "hierarchy path" for the given subsystem. This comes from a line of the form ${ID}:memory:${HIERARCHY} in cgroups v1 and 0::${HIERARCHY} in cgroups v2. It should start with the root string mentioned above, and replacing this with the mount point should yield the "true path" to the memory parameter files (-> path). For example:
    • If the root found above was /, the mount_point was /sys/fs/cgroup2, and the hierarchy path read here was /foo/bar, then the true path should be /sys/fs/cgroup2/foo/bar
    • If the root found above was /docker/abcdef0, the mount_point was
      /sys/fs/cgroup/memory, and the hierarchy path read here was /docker/abcdef0/foo/bar, then the true path should be /sys/fs/cgroup/memory/foo/bar

If the attempt to get this true path fails or any assumptions above are violated, we should act as if cgroups is not present and just return 0 to signify no known upper memory limit. Otherwise, we use the information above to read either memory.limit_in_bytes (for v1) or memory.high/memory.max (for v2) from the correct location.

@kjin
Copy link
Contributor Author

kjin commented Jun 4, 2019

cc/ @ofrobots, @ChALkeR

Copy link
Contributor

@ofrobots ofrobots left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still revieweing, but left some initial comments.

@kjin kjin force-pushed the cgroups-2 branch 2 times, most recently from d9ebec0 to 4b958dd Compare June 5, 2019 00:21
* remainder of the string (excluding the whitespace).
* Note that we assume that `start` ends with "\n\0".
*/
static const char* uv__mountinfo_next_field(const char* start,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest replacing this function with strsep. Since that is not portable, you can implement your own version, or use a public domain copy. It would be better to have a copy of strsep rathan than a function that is functionally equivalent but subtly different.

@ChALkeR
Copy link

ChALkeR commented Jun 6, 2019

This looks great at the first glance, thanks!
I will be able to look closer and test this out tomorrow, I think =).

*/
static int uv__read_cgroups_proc_files(uv__cgroups_subsystem_info_t* info, const char* subsystem) {
FILE* fp;
/* Should be big enough to fit a single line. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would happen in the case when a single line is bigger than the buffer?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move buf definition to inside the loop to make the scope clearer?

@ofrobots
Copy link
Contributor

ofrobots commented Jun 6, 2019

I think this code can be rewritten in terms of strsep and the central loop parsing individual lines in mountinfo will simplified / easier to read once it uses named fields.

I'll re-review once those changes are done.

@kjin kjin force-pushed the cgroups-2 branch 4 times, most recently from e3cd9e4 to 8a95058 Compare June 10, 2019 14:04
return 0;
}
hierarchy_path_inner = hierarchy_path + strlen(root);
snprintf(info->path, CGROUPS_BUF_SIZE, "%s/%s", mount_point, hierarchy_path_inner);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get a gcc format-truncation warning here on gcc v8.3.0.

Copy link

@ChALkeR ChALkeR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not seem to work on testcase from #2315.
Details will follow.

Upd: cgroups_version is determined correctly, info.path is corrupted.
Also, the max/high logic seems strange.

@ChALkeR
Copy link

ChALkeR commented Jun 12, 2019

On two subsequent calls to uv_get_constrained_memory, this is what I observe in info.path:

  1. Length = 103, value = /sys/fs/cgroup/user.slice/user-1000.slice/[email protected]/run-r0858a2015da94808831f65532cf9d476.scope.
  2. Length = 1233, value =
/sys/fs/cgroup/user.slice/user-1000.slice/[email protected]/run-r0858a2015da94808831f65532cf9d476.scopeon):     840500 kB
Inactive(anon):    50100 kB
Active(file):     814984 kB
Inactive(file):   620792 kB
Unevictable:      658560 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:               112 kB
Writeback:             0 kB
AnonPages:        839480 kB
Mapped:           537664 kB
Shmem:            709808 kB
KReclaimable:     148448 kB
Slab:             208096 kB
SReclaimable:     148448 kB
SUnreclaim:        59648 kB
KernelStack:        6308 kB
PageTables:        19476 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     8164272 kB
Committed_AS:    4379052 kB
VmallocTotal:   34359738367 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
Percpu:             1504 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:      131472 kB
DirectMap2M:     6064128 kB
DirectMap1G:    10485760 kB

(yes, that whole block is the value of info.path).

Upd: even the first call seems corrupted on user session: length=60, value=/sys/fs/cgroup/user.slice/user-1000.slice/session-2.scopeA�.

@ChALkeR
Copy link

ChALkeR commented Jun 12, 2019

Overall, this seems to work after those issues would be fixed.

@ofrobots
Copy link
Contributor

@kjin is on vacation this week + next. This will be picked back up once he's back.

@kjin
Copy link
Contributor Author

kjin commented Jun 25, 2019

Re: info.path Corruption -- I am still trying to figure out why this is the case. I can't seem to figure out how to run with systemd-run --user, at least over ssh. Will look into it more over the next day.

@ChALkeR
Copy link

ChALkeR commented Jun 25, 2019

I can't seem to figure out how to run with systemd-run --user, at least over ssh.

@kjin which system are you using? And did you specify systemd.unified_cgroup_hierarchy=1 kernel option at boot? (or cgroup_no_v1=all, which also enables unified in newer versions)

@kjin
Copy link
Contributor Author

kjin commented Jun 25, 2019

@ChALkeR I'm using Debian stretch. I did specify systemd.unified_cgroup_hierarchy=1 and was able to see the v2 hierarchy. But I wasn't able to reproduce the bug that you mentioned, and I assume it was because I couldn't pass --user to systemd-run (it seems to work, even on consecutive calls, without this arg).

@kjin
Copy link
Contributor Author

kjin commented Jun 25, 2019

Ok, I was able to get it to reproduce on a local Linux machine.

@kjin kjin force-pushed the cgroups-2 branch 2 times, most recently from aa9ec9d to 92aed98 Compare June 26, 2019 01:50
Copy link
Contributor Author

@kjin kjin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ofrobots @ChALkeR I think I've addressed all the issues except:

  • I'm using built-in strsep and do not provide an implementation myself. I would like to be able to conditionally implement it for systems without it but am not sure how. Looking at https://unixpapa.com/incnote/string.html (which contains a public domain implementation), they use HAVE_STRSEP, but this doesn't seem to be defined (at least on Debian/Ubuntu) even when strsep is. EDIT: This is a non-issue since cgroups is a Linux-only feature, we don't need to support strsep for all systems
  • I'm not sure what the best practice is on what to do if a line read from a file is too long for a statically-sized buffer. My instinct is to make it dynamic and double its size for every too-long line but want to be more certain before going ahead to do it. EDIT: After talking to @ofrobots, it seems like using getline() is the way to go here.

I'd appreciate guidance on these issues. Thanks!

@vtjnash vtjnash added pending-CI Ready to merge, just pending CI and removed not-stale Issues that should never be marked stale labels Sep 14, 2021
@kjin
Copy link
Contributor Author

kjin commented Sep 14, 2021

Ok, I've rebased and resolved the conflict -- which occurred due to part of uv__read_cgroups_uint64 being refactored into a common uv__slurp function in 1a89003. If CI passes I believe this should be ready to go!

@kjin
Copy link
Contributor Author

kjin commented Sep 16, 2021

Hi @bnoordhuis, do you mind reviewing this PR, or at least resolving the change request? It seems like quite a few people have been hoping to seeing this land. If you're busy, it would be good to perhaps assign a different reviewer to help move this forward? Thanks!

@maleadt
Copy link
Contributor

maleadt commented Sep 21, 2021

@vtjnash maybe?

@kjin
Copy link
Contributor Author

kjin commented Sep 21, 2021

If this PR remains stuck on the change-requested flag even after a third reviewer, I'd be inclined to open a new one, but I'd like to get blessing for that by an owner of libuv first.

@bnoordhuis
Copy link
Member

The next reviewer is welcome to dismiss my review if the comments have been addressed; IIRC, they were predominantly about string handling? I'm afraid I don't have time to review it myself.

Comment on lines 1050 to 1059
if (buf != NULL)
free(buf);
if (root != NULL)
uv__free(root);
if (mount_point != NULL)
uv__free(mount_point);
if (hierarchy_path != NULL)
uv__free(hierarchy_path);
if (subsystem_search_string != NULL)
uv__free(subsystem_search_string);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (buf != NULL)
free(buf);
if (root != NULL)
uv__free(root);
if (mount_point != NULL)
uv__free(mount_point);
if (hierarchy_path != NULL)
uv__free(hierarchy_path);
if (subsystem_search_string != NULL)
uv__free(subsystem_search_string);
free(buf);
uv__free(root);
uv__free(mount_point);
uv__free(hierarchy_path);
uv__free(subsystem_search_string);

continue;
strsep(&field_ptr, " "); /* mount options */
/* A hyphen marks the end of variable-length optional fields. */
while ('-' != field_ptr[0])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
while ('-' != field_ptr[0])
while (field_ptr != NULL && '-' != field_ptr[0])

Comment on lines 885 to 892
while (feof(fp) == 0) {
if (getline(&buf, &buf_size, fp) < 0) {
if (feof(fp) != 0)
break;
rc = UV_EIO;
goto cleanup;
}
buf_strlen = strlen(buf);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: from the docs for getline, seems more typical usage would look like:

Suggested change
while (feof(fp) == 0) {
if (getline(&buf, &buf_size, fp) < 0) {
if (feof(fp) != 0)
break;
rc = UV_EIO;
goto cleanup;
}
buf_strlen = strlen(buf);
while ((buf_strlen = getline(&buf, &buf_size, fp)) != -1) {
...
}
if (ferror(fp)) {
rc = UV_ERR(errno);
goto cleanup;
}

(with appropriate reindentation)


char filename[UV__PATH_MAX];
char buf[32]; /* Large enough to hold an encoded uint64_t. */
uint64_t rc;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rc is now undefined on some code paths

Suggested change
uint64_t rc;
uint64_t rc = 0;

@vtjnash
Copy link
Member

vtjnash commented Sep 27, 2021

Most of @bnoordhuis comments were addressed, but a few seem to have been missed. I'll dismiss his review (and substitute my own approval), once my comments are addressed.

@vtjnash vtjnash removed the pending-CI Ready to merge, just pending CI label Sep 27, 2021
@kjin
Copy link
Contributor Author

kjin commented Oct 1, 2021

@vtjnash Thank you for the review -- I haven't forgotten about this, but unfortunately won't be able to circle back until next week.

@kjin
Copy link
Contributor Author

kjin commented Oct 5, 2021

@vtjnash OK, I've addressed your comments, both the news ones and the ones you left on Ben's review. I think there is one open question I had regarding wrapping terminal output when it's in a comment (context).

Also, apologies, I hadn't realized that you had already looked at this a year ago! (looking at the timestamps of some of your comments)

@bnoordhuis
Copy link
Member

bnoordhuis commented Nov 10, 2021

@kjin Thanks for picking this up again. Cgroups v2 support is valuable IMO but I'd like to cut down the amount of parsing libuv has to do; C is not a great language for that.

You parse /proc/self/mountinfo to determine if the system uses cgroups v1 or v2? Is that necessary? Doesn't the format of the memory field in /proc/self/cgroups tell you that?

Would open("/sys/fs/cgroup/unified", O_DIRECTORY|O_CLOEXEC) work? It's a v2 system when it's there, it seems reasonable to assume v1 when it's not. (Related subject: I prefer openat() over path munging.)

Let me know if you don't have the time or inclination to return to this and I'll try to adopt it.

edit: another thing: I think it should fall back to /sys/fs/cgroup/memory/memory.limit_in_bytes if procfs isn't mounted. Right now it seems to default to 0, no limit.

@kjin
Copy link
Contributor Author

kjin commented Nov 16, 2021

Hi @bnoordhuis, thanks for your input. I'll have bandwidth in around ~2 weeks (after American Thanksgiving) to work on this. If you (or anyone else) feel inclined to move this forward before then, please go ahead -- I want to make sure I'm not blocking anything critical. But I haven't forgotten about this PR, and I'll post back here again when I'm starting to work on it.

@stale
Copy link

stale bot commented Jan 9, 2022

This issue has been automatically marked as stale because it has not had recent activity. Thank you for your contributions.

@maleadt
Copy link
Contributor

maleadt commented Aug 2, 2022

Since this has been marked as stale, and I don't have the necessary permissions to push here, I've opened a new pull request where I've rebased these changes and addressed one of the remaining review comments: #3712

@stale stale bot removed the stale label Aug 2, 2022
@vtjnash vtjnash closed this Sep 12, 2022
@kjin
Copy link
Contributor Author

kjin commented Jan 27, 2023

@maleadt Thank you for moving this forward. A lot of changes have happened since I opened this PR that have brought me far away from the Node.js ecosystem. I'm sorry that I couldn't address this in a more timely manner, but glad to see that you volunteered to take it over!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

uv_get_constrained_memory does not seem to work? (checked under cgroup2)

9 participants