Skip to content

Fix cpu count with cgroup limits#33342

Merged
Felixoid merged 3 commits intoClickHouse:masterfrom
JaySon-Huang:get_cpu_cores_under_cgroup
Jan 22, 2022
Merged

Fix cpu count with cgroup limits#33342
Felixoid merged 3 commits intoClickHouse:masterfrom
JaySon-Huang:get_cpu_cores_under_cgroup

Conversation

@JaySon-Huang
Copy link
Copy Markdown
Contributor

@JaySon-Huang JaySon-Huang commented Dec 31, 2021

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Respect cgroup limits for CPU quota

Detailed description / Documentation draft:
close #2261, close #25662
Try to check cgroup limits under /sys/fs/cgroup/cpu/. Reference: http://hg.openjdk.java.net/jdk/jdk/rev/7f22774a5f42

By adding documentation, you'll allow users to try your new feature immediately, not when someone else will have time to document it later. Documentation is necessary for all features that affect user experience in any way. You can add brief documentation draft above, or add documentation right into your patch as Markdown files in docs folder.

If you are doing this for the first time, it's recommended to read the lightweight Contributing to ClickHouse Documentation guide first.

Information about CI checks: https://clickhouse.tech/docs/en/development/continuous-integration/

@robot-clickhouse robot-clickhouse added the pr-bugfix Pull request with bugfix, not backported by default label Dec 31, 2021
@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Dec 31, 2021

CLA assistant check
All committers have signed the CLA.

@JaySon-Huang
Copy link
Copy Markdown
Contributor Author

JaySon-Huang commented Dec 31, 2021

Tested by creating a simple docker image:

// a.cpp
#include <iostream>
#include "getNumberOfPhysicalCPUCores.h"
int main()
{
    std::cout << "thread: " << getNumberOfPhysicalCPUCores() << std::endl;
    return 0;
}
// g++ a.cpp

> cat Dockerfile
FROM centos
COPY a.out /a.out
CMD ["a.out"]

> docker build -t='jayson/test_cpu' .

>  docker run --rm --cpus="2" jayson/test_cpu ./a.out
thread: 2
>  docker run --rm --cpus="4" jayson/test_cpu ./a.out
thread: 4
>  docker run --rm --cpus="8" jayson/test_cpu ./a.out
thread: 8
>  docker run --rm --cpus="40" jayson/test_cpu ./a.out
thread: 40
>  docker run --rm --cpus="80" jayson/test_cpu ./a.out
docker: Error response from daemon: Range of CPUs is from 0.01 to 40.00, as there are only 40 CPUs available.
See 'docker run --help'.
> docker run --rm --cpus="8.6" jayson/test_cpu ./a.out
thread: 9

@alexey-milovidov alexey-milovidov added the can be tested Allows running workflows for external contributors label Dec 31, 2021
@robot-clickhouse robot-clickhouse added pr-improvement Pull request with some product improvements and removed pr-bugfix Pull request with bugfix, not backported by default labels Dec 31, 2021
@azat
Copy link
Copy Markdown
Member

azat commented Jan 1, 2022

Tested by creating a simple docker image:

By the way, this can be tested using integration test, something similar to https://github.com/ClickHouse/ClickHouse/blob/master/tests/integration/test_jemalloc_percpu_arena/test.py

@JaySon-Huang
Copy link
Copy Markdown
Contributor Author

Tested by creating a simple docker image:

By the way, this can be tested using integration test, something similar to https://github.com/ClickHouse/ClickHouse/blob/master/tests/integration/test_jemalloc_percpu_arena/test.py

I'll try to add some tests later this week

@bobrik
Copy link
Copy Markdown
Contributor

bobrik commented Jan 5, 2022

How does this work with cgroupv2 only hierarchies?

@JaySon-Huang
Copy link
Copy Markdown
Contributor Author

@bobrik How does this work with cgroupv2 only hierarchies?

After doing some research, this PR doesn't work under cgroup v2.
However, now both CPU and Memory don't work under cgroup v2, and v2 is not enabled by default as far as I know. Maybe we can open another issue for cgroup v2 limits?

Reference:

@bobrik
Copy link
Copy Markdown
Contributor

bobrik commented Jan 7, 2022

I think it's ok to just add v1 and have a separate issue for v2.

@JaySon-Huang
Copy link
Copy Markdown
Contributor Author

"Stress test (debug, actions) — OOM killer (or signal 9) in clickhouse-server.log"

I'm not sure why this test is failed.

@Felixoid
Copy link
Copy Markdown
Member

@Mergifyio update

@mergify
Copy link
Copy Markdown
Contributor

mergify bot commented Jan 14, 2022

update

✅ Branch has been successfully updated

@Felixoid
Copy link
Copy Markdown
Member

@Mergifyio update

@mergify
Copy link
Copy Markdown
Contributor

mergify bot commented Jan 21, 2022

update

❌ Base branch update has failed

Details

merge conflict between base and head
err-code: 93E34

Signed-off-by: JaySon-Huang <[email protected]>
Signed-off-by: JaySon-Huang <[email protected]>
@JaySon-Huang JaySon-Huang force-pushed the get_cpu_cores_under_cgroup branch from bc20c24 to d71a31b Compare January 21, 2022 13:03
@JaySon-Huang
Copy link
Copy Markdown
Contributor Author

I think this PR is ready to be reviewed. Is there anything I need to do?

@Felixoid
Copy link
Copy Markdown
Member

I am waiting for CI and going to merge it 👍

@thyn
Copy link
Copy Markdown

thyn commented Jan 22, 2022

Is any plans to backport this to 21.8?

@alexey-milovidov
Copy link
Copy Markdown
Member

@thyn No, as it is not a bugfix.
The only effect is that queries are run slightly less efficient and consume slightly more memory.

@Felixoid Felixoid merged commit 13f7f56 into ClickHouse:master Jan 22, 2022
@JaySon-Huang JaySon-Huang deleted the get_cpu_cores_under_cgroup branch January 22, 2022 13:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors pr-improvement Pull request with some product improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Detect that server is running under cgroup control, get cpu/memory info with limitations Clickhouse doesn't respect cgroup limits

8 participants