Skip to content

Expose Redis main thread cpu time, and scrape system time via INFO command. #8132

Merged
oranagra merged 1 commit intoredis:unstablefrom
filipecosta90:cpu.time
Dec 13, 2020
Merged

Expose Redis main thread cpu time, and scrape system time via INFO command. #8132
oranagra merged 1 commit intoredis:unstablefrom
filipecosta90:cpu.time

Conversation

@filipecosta90
Copy link
Contributor

@filipecosta90 filipecosta90 commented Dec 3, 2020

The following PR exposes the main thread CPU info via info modules ( linux specific only ) (used_cpu_sys_main_thread and used_cpu_user_main_thread). This is important for:

  • distinguish between main thread and io-threads cpu time total cpu time consumed ( check what is the first bottleneck on the used config )
  • distinguish between main thread and modules threads total cpu time consumed

Apart from it, this PR also exposes the server_time_in_microseconds within the Server section so that we can properly differentiate consecutive collection and calculate for example the CPU% and or / cpu time vs wall time, etc...

Note: given that I was already touching the CPU time collection code I've moved it to be called within the cpu section ( no need to measure cpu time if we're not outputting it ).

Here is a sample output of the proposed #CPU section on a 1 and 4 io-threads config ( while running a simple benchmark ):

1 io-thread (default)

We should see a small difference between main thread / all threads cpu time. Not very usefull for this simple scenario

$ redis-cli info cpu
# CPU
used_cpu_sys:15.972181
used_cpu_user:2.316606
used_cpu_sys_children:0.000000
used_cpu_user_children:0.000000
used_cpu_sys_main_thread:15.972034
used_cpu_user_main_thread:2.316585

4 io-threads

We should see a large difference between main thread / all threads cpu time, and be able to properly identify the main/bg threads cpu usage.

$ redis-cli info cpu
# CPU
used_cpu_sys:18.644760
used_cpu_user:20.648113
used_cpu_sys_children:0.000000
used_cpu_user_children:0.000000
used_cpu_sys_main_thread:6.978166
used_cpu_user_main_thread:2.860689

Regarding server time

Sample #SERVER section output:

$ redis-cli info server
# Server
# Server
redis_version:255.255.255
redis_git_sha1:c44a11c6
redis_git_dirty:0
redis_build_id:f6545694f8309272
redis_mode:standalone
os:Linux 5.4.0-56-generic x86_64
arch_bits:64
multiplexing_api:epoll
atomicvar_api:c11-builtin
gcc_version:9.3.0
process_id:310328
process_supervised:no
run_id:d25738d6a821c41f0cf1036078b8728c0c0eaaa8
tcp_port:6379
server_time_usec:1607863602885086
uptime_in_seconds:7
uptime_in_days:0
hz:10
configured_hz:10
lru_clock:14028082
executable:/home/filipe/redislabs/redis/./src/redis-server
config_file:
io_threads_active:0

oranagra
oranagra previously approved these changes Dec 5, 2020
@oranagra
Copy link
Member

oranagra commented Dec 5, 2020

@redis/core-team please approve new info fields..

@oranagra oranagra added release-notes indication that this issue needs to be mentioned in the release notes state:major-decision Requires core team consensus state:to-be-merged The PR should be merged soon, even if not yet ready, this is used so that it won't be forgotten labels Dec 5, 2020
itamarhaber
itamarhaber previously approved these changes Dec 5, 2020
@itamarhaber itamarhaber added the state:needs-doc-pr requires a PR to redis-doc repository label Dec 5, 2020
@itamarhaber
Copy link
Member

Thanks @filipecosta90 - it would be awesome if you make a PR that updates info.md in the redis-doc repo.

@yossigo
Copy link
Collaborator

yossigo commented Dec 9, 2020

@filipecosta90 I've pushed two suggestions -

  1. Rely on RUSAGE_THREAD being defined in sys/resource.h to auto-detect its availability.
  2. Return a numeric parsable value (-1) even if not supported to ease parsing.
    We should also consider simply omitting this value on platforms that don't support it (clients should deal with it missing anyway).
    Let me know if that makes sense.

@filipecosta90
Copy link
Contributor Author

We should also consider simply omitting this value on platforms that don't support it (clients should deal with it missing anyway).

@yossigo agree 👍 IMO it's better to omit than to present non numeric or negative. Should we do that change?

@filipecosta90
Copy link
Contributor Author

@yossigo @itamarhaber @oranagra can you check it now with the suggestions added?

Copy link
Member

@oranagra oranagra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@filipecosta90 you have a memory leak (missing sdsfree(m_cpu);)

p.s. other segments in the info string take different approach of putting a whole block of sdscatprintf inside a condition (rather than using %s in an existing segment). see the various blocks inside "info replication".
or alternatively, since that's an #ifdef, we can just use 3 ifdefs (one on getting the data, one on the top part of the printf, and one on the bottom part).
not sure if that's cleaner or dirtier in this case.

@filipecosta90
Copy link
Contributor Author

p.s. other segments in the info string take different approach of putting a whole block of sdscatprintf inside a condition (rather than using %s in an existing segment). see the various blocks inside "info replication".

Agree @oranagra. I was missing the part that we can only require one #ifdef . Can you check it now?

Copy link
Member

@oranagra oranagra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

styling.

yossigo
yossigo previously approved these changes Dec 9, 2020
oranagra
oranagra previously approved these changes Dec 9, 2020
@filipecosta90 filipecosta90 dismissed stale reviews from oranagra and yossigo via 31d1a30 December 9, 2020 16:17
@filipecosta90 filipecosta90 force-pushed the cpu.time branch 3 times, most recently from 8dda67b to df0a020 Compare December 9, 2020 16:42
itamarhaber
itamarhaber previously approved these changes Dec 9, 2020
@filipecosta90
Copy link
Contributor Author

consider changing the new sdscatprintf for the main thread CPU to use sdscatfmt and %I too.

done @oranagra . please revise.

p.s. do you think we can save one division op by storing the division into a variable and doing a - instead of a %? i'm not sure if the compiler is smart enough to do that for us, and division is pricey..
on the other hand, this is not an "inner loop of a poligon filler" 8-) and it'll make the code ugly.

Agree that we might be over optimizing =)

yossigo
yossigo previously approved these changes Dec 9, 2020
oranagra
oranagra previously approved these changes Dec 9, 2020
@yossigo yossigo added the approval-needed Waiting for core team approval to be merged label Dec 10, 2020
@itamarhaber itamarhaber dismissed stale reviews from oranagra and yossigo via 2263830 December 12, 2020 15:43
itamarhaber
itamarhaber previously approved these changes Dec 12, 2020
@oranagra
Copy link
Member

oranagra commented Dec 12, 2020

@filipecosta90 sorry I just realized that when I suggested to change %ld.%06ld with %I.%I it actually translates to a wrong output (fractional part).
I.E 1234.000567 will be shown as 1234.567 (half way to 1235).
So maybe we need to change it back, or indeed split it to two fields each with its own line and name.

@oranagra
Copy link
Member

Considering used_cpu_sys:%ld.%06ld\r\n already existed (one line), I guess we must proceed in that path

@oranagra
Copy link
Member

come to think of it, a single number in the server_time_usec field may also be more useful for stats systems.
they can divide other metrics by this number more easily.

@oranagra oranagra merged commit 19d46f8 into redis:unstable Dec 13, 2020
@oranagra oranagra mentioned this pull request Jan 13, 2021
JackieXie168 pushed a commit to JackieXie168/redis that referenced this pull request Mar 2, 2021
…mmand. (redis#8132)

Exposes the main thread CPU info via info modules ( linux specific only )
(used_cpu_sys_main_thread and used_cpu_user_main_thread). This is important for:

- distinguish between main thread and io-threads cpu time total cpu time consumed ( check
  what is the first bottleneck on the used config )
- distinguish between main thread and modules threads total cpu time consumed

Apart from it, this commit also exposes the server_time_usec within the Server section so that we can
properly differentiate consecutive collection and calculate for example the CPU% and or / cpu time vs
wall time, etc...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approval-needed Waiting for core team approval to be merged release-notes indication that this issue needs to be mentioned in the release notes state:major-decision Requires core team consensus state:needs-doc-pr requires a PR to redis-doc repository state:to-be-merged The PR should be merged soon, even if not yet ready, this is used so that it won't be forgotten

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants