Add INFO stat eviction_exceeded_time#9031
Conversation
madolson
left a comment
There was a problem hiding this comment.
I agree this is useful, just some english nitpicks.
|
@redis/core-team Please approve this small new info field. |
|
this metric can indeed shed some light on the matter, but maybe we want to add something even better? e.g. we start some timer as soon as we went over the memory limit (returning EVICT_RUNNING for the first time after a previous EVICT_OK), and then stop measuring when we change the state back to EVICT_OK. WDYT? |
|
This metric is more clear than count. For count we have to know implementation and |
|
do we wanna know the total time we spent in eviction? was that the purpose of the current counter? |
|
@oranagra Maybe we won't. After starting eviction, most time are spent. |
|
yes, as long a there's eviction pending, the commands will suffer latency, but that's actually desired in my eyes, it gives some back pressure to the commands to slow them down, otherwise, an aggressive workload will certainly win over eviction. did you intend to measure the relative percentage of the cpu / time spent on eviction? (i.e. the damage it does to commands) if it is the later, then i think the two metrics i suggested are better, if it is the first, we can add a metric for the total time spent in eviction (and maybe active expire), and try to match it against the total cpu time spent in redis, or specifically in commands. |
|
At first I intend to measure the relative percentage of the cpu / time spent on eviction. But after a second thought, when evicting, Redis uses all of its CPU and time. So this maybe not so useful. This may just tell us whether something slow down the eviction (expire or slow commands). I think two metrics you suggested are more useful. We can know whether eviction slow down the system and how much time it takes. |
|
I agree, I think the time spent is probably better. |
667c3ec to
d116d61
Compare
src/evict.c
Outdated
| int result = doPerformEvictions(); | ||
|
|
||
| if (result == EVICT_RUNNING) { | ||
| if (server.stat_last_eviction_exceeded_time == 0) { |
There was a problem hiding this comment.
Not required under single line {}.
There was a problem hiding this comment.
I'm confused whther `{} needed. And space like
f(a,b,c)
f(a, b, c)
a+b
a + b
Some codes have space and some not.
There was a problem hiding this comment.
In the old code many of them are without spaces, the back style is basically with spaces, but if you modify the old code, I tend not to change the original style.
src/evict.c
Outdated
|
|
||
| if (result == EVICT_RUNNING) { | ||
| if (server.stat_last_eviction_exceeded_time == 0) { | ||
| server.stat_last_eviction_exceeded_time = ustime(); |
There was a problem hiding this comment.
Is it possible to use server.ustime.
There was a problem hiding this comment.
I intend to use server.ustime. This function may be called in serverCron, the cached server.ustime may be outdated.
src/evict.c
Outdated
| server.stat_last_eviction_exceeded_time = ustime(); | ||
| } | ||
| } else if (server.stat_last_eviction_exceeded_time != 0) { | ||
| server.stat_eviction_exceeded_time += (ustime() - server.stat_last_eviction_exceeded_time); |
There was a problem hiding this comment.
i think the use of ustime() is actually wrong. we now have a monotonic clock, and that's what we should use to track (and sum) execution time (same as we now do in call for commandstats.
maybe we can also use the monotonic clock for current_eviction_exceeded_time?
src/evict.c
Outdated
| if (server.stat_last_eviction_exceeded_time == 0) { | ||
| server.stat_last_eviction_exceeded_time = ustime(); | ||
| } | ||
| exceeded_info: |
There was a problem hiding this comment.
maybe a better name would be update_metrics?
src/server.h
Outdated
| long long stat_evictedkeys; /* Number of evicted keys (maxmemory) */ | ||
| long long stat_eviction_exceeded_time; /* Total time over the memory limit */ | ||
| long long stat_last_eviction_exceeded_time; /* Timestamp of current eviction start */ | ||
| monotime stat_total_eviction_exceeded_time; /* Total time over the memory limit */ |
There was a problem hiding this comment.
is that really a monotime type? i think just a long long with us units (better mention the units in the comment)
There was a problem hiding this comment.
stat_last_eviction_exceeded_time is timestamp, so it is monotime.
stat_total_eviction_exceeded_time can be unsigned long long.
There was a problem hiding this comment.
yes, that's what i meant. and if it's just a long long, then we need to document the units.
src/server.c
Outdated
| server.stat_expire_cycle_time_used/1000, | ||
| server.stat_evictedkeys, | ||
| server.stat_eviction_exceeded_time + current_eviction_exceeded_time, | ||
| (unsigned long long) server.stat_total_eviction_exceeded_time + current_eviction_exceeded_time, |
There was a problem hiding this comment.
i think maybe we better convert both info fields to ms, or even seconds (not very useful in microseconds)
|
Naming looks good to me, I didn't get a chance to look through the code again yet. |
|
@huangzhw Want to go ahead and create the doc PR now? |
|
@madolson OK. I will do it. |
current_eviction_exceeded_time. See redis/redis#9031.
|
@oranagra As this one was merged, we have metrics in expire and eviction. Do we need samiliar metrics in defrag? |
|
@huangzhw like what? total time we were above the defrag thresholds and active defrag was "active"? |
|
@oranagra Yes. Exactly what I mean. I will try to do it. |
…ded_time (redis#9031) Add two INFO metrics: ``` total_eviction_exceeded_time:69734 current_eviction_exceeded_time:10230 ``` `current_eviction_exceeded_time` if greater than 0, means how much time current used memory is greater than `maxmemory`. And we are still over the maxmemory. If used memory is below `maxmemory`, this metric is reset to 0. `total_eviction_exceeded_time` means total time used memory is greater than `maxmemory` since server startup. The units of these two metrics are ms. Co-authored-by: Oran Agra <[email protected]>
Add two INFO metrics: ``` total_active_defrag_time:12345 current_active_defrag_time:456 ``` `current_active_defrag_time` if greater than 0, means how much time has passed since active defrag started running. If active defrag stops, this metric is reset to 0. `total_active_defrag_time` means total time the fragmentation was over the defrag threshold since the server started. This is a followup PR for #9031
|
do we want samiliar metrics for activerehashing? i wanted to look for metrics (something like expire_cycle_cpu_milliseconds) to see how much time we generally spend in activerehashing, but i didn’t see it. |
|
i don't these are comparable to the metrics in this PR. |
Add two INFO metrics:
current_eviction_exceeded_timeif greater than 0, means how much time current used memory is greater thanmaxmemory. And we are still over the maxmemory. If used memory is belowmaxmemory, this metric is reset to 0.total_eviction_exceeded_timemeans total time used memory is greater thanmaxmemorysince server startup.The units of these two metrics are ms.