Add API for measuring event loop idle time#2725

Merged
vtjnash merged 3 commits intolibuv:v1.xfrom
trevnorris:time-in-provider
Aug 4, 2020
Conversation

@trevnorris
Member

@trevnorris trevnorris commented Mar 3, 2020

This PR contains code that I've been working on for a while as part of some research I've been doing at work, and decided it would be good to submit this for review after seeing @indutny's Node.js PR nodejs/node#32018.

The main reasons for writing this patch are:

  • Have very accurate measurements of idle time spent in the kernel's event provider.
  • Allow retrieval of the idle time from another thread to allow more accurate time-series data collection (instead of only collecting the data when the event loop can process a callback).

A lot is explained in the commit message below, and I'll be happy to explain in detail why the implementation was done this way. For example, this approach is a conclusion I came to last year after researching whether it was possible to infer the state of the application based solely on event loop metrics. Achieving that goal requires the level of accuracy this patch provides.

There are a few things I'm unsure about and would like feedback on, like how I've implemented atomics, and I'll need some assistance with the Windows implementation.

/cc @bnoordhuis @indutny @saghul

Commit message:

Add API for measuring event loop idle time

The API addition uv_metrics_idle_time() is a thread safe call that
allows the user to retrieve the amount of time the event loop has spent
in the kernel's event provider (i.e. poll). It was done this way to
allow retrieving this value without needing to interrupt the execution
of the event loop. This option can be enabled by passing
UV_METRICS_IDLE_TIME to uv_loop_configure().

One important aspect of this change is that, when enabled, the event
provider is always first called with a timeout == 0. This allows libuv
to know whether any events were waiting in the event queue when the
event provider was called. This matters because libuv is tracking the
amount of "idle time", not "poll time". Thus the provider entry time
is not recorded when timeout == 0 (the event provider never idles in
this case).

While this does add a small amount of overhead when enabled, the
overhead decreases as the event loop's load increases. This is because
poll events will already be waiting when the event provider is called,
so the provider entry time is never actually recorded.

Checking if uv_loop_t is configured with UV_METRICS_IDLE_TIME always
happens in uv__metrics_set_provider_entry_time() and
uv__metrics_update_idle_time(). This keeps the conditional logic
wrapping each call simple and allows instrumentation to always hook
into those two function calls.

Add struct uv__metrics_loop_t as a location for future metrics API
additions.

Add struct uv__loop_internal_fields_t as a location for future
additions to uv_loop_t while also maintaining v1.x compatibility. A
pointer to this struct has been placed in the uv_loop_t::unused[1]
space to allow for fast retrieval and maintaining ABI compatibility with
v1.x.

Currently uv__loop_internal_fields_t contains only the
uv__metrics_loop_t and flags fields. The flags field was added because
the same field was never added to UV_LOOP_PRIVATE_FIELDS on Windows as
it was on Unix, and adding that field would break ABI.

The idea for creating a struct and attaching it to uv_loop_t for
future API enhancements was taken from a comment by bnoordhuis in:
#2506 (comment)

Tests and additional documentation have been included.
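As a minimal usage sketch of the API described above (assuming libuv built with this patch; error handling omitted):

```c
#include <stdio.h>
#include <uv.h>

int main(void) {
  uv_loop_t loop;
  uv_loop_init(&loop);

  /* Opt in to idle-time tracking; without this the counter stays 0. */
  uv_loop_configure(&loop, UV_METRICS_IDLE_TIME);

  /* ... register handles, then run the loop ... */
  uv_run(&loop, UV_RUN_DEFAULT);

  /* Thread safe: may also be called from a sampling thread while the
   * loop is running. The returned value is in nanoseconds. */
  printf("idle: %llu ns\n",
         (unsigned long long) uv_metrics_idle_time(&loop));

  uv_loop_close(&loop);
  return 0;
}
```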

@trevnorris trevnorris force-pushed the time-in-provider branch 5 times, most recently from 00bf674 to 792e760 Compare March 3, 2020 22:23
Member

@santigimeno santigimeno left a comment


Also, the test file should be added to CMakeLists.txt

@trevnorris trevnorris force-pushed the time-in-provider branch 2 times, most recently from 7e829fe to f19c78f Compare March 3, 2020 22:44
@saghul
Member

saghul commented Mar 4, 2020

The performance impact can probably be mitigated if the machinery is disabled by default and it's only initialized with a call to uv_loop_configure, WDYT?

As for the general approach, I'm not an expert, but aren't tools like perf and eBPF capable of gathering this insight, without instrumenting the application?

Regarding the specific metric introduced: in what context is it useful? The way libuv operates it will have 3 types of values:

  • 0: can't block, there are pending callbacks to be called.
  • -1: block indefinitely
  • N: closest timer

What kind of application would be interested in these?

@trevnorris
Member Author

@saghul

The performance impact can probably be mitigated if the machinery is disabled by default and it's only initialized with a call to uv_loop_configure, WDYT?

That sounds like a reasonable solution.

As for the general approach, I'm not an expert, but aren't tools like perf and eBPF capable of gathering this insight, without instrumenting the application?

This is meant to be used in a production environment where the data can be collected at scale. The overhead is low enough that it has no noticeable impact on a (Node.js) process. It also serves as a single, platform-agnostic interface.

Regarding the specific metric introduced: in what context is it useful?

Allow me to dive into some of my research. Last year I began to investigate:

  1. How to scale Node.js applications, now that the classical scaling rules (largely based on a process's CPU usage) no longer apply in a world of Worker threads.
  2. How to understand the overall state of a process based on metrics only collected from the event loop.

Note: From here on, the percentage of time the event loop spends active (i.e. not idle in the event provider) will simply be referred to as the "event loop utilization" or ELU.

To better understand the problem scope I began running simulations of (Node.js) HTTP server request/response characteristics based on different workloads, and did comparisons of the number of requests each workload could support. Then I did a comparison of the request data to the process's CPU usage and the ELU. After doing a lot of analysis on several dozen types of workloads I found a correlation between these three data points. This research, and the analysis presented below on the ELU, come from the patch in this PR.

The following two graphs represent two different server workloads. The X-axis is the number of connections to the server. The right Y-axis is the number of responses (blue line). The left Y-axis is the CPU% (yellow line) and the ELU (red line).

Notice that the graph shows "req/period" and not "req/second". This is because for complex applications the number of requests per second isn't as reliable a metric, so some measurements are the number of requests per 5 or 10 seconds. Using "req/period" is a generic way of labeling this.

[Charts 1 and 2: req/period (blue), CPU% (yellow), and ELU (red) vs. number of connections for two workloads]

The differences in the response curves based on the workload type are a discussion on their own, but that's for another time.

Instead we'll focus on the fact that in these two cases the ELU almost perfectly matches the CPU. This is because in both of these workloads V8 is able to GC immediately, instead of needing to do any off-thread or idle-time cleanup. Also notice that both CPU and ELU closely match the response curve. And both curves closely match the req/period curve. I found this to generally be true, except when doing things like HTTP request pipelining.

Now for the next example:

[Chart 3: req/period, CPU%, and ELU vs. number of connections for a GC-heavy workload]

This use case causes the GC to be constantly doing cleanup off-thread and during idle-time. The CPU usage maxes out at around 125%. If we were to scale the application solely based on CPU usage (which is how most Node.js applications are scaled), let's say at 80%, then we could be scaling too early. At 80% CPU usage the ELU is only at 61% and the server is serving 369 req/period of the maximum 710 req/period it's capable of. This translates to scaling the application when it's only at ~51% of its actual maximum capacity.

Note that this graph shows the total req/period increasing slightly after the ELU has reached 100%. This is due to V8 GC and code-optimization specifics.

To recap the above, the ELU has the following advantages over the usual CPU usage:

  1. Its range is bounded to [0, 1], making it simpler to set application scaling rules.
  2. It only reflects the amount of time the event loop is busy, giving a more accurate indicator of how much additional work the process can handle.
  3. It is a cross-platform way to monitor every thread's event loop. This is especially relevant to Node's Worker threads; Node processes in the future could easily be running at 300% CPU usage and still be completely healthy.

The way libuv operates it will have 3 types of values:

Sorry, I'm not sure how the three timeout values are applicable.

@trevnorris trevnorris requested a review from santigimeno March 4, 2020 20:17
@trevnorris
Member Author

@saghul I've added the loop configuration option UV_LOOP_IDLE_TIME. Also added it to the Windows side as well since support is planned before this PR lands.

@saghul
Member

saghul commented Mar 5, 2020

@trevnorris Thanks for the thorough explanation, I think I get what you're after here. 👍

The way libuv operates it will have 3 types of values:

Sorry, I'm not sure how the three timeout values are applicable.

Disregard that, I'm not sure where I was going there... 🙈

@mmarchini

As for the general approach, I'm not an expert, but aren't tools like perf and eBPF capable of gathering this insight, without instrumenting the application?

While this is true, the application/libraries still need some level of instrumentation for eBPF to be used properly. Inlined functions, missing debug symbols, and omitted frame pointers usually hurt eBPF observability. Also, BPF is only available on Linux, and is complicated to use on FaaS or containerized platforms.

A while ago I suggested re-adding DTrace-style tracepoints to libuv (#2209) to make the event loop more observable. But I like this approach better, as it doesn't require users to learn and set up a system-level observability tool and can be used regardless of underlying stack or operating system. And for users who need those system-level tools, this approach improves the information they can get, because now it's possible to hook those tools to uv__metrics_update_idle_time.

@trevnorris
Member Author

Review suggestions have been addressed.

Is there anyone willing to give tips on the Windows implementation? The first question I need to answer is whether GetQueuedCompletionStatusEx (used in uv_poll) behaves like the Unix implementations: returning when there are events to be processed, while leaving it to libuv to process those events. Or does GetQueuedCompletionStatusEx process the events for the user and then return? I can't tell when the user's callback is supposed to be called.

The other main thing is to see if there are platform-agnostic atomics. I can switch to using a mutex lock if necessary, but I'd like to avoid that overhead if possible. If anything, I guess I could #ifdef and use atomics only on supported platforms and a mutex on the others. @bnoordhuis, have any suggestions here?

@santigimeno
Member

Is there anyone willing to give tips on the Windows implementation? The first question I need to answer is whether GetQueuedCompletionStatusEx (used in uv_poll) behaves like the Unix implementations: returning when there are events to be processed, while leaving it to libuv to process those events. Or does GetQueuedCompletionStatusEx process the events for the user and then return? I can't tell when the user's callback is supposed to be called.

/cc @libuv/windows

@bzoz
Member

bzoz commented Mar 10, 2020

The user callbacks are called by `ran_pending = uv_process_reqs(loop);`.

@trevnorris
Member Author

trevnorris commented Mar 10, 2020

Added a commit that will switch to using a uv_mutex_t if the GNU C atomic extension isn't available. Don't love it, but until I can implement a better solution I'll stick with this.

@trevnorris
Member Author

trevnorris commented Mar 10, 2020

Did initial Windows support (many thanks to @bzoz for pointing that out). Still need to do some testing. (I missed that a DWORD is unsigned; I need to rethink that implementation detail.)

Also, the implementations of the internal metrics functions are almost identical except for the atomic operations. I'm going to benchmark the performance difference between using the atomics and the mutex. Depending on the difference, I might just drop the atomics part completely and unify the implementations.

@trevnorris trevnorris force-pushed the time-in-provider branch 2 times, most recently from 851880c to bcf3900 Compare March 11, 2020 20:57
@trevnorris trevnorris changed the title WIP: Add API for measuring event loop idle time Add API for measuring event loop idle time Mar 11, 2020
@trevnorris
Member Author

Bummer. Same failure. I'm now managing to reproduce it on my Windows VM. I'm confused why this is happening on a spawned thread. Even when I only spawn a single thread, and use the noop timer callback, it still shows an extremely low idle time. Will give this another look tomorrow.

@trevnorris trevnorris force-pushed the time-in-provider branch 2 times, most recently from d2f871a to c785eb7 Compare July 29, 2020 19:16
@trevnorris
Member Author

@vtjnash @santigimeno I've updated and cleaned up the PR. Tests should be passing on Windows. The bug came from me overlooking this comment in uv__poll():

      /* GetQueuedCompletionStatus can occasionally return a little early.
       * Make sure that the desired timeout target time is reached.
       */

The result was that the internal tracking mechanism for when poll was entered was being unexpectedly reset. So the placement of uv__metrics_update_idle_time() has been changed to accommodate that (along with a code comment explaining the reasoning behind the change).

Should be ready for CI.

@vtjnash
Member

vtjnash commented Jul 29, 2020

@trevnorris
Member Author

I forgot to remove the struct mthread_t from test/test-metrics.c after rewriting the tests. So I just removed that and rebased on the latest v1.x. No other changes.

This is a port of 70002c8 to z/OS to fix the same potential issue that
could effectively enter an infinite loop (but not a busy loop) under
certain conditions when polling for events.

PR-URL: libuv#2725
Reviewed-By: Fedor Indutny <[email protected]>
Reviewed-By: Santiago Gimeno <[email protected]>
Reviewed-By: Jameson Nash <[email protected]>
Add struct `uv__loop_internal_fields_t` as a location for future
additions to `uv_loop_t` while also maintaining v1.x compatibility.

Currently `uv__loop_internal_fields_t` only contains the `flags` field.
The reason for adding the `flags` field is because the same field was
never added to `UV_LOOP_PRIVATE_FIELDS` in Windows as it was in Unix.

The idea for creating a struct and attaching it to `uv_loop_t` for
future API enhancements was taken from a comment by bnoordhuis in:
libuv#2506 (comment)

Also add `internal_fields` to `uv_loop_t` to store the pointer to
`uv__loop_internal_fields_t`. This naming makes more sense than just
using `active_reqs.unused[1]`. To maintain ABI compatibility, shrink the
`unused` array.

PR-URL: libuv#2725
Reviewed-By: Fedor Indutny <[email protected]>
Reviewed-By: Santiago Gimeno <[email protected]>
Reviewed-By: Jameson Nash <[email protected]>
The API addition `uv_metrics_idle_time()` is a thread safe call that
allows the user to retrieve the amount of time the event loop has spent
in the kernel's event provider (i.e. poll). It was done this way to
allow retrieving this value without needing to interrupt the execution
of the event loop. This option can be enabled by passing
`UV_METRICS_IDLE_TIME` to `uv_loop_configure()`.

One important aspect of this change is that, when enabled, the event
provider is always first called with a `timeout == 0`. This allows
libuv to know whether any events were waiting in the event queue when
the event provider was called. This matters because libuv is tracking
the amount of "idle time", not "poll time". Thus the provider entry
time is not recorded when `timeout == 0` (the event provider never
idles in this case).

While this does add a small amount of overhead when enabled, the
overhead decreases as the event loop's load increases. This is because
poll events will already be waiting when the event provider is called,
so the provider entry time is never actually recorded.

Checking if `uv_loop_t` is configured with `UV_METRICS_IDLE_TIME` always
happens in `uv__metrics_set_provider_entry_time()` and
`uv__metrics_update_idle_time()`. This keeps the conditional logic
wrapping each call simple and allows instrumentation to always hook
into those two function calls.

Rather than placing the fields directly on `uv__loop_internal_fields_t`
add the struct `uv__loop_metrics_t` as a location for future metrics API
additions.

Tests and additional documentation have been included.

PR-URL: libuv#2725
Reviewed-By: Fedor Indutny <[email protected]>
Reviewed-By: Santiago Gimeno <[email protected]>
Reviewed-By: Jameson Nash <[email protected]>
@vtjnash
Member

vtjnash commented Aug 4, 2020

@vtjnash vtjnash merged commit e8effd4 into libuv:v1.x Aug 4, 2020
@trevnorris
Member Author

@vtjnash Thanks a lot for helping this make it through.

@trevnorris trevnorris deleted the time-in-provider branch August 4, 2020 20:15
cjihrig pushed a commit that referenced this pull request Sep 3, 2020
Refs: #2725
PR-URL: #2984
Reviewed-By: Colin Ihrig <[email protected]>
Reviewed-By: Santiago Gimeno <[email protected]>
@addaleax
Contributor

addaleax commented Nov 4, 2020

The documentation here should probably mention the unit of the returned value as well
