Conversation

@addaleax
Contributor

@addaleax addaleax commented May 25, 2018

Not sure about semverness here, so I went with PRing against master first. Happy to retarget to v1.x if that’s deemed okay as well. Also happy to throw this PR in the garbage bin if this approach doesn’t work for people.


If nthreads / 2 (rounded up) DNS calls are outstanding,
queue more work of that kind instead of letting it take over
more positions in the thread pool, blocking other work
such as the (usually much faster) file system I/O or
user-scheduled work.

Fixes: nodejs/node#8436
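The "nthreads / 2 (rounded up)" cap can be made concrete with a small sketch (illustrative names, not libuv's actual code): a new slow (DNS) request must be queued rather than claim a threadpool thread once the count of running slow requests reaches `(nthreads + 1) / 2`.

```c
#include <assert.h>

/* Illustrative sketch, not libuv's actual API: decide whether a new
 * slow (DNS) request may claim a threadpool thread or must wait in a
 * queue. "nthreads / 2 (rounded up)" is (nthreads + 1) / 2 in integer
 * arithmetic. */
static int must_queue_slow_work(unsigned int slow_running,
                                unsigned int nthreads) {
  return slow_running >= (nthreads + 1) / 2;
}
```

With libuv's default pool of 4 threads, this caps slow work at 2 threads, leaving the other 2 free for file system I/O and user-scheduled work.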

@addaleax
Contributor Author

Oh, also: Not sure how to write a race-condition-free test for this.

@gireeshpunathil
Contributor

LGTM

Member

@bnoordhuis bnoordhuis left a comment


Left some comments but I'm okay with the general idea.

void (*done)(struct uv__work *w, int status);
struct uv_loop_s* loop;
void* wq[2];
enum uv__work_kind kind;
Member


This is an ABI change and therefore semver-major. You probably don't want to wait for that to show up in a release because libuv 2.0 is still some time out.

You should be able to drop this field if you move the slow_io_pooled_work >= (nthreads + 1) / 2 logic to worker(), have that function check both queues, and have post() simply put the request in the right queue based on kind (and move the enum to src/uv-common.h, please.)
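The suggested split of responsibilities can be sketched as follows (a simplified model using arrays in place of libuv's QUEUE macros; names are illustrative): post() only routes work by kind, while the threshold check would live in worker().

```c
#include <assert.h>

/* Simplified sketch of the suggested structure. post() only routes a
 * request into the right queue by its kind; the "too many slow requests
 * running" check moves into worker(), which drains both queues. */
enum work_kind { WORK_CPU, WORK_FAST_IO, WORK_SLOW_IO };

#define MAX_WORK 16
static int fast_wq[MAX_WORK], slow_wq[MAX_WORK];
static int fast_len, slow_len;

static void post(int req, enum work_kind kind) {
  if (kind == WORK_SLOW_IO)
    slow_wq[slow_len++] = req;   /* deferred; worker() drains within cap */
  else
    fast_wq[fast_len++] = req;
}
```

This keeps the kind out of struct uv__work entirely, avoiding the ABI change.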

src/threadpool.c Outdated
static void post(QUEUE* q, enum uv__work_kind kind) {
uv_mutex_lock(&mutex);
if (kind == UV__WORK_SLOW_IO) {
if (slow_io_pooled_work >= (nthreads + 1) / 2) {
Member


Looks like this could result in starvation (perennial preemption by fast work reqs) when nthreads == 1.

@gireeshpunathil
Contributor

I would like to test the case of nodejs/node#8436 with this change, but that looks difficult: the current Node.js uses v1.x while this change is on top of master, and the gap between them seems wide. Any advice, please? Otherwise I will attempt to hand-pick the changes into the libuv bundled with the latest Node. Thanks.

@gireeshpunathil
Contributor

Never mind the previous comment; I manually applied this change onto the latest Node, tested the reproduction from nodejs/node#8436, and the result is wonderful!

Without the change:

2007: 0
2328: 10
2458: 20
2153: 30
2502: 40
2460: 50
9441: 60
2272: 70
2638: 80
6766: 90
10795: 100

With the change:

2006: 0
2003: 10
2000: 20
2002: 30
2003: 40
2002: 50
2002: 60
2004: 70
2003: 80
2004: 90
2006: 100

The first number is the time gap between successive file I/O operations, and the second is the number of concurrent DNS lookups being processed. The result clearly shows that the file I/O is immune to the DNS lookups.
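The spread in those numbers can be made concrete (the arrays below are copied verbatim from the tables above; the source does not state the time unit):

```c
#include <assert.h>

/* File-I/O gap data copied from the tables above, one sample per
 * 10 concurrent DNS lookups. */
static const int without_change[] =
    {2007, 2328, 2458, 2153, 2502, 2460, 9441, 2272, 2638, 6766, 10795};
static const int with_change[] =
    {2006, 2003, 2000, 2002, 2003, 2002, 2002, 2004, 2003, 2004, 2006};

static int max_of(const int* a, int n) {
  int m = a[0];
  for (int i = 1; i < n; i++)
    if (a[i] > m)
      m = a[i];
  return m;
}
```

Without the change, the worst gap (10795) is more than five times the typical ~2000; with the change, all eleven samples stay within 6 units of each other.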

If `nthreads / 2` (rounded up) DNS calls are outstanding,
queue more work of that kind instead of letting it take over
more positions in the thread pool, blocking other work
such as the (usually much faster) file system I/O or
user-scheduled work.

Fixes: nodejs/node#8436
@addaleax addaleax force-pushed the slow-fast-thread branch from 930213f to 0724df2 Compare June 9, 2018 19:33
@addaleax addaleax changed the base branch from master to v1.x June 9, 2018 19:33
@addaleax addaleax force-pushed the slow-fast-thread branch 2 times, most recently from 42e0989 to 16a9fe6 Compare June 9, 2018 19:51
@addaleax addaleax force-pushed the slow-fast-thread branch from 16a9fe6 to aa5506a Compare June 9, 2018 19:53
@addaleax
Contributor Author

addaleax commented Jun 9, 2018

@gireeshpunathil @bnoordhuis Updated the PR for 1.x with no public ABI changes, along the lines of Ben’s suggestion. Could you take another look?

I’ve tried to avoid adding even more complexity here; as a result, this does have slightly different scheduling behaviour (multiple “slow” tasks always start executing a full queue length apart from each other). I think that’s okay, though.

@gireeshpunathil
Contributor

LGTM

Member

@bnoordhuis bnoordhuis left a comment


static void worker(void* arg) {
struct uv__work* w;
QUEUE* q;
int is_slow_work;
Member


The only proper name for this variable is is_slooow_work.

I'll get my coat.

Contributor Author


This is too tempting :)

QUEUE_REMOVE(q);
QUEUE_INIT(q); /* Signal uv_cancel() that the work req is executing. */

if (q == &run_slow_work_message) {
Member


For legibility, I'd set is_slow_work = 0; before the if and drop the else. It's below the fold, so to speak; easy to miss.

Contributor Author


Done!

src/threadpool.c Outdated
static QUEUE run_slow_work_message;
static QUEUE slow_io_pending_wq;

#define SLOW_WORK_THREAD_THRESHOLD ((nthreads + 1) / 2)
Member


Maybe make this a function. The fact that it's a macro hides that it's not a constant but depends on a global.

Contributor Author


Done!
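The macro-to-function change could look like this (a sketch; in libuv, nthreads is set when the pool is initialized, and the real function name may differ):

```c
#include <assert.h>

/* The macro hid a dependency on the global nthreads; as a function,
 * that dependency is explicit at the call site. Sketch only. */
static unsigned int nthreads;

static unsigned int slow_work_thread_threshold(void) {
  return (nthreads + 1) / 2;
}
```

Unlike a #defined constant, a reader now sees that the threshold changes with the configured pool size (e.g. UV_THREADPOOL_SIZE in libuv's case).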

Member

@bnoordhuis bnoordhuis left a comment


Still LGTM but this should probably get sign-off from at least one more maintainer since it's a pretty subtle change.

@bnoordhuis
Member

@libuv/collaborators Ping. Can at least one additional maintainer give this a quick lookover?

Member

@santigimeno santigimeno left a comment


LGTM with a couple of questions


enum uv__work_kind {
UV__WORK_CPU,
UV__WORK_FAST_IO,
Member


What's the difference between UV__WORK_CPU and UV__WORK_FAST_IO?

Contributor Author


Nothing, at this point.

QUEUE_INSERT_TAIL(&wq, &run_slow_work_message);
if (idle_threads > 0)
uv_cond_signal(&cond);
}
Member


Does this mean that, if we have a scenario with work items scheduled in this order:

slow_work_1
slow_work_2
fast_work_1
fast_work_2
fast_work_3

slow_work_2 would be scheduled after the fast work items even though the threshold for slow work was 2? If so, was it intentional?

Contributor Author


It was born out of necessity: an earlier version of this patch treated things differently, but as far as I can tell it’s hard to do this another way without changing the uv_work_t layout to include a uv__work_kind field.
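The scheduling order being discussed can be reproduced with a toy model (illustrative names and structure, not libuv's code): slow requests go into a side queue, and a single "run slow work" sentinel message is posted into the main queue at most once at a time, so a second slow item lands behind any fast work posted in between.

```c
#include <assert.h>
#include <string.h>

#define CAP 32
static const char* main_q[CAP];
static int main_len;
static const char* slow_q[CAP];
static int slow_len;
static int sentinel_queued;

/* Slow work goes to the side queue; only one sentinel message is kept
 * in the main queue at a time. */
static void post_work(const char* name, int is_slow) {
  if (is_slow) {
    slow_q[slow_len++] = name;
    if (!sentinel_queued) {
      sentinel_queued = 1;
      main_q[main_len++] = "RUN_SLOW";
    }
  } else {
    main_q[main_len++] = name;
  }
}

/* Drain the main queue; the sentinel runs ONE pending slow item and
 * re-posts itself (behind the fast work) if more slow work remains. */
static void run_all(const char* order[], int* n) {
  for (int i = 0; i < main_len; i++) {
    if (strcmp(main_q[i], "RUN_SLOW") == 0) {
      sentinel_queued = 0;
      order[(*n)++] = slow_q[0];
      slow_len--;
      memmove(slow_q, slow_q + 1, slow_len * sizeof slow_q[0]);
      if (slow_len > 0) {
        sentinel_queued = 1;
        main_q[main_len++] = "RUN_SLOW";
      }
    } else {
      order[(*n)++] = main_q[i];
    }
  }
}
```

Posting slow_1, slow_2, fast_1, fast_2, fast_3 in that order executes slow_1 first, then all three fast items, then slow_2, which is exactly the behaviour asked about above.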

@addaleax
Contributor Author

Hm, gentle ping? CI results are inaccessible by now, so I guess this needs a new one … :)

@gireeshpunathil
Contributor

Requesting this to be taken forward, as there are no outstanding concerns.

@santigimeno
Member

Sorry for the delay, I was waiting for someone else to comment and forgot about it.
CI: https://ci.nodejs.org/view/libuv/job/libuv-test-commit/990/

One comment:
@addaleax, would it make sense, later on, having a different PR for master with the uv__work_kind field in uv_work_t as you had originally?

@addaleax
Contributor Author

@santigimeno

would it make sense, later on, having a different PR for master with the uv__work_kind field in uv_work_t as you had originally?

I think so, yes. It’s not important, though.

santigimeno pushed a commit that referenced this pull request Aug 21, 2018
If `nthreads / 2` (rounded up) DNS calls are outstanding,
queue more work of that kind instead of letting it take over
more positions in the thread pool, blocking other work
such as the (usually much faster) file system I/O or
user-scheduled work.

Fixes: nodejs/node#8436
PR-URL: #1845
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Santiago Gimeno <[email protected]>
@santigimeno
Member

Landed in 90891b4. Thanks!

@addaleax addaleax deleted the slow-fast-thread branch August 27, 2018 21:02
davisjam added a commit to davisjam/libuv that referenced this pull request Aug 28, 2018
This commit implements the "pluggable threadpool" design.

This commit reverts libuv#1845 because it encodes a policy decision.
This decision is up to the embedders now.

The implementation is Linux-only at the moment.
Still have to change:
 - win/fs.c
 - win/getaddrinfo.c
 - win/getnameinfo.c
davisjam added a commit to davisjam/libuv that referenced this pull request Sep 30, 2018
addaleax added a commit to addaleax/libuv that referenced this pull request Oct 4, 2018
90891b4 introduced a race
condition when accessing `slow_io_work_running` – it is being
increased and later decreased as part of the worker thread loop,
but was accessed with different mutexes during these operations.

This fixes the race condition by making sure both accesses
are protected through the global `mutex` of `threadpool.c`.

This fixes a number of flaky Node.js tests.

Refs: libuv#1845
Refs: nodejs/reliability#18
Refs: nodejs/node#23089
Refs: nodejs/node#23067
Refs: nodejs/node#23066
Refs: nodejs/node#23219
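The fix described above can be sketched in isolation: as long as every increment and decrement of the shared counter happens under the same mutex, no updates are lost between worker threads. (Standalone illustration; the names mirror threadpool.c but this is not libuv's code.)

```c
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static unsigned int slow_io_work_running;

/* Each worker repeatedly "starts" and "finishes" a slow request. Both
 * counter accesses take the one global mutex, so they cannot race. */
static void* worker(void* arg) {
  (void) arg;
  for (int i = 0; i < 100000; i++) {
    pthread_mutex_lock(&mutex);
    slow_io_work_running++;        /* slow work starts */
    pthread_mutex_unlock(&mutex);
    /* ... the slow request itself would run here, outside the lock ... */
    pthread_mutex_lock(&mutex);
    slow_io_work_running--;        /* slow work done */
    pthread_mutex_unlock(&mutex);
  }
  return NULL;
}

/* Run n workers to completion; returns the final counter value, which
 * is 0 if and only if no update was lost. */
static unsigned int run_workers(int n) {
  pthread_t tids[16];
  for (int i = 0; i < n; i++)
    pthread_create(&tids[i], NULL, worker, NULL);
  for (int i = 0; i < n; i++)
    pthread_join(tids[i], NULL);
  return slow_io_work_running;
}
```

With the pre-fix pattern (increment under one lock, decrement under another, or none), the final value can drift away from zero, which is what made the Node.js tests flaky.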
addaleax added a commit to addaleax/node that referenced this pull request Oct 4, 2018
addaleax added a commit that referenced this pull request Oct 7, 2018
PR-URL: #2021
Reviewed-By: Santiago Gimeno <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
Reviewed-By: Ben Noordhuis <[email protected]>
gireeshpunathil added a commit to gireeshpunathil/node that referenced this pull request Dec 30, 2018
Validate that massive DNS lookups do not block filesystem I/O
(or any fast I/O for that matter).
Prior to libuv/libuv#1845, a few back-to-back DNS lookups were
sufficient to engage libuv threadpool workers in a blocking
manner, throttling other work items that needed the pool. This
test acts as a regression test for the same.

Start slow and fast I/O together, and make sure the fast I/O
completes within 1/100th of the time taken by the slow I/O.

Refs: libuv/libuv#1845
Refs: nodejs#8436

PR-URL: nodejs#23099
Reviewed-By: Sakthipriyan Vairamani <[email protected]>
Reviewed-By: Anna Henningsen <[email protected]>
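The pass criterion of that regression test, written as a predicate (an illustrative sketch; the actual test is a Node.js JavaScript test, and the function name here is hypothetical):

```c
#include <stdint.h>

/* Fast I/O must finish within 1/100th of the time the slow I/O takes;
 * the multiply avoids integer division truncation on slow_ms / 100. */
static int fast_io_within_budget(uint64_t fast_ms, uint64_t slow_ms) {
  return fast_ms * 100 <= slow_ms;
}
```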
targos pushed a commit to nodejs/node that referenced this pull request Jan 1, 2019
refack pushed a commit to refack/node that referenced this pull request Jan 14, 2019
davisjam added a commit to davisjam/libuv that referenced this pull request Mar 12, 2019
BethGriggs pushed a commit to nodejs/node that referenced this pull request Apr 17, 2019
BethGriggs pushed a commit to nodejs/node that referenced this pull request Apr 28, 2019
BethGriggs pushed a commit to nodejs/node that referenced this pull request May 10, 2019
MylesBorins pushed a commit to nodejs/node that referenced this pull request May 16, 2019