Skip to content

fix(ext/flash): graceful server startup/shutdown with unsettled promises in mind#16616

Merged
bartlomieju merged 30 commits intodenoland:mainfrom
magurotuna:segv-flash
Nov 24, 2022
Merged

fix(ext/flash): graceful server startup/shutdown with unsettled promises in mind#16616
bartlomieju merged 30 commits intodenoland:mainfrom
magurotuna:segv-flash

Conversation

@magurotuna
Copy link
Copy Markdown
Member

@magurotuna magurotuna commented Nov 13, 2022

This PR resets the revert commit made by #16610, bringing back #16383 which attempts to fix the issue happening when we use the flash server with --watch option enabled.
Also, some code changes are made to pass the regression test added in #16610. These changes can be seen here: https://github.com/denoland/deno/compare/0f31256..99f9670

Closes #16699

Comment on lines +1192 to +1201
// Now the flash thread is about to finish, but there may be some unsettled
// promises in the main thread that will use the socket. To make the socket
// alive longer enough, we move its ownership to the main thread.
for (tok, socket) in sockets {
if let Some(sender) = socket_senders.remove(&tok) {
// Do nothing if the receiver has already been dropped.
_ = sender.send(socket);
}
}

Copy link
Copy Markdown
Member Author

@magurotuna magurotuna Nov 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the key idea; to prevent sockets from being released when the flash server finishes, transmit them to each ServerContext. This ensures that unsettled promises that outlive the flash thread can use sockets later.

pub parse_done: ParseStatus,
pub buffer: UnsafeCell<Vec<u8>>,
pub read_lock: Arc<Mutex<()>>,
pub _pinned: PhantomPinned,
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not directly related to this PR, but I feel this field is a must so we can make sure that Stream is not moved when it's Pinned.

Comment on lines +63 to +74
let port: number;

const server = Deno.serve({
handler: (_req) => new Response("ok"),
hostname: "localhost",
port: 4501,
signal: ac.signal,
onListen: onListen(listeningPromise),
onError: createOnErrorCb(ac),
port: 0,
signal: ac1.signal,
onListen: (addr) => {
port = addr.port;
listeningPromise.resolve();
},
onError: createOnErrorCb(ac1),
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using port number 4501 makes CI really flaky. On my local ubuntu machine I was not able to see this flakiness though.

@magurotuna magurotuna marked this pull request as ready for review November 14, 2022 10:48
@magurotuna magurotuna requested review from kt3k and littledivy and removed request for littledivy November 14, 2022 10:48
kt3k
kt3k previously approved these changes Nov 14, 2022
Copy link
Copy Markdown
Contributor

@kt3k kt3k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed this change doesn't cause the issue denoland/std#2882 anymore

LGTM

@magurotuna magurotuna changed the title fix(flash): wait for a response to be served before terminating the server fix(ext/flash): graceful server startup/shutdown again Nov 14, 2022
@magurotuna magurotuna changed the title fix(ext/flash): graceful server startup/shutdown again fix(ext/flash): graceful server startup/shutdown with unsettled promises in mind Nov 14, 2022
@kt3k kt3k dismissed their stale review November 15, 2022 05:22

stale

@kt3k
Copy link
Copy Markdown
Contributor

kt3k commented Nov 15, 2022

I found some test failures with net-related compat test cases.

If I run the following command in std, there seem many test failures:

path/to/target/debug/deno task node:test -- parallel/test-net

Look like this has a different problem? 🤔

Most failures show this error message:

error: AssertionError: Failed assertion: error: Uncaught BadResource: Bad resource ID
      this[kStreamBaseField]?.close();
                              ^
    at Object.close (deno:core/01_core.js:367:25)

@magurotuna
Copy link
Copy Markdown
Member Author

ugh... Will dig into that. Minimum reproducible code would be appreciated.

@magurotuna
Copy link
Copy Markdown
Member Author

What I noticed so far is the close op is called twice with the same resource ID. Still no idea the reason.

@magurotuna
Copy link
Copy Markdown
Member Author

Super interesting, weird fact: if we build deno with --release option, the command path/to/target/release/deno task node:test -- parallel/test-net does SUCCEED. I confirmed this with several commits, including d8aafcc which is the latest commit in this branch. However, if we debug build, even 916598f (which is v1.28.0) does not get these tests to pass. I have no idea at all why this weird thing is happening.

@magurotuna
Copy link
Copy Markdown
Member Author

Really flaky 🤷

@bartlomieju
Copy link
Copy Markdown
Member

Yeah, I'm not able to repeat this flakiness locally either :/

@magurotuna
Copy link
Copy Markdown
Member Author

magurotuna commented Nov 23, 2022

Hmm, from my quick observation, there seems to be three patterns of test failures.

  1. test release macos-12 timeout
  2. bench release ubuntu-20.04-xl panic
  3. test release(debug) ubuntu-20.04-xl fails at httpServerRejectsOnAddrInUse in flash_test.ts

Among these, only 3 is specific to this PR - other 2 failures happen on the main branch too.

@magurotuna
Copy link
Copy Markdown
Member Author

d7ca7eb fell into the case 1 (macos timeout)

@kt3k
Copy link
Copy Markdown
Contributor

kt3k commented Nov 23, 2022

Super interesting, weird fact: if we build deno with --release option, the command path/to/target/release/deno task node:test -- parallel/test-net does SUCCEED. I confirmed this with several commits, including d8aafcc which is the latest commit in this branch. However, if we debug build, even 916598f (which is v1.28.0) does not get these tests to pass. I have no idea at all why this weird thing is happening.

I also confirmed deno task node:test -- parallel/test-net doesn't have problem with --release build. I also found that debug build already have problem with these net related compat test cases in main. So those failures look irrelevant to this change (sorry for wasting your time!)

Copy link
Copy Markdown
Contributor

@kt3k kt3k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@magurotuna
Copy link
Copy Markdown
Member Author

67aac4a fell into the case 3

@magurotuna
Copy link
Copy Markdown
Member Author

@bartlomieju What do you think about the flakiness we are facing? I ran this failing test case locally line 10 times and saw no failure. Would it be okay to disable this test for the time being?

@bartlomieju
Copy link
Copy Markdown
Member

@bartlomieju What do you think about the flakiness we are facing? I ran this failing test case locally line 10 times and saw no failure. Would it be okay to disable this test for the time being?

Yes, I think that's reasonable

@magurotuna
Copy link
Copy Markdown
Member Author

@bartlomieju PTAL before actually getting it landed just in case.

Copy link
Copy Markdown
Member

@bartlomieju bartlomieju left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@bartlomieju bartlomieju merged commit fd023cf into denoland:main Nov 24, 2022
bartlomieju pushed a commit that referenced this pull request Nov 24, 2022
…ses in mind (#16616)

This PR resets the revert commit made by #16610, bringing back #16383
which attempts to fix the issue happening when we use the flash server
with `--watch` option enabled.
Also, some code changes are made to pass the regression test added in
#16610.
@magurotuna magurotuna deleted the segv-flash branch November 25, 2022 02:41
bartlomieju added a commit to bartlomieju/deno that referenced this pull request Nov 26, 2022
bartlomieju added a commit that referenced this pull request Nov 27, 2022
#16839)

…ed promises in mind (#16616)"

This reverts commit fd023cf.

There are reports saying that Vite is often hanging in 1.28.2 and this
is
the only PR that changed something with HTTP server. I think we should
hold off on trying to fix this and instead focus on #16787

CC @magurotuna
cjihrig added a commit to denoland/std that referenced this pull request Nov 28, 2022
…2950)

This reverts commit 5f5413d.

This change was made in response to
denoland/deno#16616. However, that change
was reverted in denoland/deno#16839 due
to it breaking Vite. Let's revert it here as well.
kt3k pushed a commit that referenced this pull request Dec 1, 2022
#16839)

…ed promises in mind (#16616)"

This reverts commit fd023cf.

There are reports saying that Vite is often hanging in 1.28.2 and this
is
the only PR that changed something with HTTP server. I think we should
hold off on trying to fix this and instead focus on #16787

CC @magurotuna
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Uncaught AddrInUse when restarting server in watch mode after saving file

3 participants