Skip to content

Fix file descriptor leak under high concurrency#144

Merged
kohlschuetter merged 1 commit intokohlschutter:mainfrom
ThePumpingLemma:fix-fd-cleanup
Nov 6, 2023
Merged

Fix file descriptor leak under high concurrency#144
kohlschuetter merged 1 commit intokohlschutter:mainfrom
ThePumpingLemma:fix-fd-cleanup

Conversation

@ThePumpingLemma
Copy link
Copy Markdown
Contributor

With our PoC at
https://github.com/kevink-sq/jetty-concurrency-issue-poc/tree/575469b3453e7b09d9d60f6460cdab8117b8e773, we noticed that there is unbounded growth of file descriptors when running our simple load test:

seq 1 15000 | xargs -P1500 -I{} curl --unix-socket ./junix-ingress.sock http://localhost/process

Leading to timeouts from curl and timeout exceptions in jetty:

java.util.concurrent.TimeoutException: Idle timeout expired: 32422/30000 ms
        at org.eclipse.jetty.io.IdleTimeout.checkIdleTimeout(IdleTimeout.java:170)
        at org.eclipse.jetty.io.IdleTimeout.idleCheck(IdleTimeout.java:112)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:833)

This patch fixes the issue by ensuring that keys with the OP_INVALID bit are still tracked in the selected key set and cancelled accordingly

Changes:

  • Add invalid keys to selectedKeysSet in setOpsReady and mark as cancelled, ensuring that the fd is cleaned up timely
  • Switch from cancelNoRemove to cancel now that the registered keys are modified separately in the select implementation
  • Prefer !key.isValid() over key.hasOpInvalid() since the former is a superset of the latter
  • In implCloseSelector, cancel the keys before clearing keysRegistered since keys() is backed by keysRegistered
  • Store cancelled keys in a deque instead of a hash set, which should be a little more efficient
  • Remove dead code

With our PoC at
https://github.com/kevink-sq/jetty-concurrency-issue-poc/tree/575469b3453e7b09d9d60f6460cdab8117b8e773,
we noticed that there is unbounded growth of file descriptors when
running our simple load test:

```bash
seq 1 15000 | xargs -P1500 -I{} curl --unix-socket ./junix-ingress.sock http://localhost/process
```

Leading to timeouts from curl and timeout exceptions in jetty:

```
java.util.concurrent.TimeoutException: Idle timeout expired: 32422/30000 ms
        at org.eclipse.jetty.io.IdleTimeout.checkIdleTimeout(IdleTimeout.java:170)
        at org.eclipse.jetty.io.IdleTimeout.idleCheck(IdleTimeout.java:112)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:833)
```

This patch fixes the issue by ensuring that keys with the `OP_INVALID`
bit are still tracked in the selected key set and cancelled accordingly

Changes:
* Add invalid keys to `selectedKeysSet` in `setOpsReady` and mark as
  cancelled, ensuring that the fd is cleaned up timely
* Switch from `cancelNoRemove` to `cancel` now that the registered keys
  are modified separately in the select implementation
* Prefer `!key.isValid()` over `key.hasOpInvalid()` since the former is
  a superset of the latter
* In `implCloseSelector`, cancel the keys before clearing
  `keysRegistered` since `keys()` is backed by `keysRegistered`
* Store cancelled keys in a deque instead of a hash set, which should
  be a little more efficient
* Remove dead code
@007kevin
Copy link
Copy Markdown

007kevin commented Oct 28, 2023

without this PR, on the poc you can verify excessive file descriptions are being creating in junix-inrgess.sock during load:

lsof -p $(ps aux | grep "org.example.Main" | awk {'print$2'} | head -n 1) | grep junix-ingress.sock | wc -l 
> 1

kevink-sq added a commit to kevink-sq/jetty-concurrency-issue-poc that referenced this pull request Oct 31, 2023
kevink-sq added a commit to kevink-sq/jetty-concurrency-issue-poc that referenced this pull request Nov 3, 2023
@kohlschuetter kohlschuetter merged commit 6818639 into kohlschutter:main Nov 6, 2023
@ThePumpingLemma ThePumpingLemma deleted the fix-fd-cleanup branch November 6, 2023 14:16
kevink-sq added a commit to kevink-sq/jetty-concurrency-issue-poc that referenced this pull request Nov 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants