Plug file descriptor leaks#643
Conversation
|
Thanks for this PR/bug fix. I'll take a look tomorrow and give more feedback, in the mean time I've added a few reviewers. |
1 similar comment
|
Thanks for this PR/bug fix. I'll take a look tomorrow and give more feedback, in the mean time I've added a few reviewers. |
mathisrichter
left a comment
There was a problem hiding this comment.
Thanks so much for taking this on, @ssgier! This has been a problem from the start. If this indeed solves the problem, it would be an amazing contribution to Lava that makes a lot of lives easier!
I made a few renaming and style suggestions, nothing major.
Do you see a way of writing a unit test for this new functionality? One that ideally fails with the old implementation and passes with the new?
|
Thanks for reviewing @mathisrichter! I will implement your suggestions during the coming days and will also think of a way to write some specific tests. Some follow-up informationBelow is a screenshot of my terminal emulator, showing proof of concept:
This means that with low ulimit, only the fix branch avoids the files error, but with high ulimit, both avoid it. This result is deterministic. It can be run any number of times with the same result. I am running:
However: I tested the fix today on my MacBook (macOS Ventura 13.2) and there it does not work. After going back to the Linux machine and taking a deeper look, I noticed that if I call the tests in a specific way (by repeating the same test a large number of times), some descriptors still leak, although much more slowly, eventually leading to the same error. Conclusion: This change fixes some leaks, but not all of them yet. At least one remains. I will try to track them all down and come back with an update soon. |
|
@ssgier Thanks for that deeper analysis! I brought up your PR internally and learned that there may be tests for this error in some branch. I asked @joyeshmishra to add them as a comment to this thread. |
|
Update: Found some more leaks and committed a draft of a fix here. New behavior: With these fixes applied, all file descriptor leaks appear to be resolved on both Linux and macOs and the whole test suite runs through with low ulimit. Next steps: I will implement @mathisrichter's suggestions and also write an explicit test. It turns out that there is a simple way of doing this: Details
I tried to keep the fixes as non-intrusive as possible. One shortcoming is that some of them rely on objects being reclaimed, but it works in practice and from what I can see, it does not break existing functionality. |
joyeshmishra
left a comment
There was a problem hiding this comment.
Thanks for this contribution. Looks great. Expecting you have verified on all three platforms before merge. Small comment on code otherwise looks great,
|
@joyeshmishra thanks for reviewing! I did verification on Linux and macOS but not on Windows. Is resource leakage also a problem on Windows? It looked like a problem specific to Unix-like systems. |
|
As discussed separately with @mathisrichter, removed the now obsolete hint from README and tutorial. The tag referred in the README would have to be updated in the next release. |
harryliu-intel
left a comment
There was a problem hiding this comment.
Looks good to me, thanks.
* Plug file descriptor leaks * Plug more leaks * Improve code style * Check higher wait time on CI run * Better fix for semaphore logic * Add unit test for file descriptor leakage * Let threads terminate asynchronously * Disable leakage test on Windows (not applicable) * Add more type hints * Improve naming * Remove hint about too many open files error

Issue Number: #642
Objective of pull request: fix file descriptor leaks.
Pull request checklist
Your PR fulfills the following requirements:
flakeheaven lint src/lava tests/) and (bandit -r src/lava/.) pass locallypytest) passes locallyPull request type
Please check your PR type:
What is the current behavior?
What is the new behavior?
Does this introduce a breaking change?
Supplemental information
Key points all involve addressing sub-optimal usage of the Python multiprocessing library:
Note: the fix works for Linux. For other OSes it would have to be checked as well.