-
-
Notifications
You must be signed in to change notification settings - Fork 357
Description
While debugging memory leak in Prisma, we stumbled upon couple of issues with ThreadsafeFunctions, which, I beleive, boil down to a not quite correct underlying C APIs usages.
unref leaks memory
napi-rs implements its own reference counter on top of napi_ref_threadsafe_function and napi_unref_threadsafe_function and skips napi_release_threadsafe_function call if it is 0. This was done to fix #518. However, there are couple of issues with that fix:
- despite what their name might suggest, original C APIs are not reference counted. Docs say as much:
Similar to uv_ref it is also idempotent.
- they don't really have anything to do with resource management, they just allow event loop to run not waiting for the function cleanup.
Neither does napi_unref_threadsafe_function mark the thread-safe functions as able to be destroyed nor does napi_ref_threadsafe_function prevent it from being destroyed.
So, skipping release call if unref is called causes the function and all its context to leak. You can see it in my reproduction repo if you run npm run build && npm run leaking: memory usage will steadily climb until the process use up all available heap space and crash.
Neon solved the issue, similar to #518 in another way. They use threadsafe_function finalization callback to mark the function as finalized and skip release call in drop in that case. finalize callback is called either when the last thread releases the function, abort is called or environment is destroyed. Solving the issue this way allows to use unref without leaking any memory, but if we stop there, we'll have another problem.
using threadsafe function after finalization callback have been called is not safe
This have not been a problem before, since function never was finalized until VM is destroyed. Now, however, it is possible that call will happen while vm is being destroyed, which is a problem, since the function is not guaranteed to be allocated anymore. Same is true in situation where napi_call_threadsafe_function returns napi_closing. In order to avoid segfault in this case, we have to make sure that no napi_*_threadsafe_function APIs are called after any of those two events happened.
My initial plan was to integrate fixes for those issues into ThreadsafeFunction wrapper and send a PR, however, I was not able to do it in a way that preserves backward compatibility. So, instead I've implement my own wrapper around unsafe APIs, which does not have those issues. You can see it in the same reproduction repo if you run npm run build && npm run fixed - heap usage stays below 15 MB and script completes without errors.
Ideally, we would like those changes to be integrated into napi-rs so we don't have to maintain our own wrappers. Maybe, you'll figure out how to do it in a backward compatible way. Or maybe it will make sense to deprecate old one and build a new one either based on our implementation, or on Neon one.