-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Description
Describe the bug
I see a lot of cases of using tokio::spawn without wrapping JoinHandles correctly.
So if we stop execution (e.g tokio::select!) - we can find some tasks that are still alive.
We can even see panic after canceling an execution (at least I did)
Examples:
demux_task: a function and then just await on JoinHandle- parquet spawn_rg_join_and_finalize_task
- parquet spawn_column_parallel_row_group_writer
- orchestration serialize_rb_stream_to_object_store
To Reproduce
Easiest way, is to create a long-running query which probably may even fail, and track the tokio tasks with tokio console after cancelling the execution.
I.e wrap it to select or timeout and check that spawned tasks are still working after cancelling by another branch.
Expected behavior
I believe execution should be cancel-safe, and I think we need to care more about spawned tasks.
We already had such issues before, e.g: #5178
But I think even AbortOnDropMany doesn't solve all edge-cases, for example there are places when we do return Vec<JoinHandle> or just JoinHandle, but actually the execution can be canceled before wrapping into AbortOnDropMany (if there are any async calls in between).
So I'd say #6513 is more relevant now, I guess even single task should be spawned with JoinSet
Additional context
How can we protect of introducing this?
I think we need to disallow tokio::spawn or pay attention to any spawn during review at least