fix: use `JoinSet` to make spawned tasks cancel-safe #9318

DDtKey · 2024-02-22T22:59:27Z

Which issue does this PR close?

Closes #9317
Closes #6513

Disallows `tokio::spawn` and `spawn_blocking`, with exceptions only in some tests.

Rationale for this change

We need to provide a cancel-safe interface, and preferably deny `tokio::spawn` entirely.

What changes are included in this PR?

Switch to `JoinSet` and remove `AbortOnDropSingle` and `AbortOnDropMany`.

Are these changes tested?

I'm not sure how to test this directly (open to ideas), but the existing tests pass.

Are there any user-facing changes?

No

devinjdangelo

This looks good to me, thanks again for cleaning up these instances of tokio::spawn!

My one concern is that usage of JoinSet for single tasks could be confusing for a newcomer to the code without the background/context of this PR and associated issues. It also adds boilerplate handling the possibility of joining multiple tasks, when we know there will only ever be one to join. These readability concerns are a secondary concern to cancellation safety, so imo we can merge this as-is.

We could consider as follow-on work creating some abstractions to improve readability while maintaining cancellation safety. Something like the following skeleton:

```rust
use std::future::Future;

use datafusion_common::{internal_err, Result};
use tokio::task::{AbortHandle, JoinError, JoinSet};

/// Light wrapper around a tokio [`JoinSet`], which only allows 1 task to be spawned,
/// and provides semantics similar to `tokio::spawn` for managing a single task. This
/// provides cancellation safety for one-off async tasks.
struct OneShotJoinSet<T> {
    inner: JoinSet<T>,
    locked: bool,
}

impl<T: 'static> OneShotJoinSet<T> {
    pub fn new() -> OneShotJoinSet<T> {
        OneShotJoinSet {
            inner: JoinSet::new(),
            locked: false,
        }
    }

    pub fn spawn<F>(&mut self, task: F) -> Result<AbortHandle>
    where
        F: Future<Output = T>,
        F: Send + 'static,
        T: Send,
    {
        // Only the first call may spawn; later calls return an internal error.
        if self.locked {
            return internal_err!("OneShotJoinSet only allows spawning one task, but attempted to spawn multiple!");
        }
        self.locked = true;
        Ok(self.inner.spawn(task))
    }

    pub async fn join(&mut self) -> Option<Result<T, JoinError>> {
        self.inner.join_next().await
    }
}
```

We could also have a similar OrderedJoinSet which wraps a Vec<OneShotJoinSet> for cancellation safety when join order matters.

DDtKey · 2024-02-23T12:47:34Z

Yes, I completely agree. I had also thought about a wrapper like this for these cases, but didn't want to mix it into the fix.

One thought: I think an API like this is more intuitive:

```rust
use std::future::Future;

use tokio::task::{JoinError, JoinSet};

struct SpawnedTask<T> {
    inner: JoinSet<T>,
}

impl<T: 'static> SpawnedTask<T> {
    // a constructor: takes no `self`
    pub fn spawn<F>(task: F) -> Self
    where
        F: Future<Output = T>,
        F: Send + 'static,
        T: Send,
    {
        let mut inner = JoinSet::new();
        inner.spawn(task);
        Self { inner }
    }
    // and the same for spawn_blocking actually

    pub async fn join(mut self) -> Result<T, JoinError> {
        self.inner.join_next().await.expect("instance always has 1 task")
    }
}
```

- There is no runtime error path for an attempt to spawn a second task, because no method exists that could do it.
- `join` takes `self` by value, so it cannot be called more than once on the same instance (i.e. an instance holds exactly one task over its lifetime). This is guaranteed at compile time, because a `SpawnedTask` can only be created through its public methods.

I'll prepare the changes.

DDtKey · 2024-02-23T13:42:30Z

OK, I implemented a wrapper for spawned tasks. It seemed reasonable to provide it right away (the diff is also smaller now).
It simplified the code, and we now provide a good interface for newcomers.

Generally, there is no need for an `OrderedSpawnedTasks`; it would just be an alias for `Vec<SpawnedTask<T>>`. That is:

- single task: use `SpawnedTask`
- many unordered: use `JoinSet`
- many ordered: use `Vec<SpawnedTask>`

See fcf70f1

DDtKey · 2024-02-23T20:28:01Z

cc @alamb @tustvold

devinjdangelo · 2024-02-23T23:45:02Z

The SpawnedTask abstraction looks great! Agreed that your API is more intuitive and Vec<SpawnedTask> is sufficient without an additional wrapper. Thanks again for knocking this out!

tustvold · 2024-02-24T00:00:06Z

I might be missing something, but what is the issue with the AbortOnDrop interfaces? They seem like less boilerplate than the proposed solution in this PR? The SpawnedTask abstraction seems to do the same thing as AbortOnDrop, so I wonder if we can avoid this being a breaking change?

DDtKey · 2024-02-24T00:33:08Z

Subjectively, I find the new interface has less boilerplate: you have one wrapper that spawns and wraps the task, instead of juggling `JoinHandle`s, `AbortOnDrop`, and `tokio::spawn`.

But in any case, this is not the main point here. Safety is more important. `AbortOnDrop` didn't provide the same guarantees and can easily be misused.

We can even see cases where:

- `JoinHandle`s were sent through channels and only wrapped into this interface afterwards, yet execution can be cancelled before the receiving side is reached and the handle is received.
- Some functions return a `JoinHandle`, which is confusing: they spawn a task without caring whether it gets wrapped safely.
- Many tasks are spawned in a loop with await points in between and only then wrapped into `AbortOnDropMany`, so execution may never reach the point where they are wrapped.

There are no strict rules in the current codebase for how to spawn tasks and work with them, and I have personally run into cancellation issues with DataFusion several times. Mentioning in the documentation how to do this correctly just doesn't scale; we still see these issues occasionally.

So I believe we need a safe and intuitive way to work with this.

Just use `SpawnedTask::spawn` instead of `tokio::spawn`, and we avoid at least the obvious issues. The compiler and clippy will also reject such code for us.

As far as I can see, this was raised before too; there is an issue for it: #6513

tustvold · 2024-02-24T01:25:41Z

> AbortOnDrop didn't provide the same guarantees, and easily can be misused.

How about adding an AbortOnDrop::spawn method that handles this, and potentially deprecate the methods that take a JoinHandle down the line? This would avoid making a breaking change, and is also IMO a more descriptive name for such a construction?

I dunno, I don't feel strongly, but if we can avoid overloading people with yet more new abstractions for tokio-nonsense, I think that would be better 😄

DDtKey · 2024-02-24T01:39:42Z

> deprecate the methods that take a JoinHandle down the line

But a lot of usages were already refactored in #6750 🤔
I thought the goal was to disallow working with `JoinHandle`s directly, since they were misused too often.

We have a clippy warning, and it can tell anyone who tries to use `tokio::spawn` to use `SpawnedTask` instead (see `clippy.toml`).
I think this should improve the developer experience.

`AbortOnDrop::spawn` would only work for a single task, and the name is a bit confusing to me (I mean "Drop::spawn"). In most cases where you need `AbortOnDropMany`, you actually just need a `JoinSet` (the exception is when ordering matters).

> but if we can avoid overloading people with yet more new abstractions for tokio-nonsense

We actually reduced them here: it used to be 3 of ours plus `JoinSet` plus `spawn`/`JoinHandle` itself. Now it's 2 of ours (the stream wrapper and `SpawnedTask`), plus `JoinSet`.

Every month we ship DataFusion releases with breaking changes for users of the crate. Is it really that important not to change internal structures that only affect developers? At least for the product, the outcome is a good one.

DDtKey · 2024-02-24T02:30:14Z

The naming is definitely negotiable, I'm not making any claims to the truth with the current version.

Subjectively, it looks like this: we spawn a task and then we have SpawnedTask.
Since it's the only allowed place to spawn, `AbortOnDrop` in the name seems redundant to me here (there is no way to have a task without this semantic). It reads more like documentation of the behavior and of the reason we have such a wrapper.

metesynnada

Writing a `tokio::spawn`-like API may make task handling more robust. Guarding its usage with Clippy is quite neat. Overall, LGTM.

crepererum

😍

alamb · 2024-02-27T13:11:55Z

Thank you @DDtKey and everyone who reviewed this PR. I thought it looks very nice and easy to use. Thank you all 🙏

* fix: use `JoinSet` to make spawned tasks cancel-safe
* feat: drop `AbortOnDropSingle` and `AbortOnDropMany`
* style: doc lint
* fix: ordering of the tasks in `RepartitionExec`
* fix: replace spawn_blocking with JoinSet
* style: disallow spawn methods
* fixes: preserve ordering of tasks
* style: allow spawning in tests
* chore: exclude clippy.toml from rat
* chore: typo
* feat: introduce `SpawnedTask`
* revert outdated comment
* switch to SpawnedTask missed outdated part
* doc: improve reason for disallowed-method

(cherry picked from commit 14264d2)

DDtKey added 3 commits February 22, 2024 23:37

fix: use JoinSet to make spawned tasks cancel-safe

0f0de10

feat: drop AbortOnDropSingle and AbortOnDropMany

fee25aa

style: doc lint

bfaa842

github-actions bot added the core Core DataFusion crate label Feb 22, 2024

DDtKey added 5 commits February 23, 2024 00:28

fix: ordering of the tasks in RepartitionExec

8cf5bb4

fix: replace spawn_blocking with JoinSet

1544305

style: disallow spawn methods

8445082

fixes: preserve ordering of tasks

d9c7cbc

style: allow spawning in tests

55b87b6

github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Feb 22, 2024

chore: exclude clippy.toml from rat

769581d

DDtKey marked this pull request as ready for review February 22, 2024 23:56

DDtKey mentioned this pull request Feb 22, 2024

Non cancel-safe tokio spawns #9317

Closed

chore: typo

644d913

devinjdangelo approved these changes Feb 23, 2024


feat: introduce SpawnedTask

fcf70f1

DDtKey added 2 commits February 24, 2024 01:45

revert outdated comment

7d0a64f

switch to SpawnedTask missed outdated part

345d7bb

doc: improve reason for disallowed-method

c6f9e2c

metesynnada approved these changes Feb 26, 2024


alamb mentioned this pull request Feb 26, 2024

DataFusion weekly project plan (Andrew Lamb) - Feb 26, 2024 #9345

Closed


alamb requested a review from crepererum February 26, 2024 18:33

crepererum approved these changes Feb 27, 2024


alamb merged commit 14264d2 into apache:main Feb 27, 2024

alamb mentioned this pull request Feb 27, 2024

feat: replace std Instant with wasm-compatible wrapper #9189

Merged

DDtKey mentioned this pull request Mar 1, 2024

refactor: add join_unwind to SpawnedTask #9422

Merged
