Skip to content

Vine: Improve how errors are propagated with FuturesExecutor#3922

Merged
dthain merged 6 commits intocooperative-computing-lab:masterfrom
gpauloski:issue-3920
Sep 16, 2024
Merged

Vine: Improve how errors are propagated with FuturesExecutor#3922
dthain merged 6 commits intocooperative-computing-lab:masterfrom
gpauloski:issue-3920

Conversation

@gpauloski
Copy link
Copy Markdown
Contributor

Proposed Changes

Improves how task errors and timeouts are propagated in Future, wait(), and as_completed() when using the FuturesExecutor (see #3920). The commits are in chronological order, each addressing a single class/function, if you prefer to review it that way.

A couple notes:

  • In addition to fixing how TimeoutErrors are raised in as_completed, I changed it to return a generator which will reduce latency on the consumer side compared to waiting to construct an iterator from a fully constructed list of futures.
  • FutureFunctionCall.output()/FuturePythonTask.output() were inconsistent in when they raised vs returned task exceptions. I changed those methods to always return task exceptions, and then left the responsibility of raising exceptions to the Future.result().
  • FutureFunctionCall.output() seems to only have access to the string representation of the exception (not the pickled exception like in FuturePythonTask). I construct a FunctionCallNoResult to return containing the string representation of the task's exception. This has the downside of not letting the user catch specific types of task exceptions (e.g., exceptions they might expect to occur), but they will at least see the true exception message and not just an empty FunctionCallNoResult.

Warning: This PR only fixes four of the five items in #3920. I didn't immediately see how to fix Future.done(). It calls Manager.task_state() which then calls cvine.vine_task_state but this function was removed in c1f1cd7. Could Future.done() be implemented by calling Future.result(timeout=0)? It's probably not very efficient to do it that way---maybe there's an alternative to the remove cvine.vine_task_state?

Please feel free to push to my branch if you want to alter anything yourself.

Merge Checklist

The following items must be completed before PRs can be merge.
Check these off to verify you have completed all steps.

  • make test Run local tests prior to pushing.
  • make format Format source code to comply with lint policies. Note that some lint errors can only be resolved manually (e.g., Python)
  • make lint Run lint on source code prior to pushing.
  • Manual Update Update the manual to reflect user-visible changes.
  • Type Labels Select a github label for the type: bugfix, enhancement, etc.
  • Product Labels Select a github label for the product: TaskVine, Makeflow, etc.
  • PR RTM Mark your PR as ready to merge.

In Python 3.9 and older, concurrent.futures used its own TimeoutError
class rather than the builtin. TimeoutError is aliased to the builtin in
layer Python versions, so for backwards compatibility we import it in
the test file.

See python/cpython#30197
@dthain dthain requested a review from BarrySlyDelgado August 29, 2024 20:07
@dthain
Copy link
Copy Markdown
Member

dthain commented Aug 29, 2024

@BarrySlyDelgado what do you think?

Copy link
Copy Markdown
Contributor

@BarrySlyDelgado BarrySlyDelgado left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes look good to me. Thanks @gpauloski!

@gpauloski
Copy link
Copy Markdown
Contributor Author

Thanks! What should be done about the missing cvine.vine_task_state? I'm happy to push a fix if there is one, but maybe it should become a separate issue?

@BarrySlyDelgado
Copy link
Copy Markdown
Contributor

#3929 should be fix future.done() and future.running() checking if a task is cancelled still needs to be fixed. Will create an issue.

@gpauloski
Copy link
Copy Markdown
Contributor Author

Great! FWIW none of my applications use task cancelling so that won't block me (many other executors have limited support for task cancelling).

@BarrySlyDelgado
Copy link
Copy Markdown
Contributor

@dthain This should be ready to be merged. Then we can merge #3929.

@dthain dthain merged commit d1dba75 into cooperative-computing-lab:master Sep 16, 2024
colinthomas-z80 pushed a commit to colinthomas-z80/cctools that referenced this pull request Oct 15, 2024
…rative-computing-lab#3922)

* raise task exceptions in python task futures

* wrap FutureFunctionCall tracebacks in FunctionCallNoResult

* Raise TimeoutError in Future when task result is pending

* yield futures and raise TimeoutError in as_completed

* Improve error handling in wait

* Import TimeoutError for Python <3.10 compatibility

In Python 3.9 and older, concurrent.futures used its own TimeoutError
class rather than the builtin. TimeoutError is aliased to the builtin in
layer Python versions, so for backwards compatibility we import it in
the test file.

See python/cpython#30197
btovar pushed a commit that referenced this pull request Oct 21, 2024
* raise task exceptions in python task futures

* wrap FutureFunctionCall tracebacks in FunctionCallNoResult

* Raise TimeoutError in Future when task result is pending

* yield futures and raise TimeoutError in as_completed

* Improve error handling in wait

* Import TimeoutError for Python <3.10 compatibility

In Python 3.9 and older, concurrent.futures used its own TimeoutError
class rather than the builtin. TimeoutError is aliased to the builtin in
layer Python versions, so for backwards compatibility we import it in
the test file.

See python/cpython#30197
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants