
Tokenization meta-issue #10905

@crusaderky

Description

As the ongoing changes in tokenization are getting more complicated, I'm writing a meta-issue that maps them out.

High level goals

  • Ensure that tokenize() is idempotent (calling it twice on the same object yields the same token)
  • Ensure that tokenize() is deterministic (calling it twice on identical objects, or on the same object after a serialization round-trip, yields the same token). This is limited to the same interpreter; determinism is not guaranteed across interpreters.
  • Ensure that, when tokenize() can't return a deterministic result, there is a system for notifying the dask code (e.g. so that it doesn't raise after comparing two non-deterministic tokens)
  • Robustly detect when #9888 (Reuse of keys in blockwise fusion can cause spurious KeyErrors on distributed cluster) happens, in order to mitigate its impact
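The first two properties can be sketched with a toy hash. This is a hypothetical stand-in (`toy_tokenize` is not dask's implementation, which normalizes objects recursively via `normalize_token`), but it illustrates what idempotency and determinism mean for a token:

```python
import hashlib
import pickle

def toy_tokenize(obj):
    # Hypothetical stand-in for dask.base.tokenize(): hash a pickled
    # representation. Real tokenization normalizes objects recursively
    # instead of relying on pickle.
    return hashlib.md5(pickle.dumps(obj)).hexdigest()

x = {"a": [1, 2, 3]}
# Idempotent: tokenizing the same object twice yields the same token.
assert toy_tokenize(x) == toy_tokenize(x)
# Deterministic: equal-but-distinct objects yield the same token
# (within a single interpreter).
assert toy_tokenize(x) == toy_tokenize({"a": [1, 2, 3]})
```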

There are a handful of known objects that violate idempotency/determinism:

  • object() is idempotent, but not deterministic (by choice, as it's normally used as a singleton).
  • Objects that can't be serialized with cloudpickle are neither idempotent nor deterministic. Expect them to break spectacularly in dask_expr, and probably in many other places going forward.

Notably, all callables (including lambdas) become deterministic.
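The object() case can be illustrated with an identity-based token. This is a toy sketch (the `identity_token` helper is hypothetical, not dask's mechanism), showing why a token tied to object identity is idempotent but not deterministic:

```python
def identity_token(obj):
    # Hypothetical helper: derives the token from object identity alone,
    # mirroring how a bare object() sentinel behaves under tokenization.
    return hex(id(obj))

sentinel = object()
# Idempotent: tokenizing the same instance twice yields the same token.
assert identity_token(sentinel) == identity_token(sentinel)

# Not deterministic: two equal-looking instances get different tokens.
# (Both must be kept alive, or CPython may reuse the same id.)
a, b = object(), object()
assert identity_token(a) != identity_token(b)
```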

PRs

  1. Make tokenization more deterministic #10876
  2. Tokenize SubgraphCallable #10898
  3. Tweak sequence tokenization #10904
  4. these two must go in together:
    4a. Deterministic hashing for almost everything #10883
    4b. Remove lambda tokenization hack dask-expr#822
  5. Test numba tokenization #10896
  6. Remove redundant normalize_token variants #10884
  7. Override tokenize.ensure-deterministic config flag #10913
  8. Config toggle to disable blockwise fusion #10909
  9. tokenize: Don't call str() on dict values #10919
  10. Tweaks to update_graph (backport from #8185) distributed#8498
  11. Tokenization-related test tweaks (backport from #8185) distributed#8499
  12. Warn if tasks are submitted with identical keys but different run_spec distributed#8185
  13. Keep old dependencies on run_spec collision distributed#8512
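For item 7, the escape hatch is a config key. Assuming the `tokenize.ensure-deterministic` flag lands as named in #10913, it can be toggled like any other dask config option (a usage sketch, requires dask installed):

```python
import dask
from dask.base import tokenize

# With ensure-deterministic set, tokenize() raises instead of silently
# falling back to a non-deterministic token for unhashable objects.
with dask.config.set({"tokenize.ensure-deterministic": True}):
    token = tokenize([1, 2, 3])  # fine: deterministic input
```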

Closes

Superseded PRs

Other actions

✔️ A/B tests show no impact whatsoever from the additional tokenization labour on the end-to-end workflows in coiled/benchmarks
✔️ A/B tests on dask-expr optimization show a 50-150 ms slowdown for production-sized TPCH queries, which IMHO is negligible
