
Conversation

@jakirkham (Member)

Another approach to issue (#3656). I found it a bit easier to articulate this as a PR rather than as a comment. Hope that's OK.

Since `serialize_bytelist` already performs compression, this approach simply leverages that feature and stores the compressed data in an in-memory dict. It nests another `Buffer` inside `self.data` for transitioning uncompressed in-memory data to compressed in-memory data, and then to compressed on-disk data (this last step is the same as before).
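
For anyone skimming, here is roughly what the nesting looks like with the `zict` primitives the worker already uses. This is a simplified sketch rather than the diff itself: it uses the joined-bytes `serialize_bytes`/`deserialize_bytes` helpers instead of `serialize_bytelist`, and the `build_data` name and the 0.5/0.7 thresholds are made up for illustration.

```python
# Sketch of the three-layer idea: plain in-memory -> compressed in-memory
# -> compressed on disk.  Not the PR's exact code.
from dask.sizeof import sizeof
from zict import Buffer, File, Func

from distributed.protocol.serialize import deserialize_bytes, serialize_bytes


def build_data(memory_limit: int, local_directory: str) -> Buffer:
    # Compressed layer: values are serialized/compressed on the way in and
    # deserialized on the way out.  The nested Buffer keeps the compressed
    # bytes in a plain dict until they outgrow their limit, then spills them
    # to disk unchanged.
    compressed = Func(
        serialize_bytes,
        deserialize_bytes,
        Buffer(
            {},                            # compressed, in memory
            File(local_directory),         # compressed, on disk
            n=int(memory_limit * 0.7),     # illustrative threshold
            weight=lambda k, v: len(v),    # size of the compressed bytes
        ),
    )

    # Outer layer: ordinary in-memory objects, demoted to the compressed
    # layer once their (uncompressed) size crosses a lower threshold.
    return Buffer(
        {},                                # uncompressed, in memory
        compressed,
        n=int(memory_limit * 0.5),         # illustrative threshold
        weight=lambda k, v: sizeof(v),
    )
```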

If we decide we like this approach, something I could use some advice on is determining an appropriate transition point and exposing it to the user in a sensible way. Though we need not worry about that if another approach turns out to be more appropriate.

cc @prasunanand @TomAugspurger @martindurant @mrocklin @madsbk @quasiben

@jakirkham mentioned this pull request on Jul 19, 2020
@jakirkham force-pushed the store_compressed_data_worker branch from 73a736d to f3fa583 on July 20, 2020
@jakirkham (Member, Author)

Friendly nudge 😉 Would be great to get others' take on this 😄

@mrocklin (Member)

In principle the approach seems OK to me. I think that, as you say, there are some open questions to resolve:

  1. How do we decide when to move between different layers?
  2. When is this helpful? When is it harmful?

@jakirkham (Member, Author)

Thanks Matt! 😄

> How do we decide when to move between different layers?

My current thinking is that we add another weight/threshold for this and make it configurable (not yet done here); a rough sketch is below. Though this is just my naive thought. Would welcome thoughts from others here 🙂
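
To make that concrete, here is one way the transition point could hang off configuration. Purely illustrative: the `distributed.worker.memory.compress` key is hypothetical and not something this PR (or the existing config schema) defines.

```python
import dask

# Hypothetical key, shown only to illustrate the idea; not part of this PR
# or of the existing config schema.
compress_fraction = dask.config.get(
    "distributed.worker.memory.compress", default=0.6
)

memory_limit = 4_000_000_000  # example worker memory limit, in bytes
compress_target = int(memory_limit * compress_fraction)  # demote past this
```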

> When is this helpful? When is it harmful?

This is a good question. It would be helpful to identify some workloads where we expect this to matter. Happy to think about this some, but I'd also be interested to know if anyone else has workloads they'd like to try here (@fjetter? 😉).

One point worth raising is that this is the same compression step we have today. The only difference is that the compressed data stays around in memory before eventually being moved to disk. So the overall workflow itself hasn't changed; we have merely split one step into two.

@jakirkham force-pushed the store_compressed_data_worker branch 3 times, most recently from 3184c93 to 047323e on July 24, 2020
Adds a `Buffer` for transitioning in-memory data to in-memory compressed
data.
@jakirkham force-pushed the store_compressed_data_worker branch from 047323e to 121f5e9 on July 24, 2020
@jakirkham (Member, Author)

Have now pushed some logic to configure when compression occurs. This probably requires some more experimentation on realistic workloads to determine an appropriate default configuration, though it is overridable in any event.
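
For reference, overriding whatever default we land on would follow the usual dask config pattern, something like the snippet below (the key name is again hypothetical and stands in for whatever this PR ends up exposing).

```python
import dask

# Hypothetical key; substitute the setting this PR ultimately exposes.
with dask.config.set({"distributed.worker.memory.compress": 0.5}):
    ...  # start the worker / run the workload with the lower threshold
```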

Base automatically changed from master to main on March 8, 2021