Skip to content

Conversation

@mfbalin
Copy link
Collaborator

@mfbalin mfbalin commented Jun 28, 2024

Description

The S3-FIFO cache policy logic is serial for now. The partitioned policy partitions the keys in a deterministic manner to different cache policies we construct, so that each policy can run in parallel. Since performance is important, phmap hash table is used instead of std::unordered_map, which is available as a submodule. I went ahead and updated it to the latest available version to take advantage of potential bug fixes and performance improvements.

40 threads lead to a speedup of around 17x in my preliminary experiments with no change to the cache miss rate.
#7508 aligns the current implementation more with the reference implementation in https://socket.dev/pypi/package/cachemonCache/files/0.0.2/tar-gz/cachemoncache-0.0.2/src/cachemonCache/cache/s3fifo.py

Checklist

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [$CATEGORY] (such as [NN], [Model], [Doc], [Feature]])
  • I've leverage the tools to beautify the python and c++ code.
  • The PR is complete and small, read the Google eng practice (CL equals to PR) to understand more about small PR. In DGL, we consider PRs with less than 200 lines of core code change are small (example, test and documentation could be exempted).
  • All changes have test coverage
  • Code is well-documented
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change
  • Related issue is referred in this PR
  • If the PR is for a new model/paper, I've updated the example index here.

Changes

@mfbalin mfbalin requested a review from frozenbugs June 28, 2024 23:19
@dgl-bot
Copy link
Collaborator

dgl-bot commented Jun 28, 2024

To trigger regression tests:

  • @dgl-bot run [instance-type] [which tests] [compare-with-branch];
    For example: @dgl-bot run g4dn.4xlarge all dmlc/master or @dgl-bot run c5.9xlarge kernel,api dmlc/master

@dgl-bot
Copy link
Collaborator

dgl-bot commented Jun 28, 2024

Commit ID: cf8c072

Build ID: 1

Status: ❌ CI test failed in Stage [CPU Build (Win64)].

Report path: link

Full logs path: link

@mfbalin mfbalin requested a review from Rhett-Ying June 29, 2024 00:16
@dgl-bot
Copy link
Collaborator

dgl-bot commented Jun 29, 2024

Commit ID: 6973add

Build ID: 2

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

@dgl-bot
Copy link
Collaborator

dgl-bot commented Jun 29, 2024

Commit ID: 18dff26

Build ID: 3

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

@dgl-bot
Copy link
Collaborator

dgl-bot commented Jun 29, 2024

Commit ID: 82550ba

Build ID: 4

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

@dgl-bot
Copy link
Collaborator

dgl-bot commented Jun 29, 2024

Commit ID: d0560f2

Build ID: 5

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

@dgl-bot
Copy link
Collaborator

dgl-bot commented Jun 30, 2024

Commit ID: e0d9629

Build ID: 6

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

@mfbalin mfbalin force-pushed the gb_s3_fifo_feature_cache branch from 339583c to c8bac66 Compare July 1, 2024 04:33
@dgl-bot
Copy link
Collaborator

dgl-bot commented Jul 1, 2024

Commit ID: c14d5878b8ace976f546baaf2f442822981fc5e1

Build ID: 7

Status: ⚪️ CI test cancelled due to overrun.

Report path: link

Full logs path: link

@dgl-bot
Copy link
Collaborator

dgl-bot commented Jul 1, 2024

Commit ID: fe4066d1aba004d4fca28a908fdbbe20432aee2b

Build ID: 8

Status: ⚪️ CI test cancelled due to overrun.

Report path: link

Full logs path: link

@dgl-bot
Copy link
Collaborator

dgl-bot commented Jul 1, 2024

Commit ID: 9d4dbc9

Build ID: 9

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

@dgl-bot
Copy link
Collaborator

dgl-bot commented Jul 2, 2024

Commit ID: 6486a040293a89c91a150b31963e12f162bd071c

Build ID: 10

Status: ⚪️ CI test cancelled due to overrun.

Report path: link

Full logs path: link

@dgl-bot
Copy link
Collaborator

dgl-bot commented Jul 2, 2024

Commit ID: fd86134

Build ID: 11

Status: ⚪️ CI test cancelled due to overrun.

Report path: link

Full logs path: link

@mfbalin mfbalin changed the title [GraphBolt] S3-FIFO Feature Cache implementation [GraphBolt] Partitioned S3-FIFO Feature Cache implementation Jul 2, 2024
@dgl-bot
Copy link
Collaborator

dgl-bot commented Jul 2, 2024

Commit ID: ac66d3a

Build ID: 12

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

@dgl-bot
Copy link
Collaborator

dgl-bot commented Jul 3, 2024

Commit ID: 6d0f8fa

Build ID: 26

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

@mfbalin mfbalin removed the request for review from Rhett-Ying July 3, 2024 18:17
@dgl-bot
Copy link
Collaborator

dgl-bot commented Jul 3, 2024

Commit ID: 8624ca2

Build ID: 27

Status: ⚪️ CI test cancelled due to overrun.

Report path: link

Full logs path: link

@dgl-bot
Copy link
Collaborator

dgl-bot commented Jul 3, 2024

Commit ID: b702f96

Build ID: 28

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

Copy link
Collaborator

@frozenbugs frozenbugs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good now, after the latest comments addressed should be good to go.

@mfbalin mfbalin requested a review from frozenbugs July 4, 2024 16:33
@dgl-bot
Copy link
Collaborator

dgl-bot commented Jul 4, 2024

Commit ID: 84fb6994d4244251108f97a195ed6af715ba029e

Build ID: 29

Status: ⚪️ CI test cancelled due to overrun.

Report path: link

Full logs path: link

@dgl-bot
Copy link
Collaborator

dgl-bot commented Jul 4, 2024

Commit ID: c525519

Build ID: 30

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

@dgl-bot
Copy link
Collaborator

dgl-bot commented Jul 4, 2024

Commit ID: b266a1b

Build ID: 31

Status: ⚪️ CI test cancelled due to overrun.

Report path: link

Full logs path: link

@dgl-bot
Copy link
Collaborator

dgl-bot commented Jul 4, 2024

Commit ID: 59eba55

Build ID: 32

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

@mfbalin
Copy link
Collaborator Author

mfbalin commented Jul 4, 2024

There is no way to force from a base class so that all derived classes have the same constructor @frozenbugs. So I don't have a way to force derived classes to have a constructor that take a single capacity parameter.

@mfbalin
Copy link
Collaborator Author

mfbalin commented Jul 4, 2024

With the most recent changes, the PartitionedCachePolicy class is compiled only once even if more cache policies are added in the future, since we are using polymorphism now.

@dgl-bot
Copy link
Collaborator

dgl-bot commented Jul 4, 2024

Commit ID: 4ae58f6

Build ID: 33

Status: ⚪️ CI test cancelled due to overrun.

Report path: link

Full logs path: link

@dgl-bot
Copy link
Collaborator

dgl-bot commented Jul 4, 2024

Commit ID: a4183d5

Build ID: 34

Status: ✅ CI test succeeded.

Report path: link

Full logs path: link

@mfbalin mfbalin merged commit 25c2953 into dmlc:master Jul 5, 2024
@mfbalin mfbalin deleted the gb_s3_fifo_feature_cache branch July 5, 2024 02:24

template <typename CachePolicy>
PartitionedCachePolicy::PartitionedCachePolicy(
CachePolicy, int64_t capacity, int64_t num_partitions)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why force user to pass in an instance of CachePolicy which is not used at all here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to instantiate the template somehow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants