AsyncFileCached: switch from a random to an LRU cache eviction policy #1506

Conversation
I would strongly suggest that we benchmark this extensively. In my experience, random eviction often works much better than one might expect, and LRU adds quite some non-trivial overhead (and CPU is a big problem for us). At Snowflake we run with very large caches in production, and the random eviction policy works great there. Can you safe-guard this with a knob for now? We have to be able to disable and enable this dynamically until we've confirmed that this doesn't introduce any performance regressions for us.
I will safe-guard this with a knob and update the request, no problem. A couple of notes:
Awesome, thanks a lot for the knob-guard. I think the sqlite cache is turned off (or it is a very small cache), but I am not sure anymore (I would need to hunt this down in the code - @satherton might have a better idea here). For performance: after looking at the boost stuff, I tend to agree that it doesn't seem very likely that this is much more expensive than the current caching strategy. If this patch can improve cache hit rates, it will have a lot of value (for us a higher cache hit rate might save a lot of money, as we run on EBS and EBS IOPS are not cheap). @kaomakino has agreed to run some benchmarks on this as soon as he has time. If you have already done this (but not in this patch): would it be possible to report the cache hit rate? I was planning to do this anyway, as we are very interested in this number, but if you have that already it would be great to include it in this patch (or a separate patch). Basically I would like to have something like
The format for these detail names has changed in recent times, and should probably end up like

I'll be curious if some of the existing metrics become redundant in this exercise. I don't remember exactly how they work, but I think we have something roughly like counters for how many reads are done in cache and logically at the filesystem. If it's the case that misses == file reads and hits + misses == cache reads, then the right thing may be to rework all of the metrics surrounding file reads into something more sensible. Otherwise, all of these partially overlapping metrics may end up being even more confusing. That said, my descriptions above are vague recollections, and if it turns out the old metrics provide some useful information that can't be gleaned from the proposed new set, it may make sense to keep all of them.
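To make the suspected overlap concrete, here is a small sketch with hypothetical counter names (not the actual trace event fields): if the identities above hold, the old file-read and cache-read counters would be derivable from hits and misses alone.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical counters, named for illustration only; the real trace
// event fields may differ. Under the assumption that misses == file
// reads and hits + misses == cache reads, the older metrics reduce to
// functions of these two counters.
struct CacheStats {
    int64_t hits = 0;
    int64_t misses = 0;
    int64_t fileReads() const { return misses; }          // assumed identity
    int64_t cacheReads() const { return hits + misses; }  // assumed identity
    double hitRate() const {
        int64_t total = hits + misses;
        return total == 0 ? 0.0 : double(hits) / double(total);
    }
};
```

If a quick test shows these identities do not hold exactly, the deviation itself would pinpoint what extra information the old metrics carry.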
Do be aware that go-ycsb is only one process, so you'll be bottlenecked on the network thread, and that it does one operation per transaction, so you'll actually just be benchmarking GRV performance rather than read performance.
> I'll be curious if some of the existing metrics become redundant in this exercise.

I think there is something, but last time it took me 30 minutes to figure out how to calculate the cache miss rate from the
Indeed, in our testing we spawn several go-ycsb (multithreaded) client process instances in parallel, because we have found (just as you suggest) that this was the only way to maximize read performance. A better way would be to modify go-ycsb, I guess, to have a separate socket per thread (which I assume can be done with a separate fdbopen per thread).
@satherton: I'll safe-guard this with a knob and also add
- Added a knob to control page cache eviction policy (0=random default, 1=lru)
- Added page cache hit, miss, and eviction stats
Agreed, I've wanted more easily interpretable cache statistics too. I just want to avoid having metrics called 'CacheReads' and 'CacheHits', where from the names alone it may be unclear what the difference is. If CacheReads becomes redundant in this process, I'd be all for getting rid of it.
Perhaps an interesting experiment would be to leave them all in and see what the numbers look like. That could provide some empirical evidence as to whether they are actually just combinations of each other.
@satherton I've updated the pull request:
@Nikolas-Ioannou Thanks - this looks really useful. I hope we can soon take a look at this and run our benchmarks with both caches.
Does this mean you don't quite know what some of these metrics mean? I think we should document all metrics (StorageMetrics, ProcessMetrics, MachineMetrics, etc.). I will create a ticket for that. This is imho very much necessary...
I have a rough idea what they do, but I haven't carefully evaluated exactly how they relate to each other. It could be that there are some minor overlaps or gaps at the edges that I'm not aware of, so a quick test might help calibrate my expectations. Documenting metrics sounds valuable; these are probably some of the most important trace events to be able to work with.
If you can capture a workload trace from the cache's perspective (logging the hash of each key lookup), then I can run a simulation to show the hit rate across various policies (Random, LRU, ARC, LIRS, etc.). It will also give a sense of whether the workload is frequency-biased, recency-biased, or mixed. YCSB and Zipf are very good for load testing with a more realistic distribution, so one can obtain a good estimate of scalability. However, Zipf is too idealistic for judging a caching policy, beyond the basics of whether it captures some frequency characteristics. A real trace will have other behaviors to account for, like analytical scans (which favor MRU) or a job queue (which favors LRU).
satherton left a comment:
Looks good but I really would prefer a string policy identifier as it seems likely we will have more options in the future (at which point putting eviction policy behind an interface might be wise too).
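To sketch what "putting eviction policy behind an interface" could look like, here is a minimal, hypothetical example (the names are illustrative and not from the patch; the patch itself only adds a knob):

```cpp
#include <cstdint>
#include <cstdlib>
#include <unordered_map>
#include <vector>

// Hypothetical abstract eviction policy; new policies could be added
// later without touching the cache itself.
struct EvictionPolicy {
    virtual ~EvictionPolicy() = default;
    virtual void track(int64_t pageID) = 0;  // a page entered the cache
    virtual int64_t evict() = 0;             // choose and remove a victim
};

// Random eviction: keep page IDs in a vector and swap-and-pop a random
// slot, so tracking and eviction both stay O(1).
struct RandomEviction : EvictionPolicy {
    std::vector<int64_t> pages;
    std::unordered_map<int64_t, size_t> slot;
    void track(int64_t id) override {
        if (slot.emplace(id, pages.size()).second) pages.push_back(id);
    }
    int64_t evict() override {
        size_t i = std::rand() % pages.size();
        int64_t victim = pages[i];
        std::swap(pages[i], pages.back());
        slot[pages[i]] = i;  // fix up the index of the swapped-in page
        pages.pop_back();
        slot.erase(victim);
        return victim;
    }
};
```

A string policy identifier then maps naturally onto a factory that returns one of these implementations.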
flow/Knobs.h (Outdated)

```cpp
int64_t SIM_PAGE_CACHE_64K;
int64_t BUGGIFY_SIM_PAGE_CACHE_4K;
int64_t BUGGIFY_SIM_PAGE_CACHE_64K;
int CACHE_EVICTION_POLICY; // 0 RANDOM (default), 1 LRU
```
We actually do support std::string knobs, and I think cache eviction policy is a good use for one. We could add other policies in the future and strings would be the most user-friendly way to specify them. A member in AsyncFileCached of type CacheEvictionType can then be initialized from the knob's string value.
Sounds good! I will change the knob to be a string, add a member (evictionType) and initialize it based on the knob's string value, and update the patch.
@satherton updated with the knob being a string.
I'm afraid we don't have a workload trace at the moment, as we are currently in the adoption stage and generate workloads with ycsb.
@fdb-build, ok to test

But the CMake macOS builder will fail because it's currently just broken.
@ben-manes, I appreciate the offer to run simulations against traces, and I've started the conversations internally to see how close to production an environment we would be allowed to produce such traces from.
fdbrpc/AsyncFileCached.actor.h (Outdated)

```cpp
explicit EvictablePageCache(int pageSize, int64_t maxSize) : pageSize(pageSize), maxPages(maxSize / pageSize) {
    std::string cep = FLOW_KNOBS->CACHE_EVICTION_POLICY;
    std::transform(cep.begin(), cep.end(), cep.begin(), ::tolower);
    if (0 == cep.compare("random"))
```
An exception should be thrown if the eviction type is not recognized, but also it would be good to error out as soon as possible from fdbserver if the knob is set incorrectly.
To that end, I suggest making a public static helper function to convert a policy string to an enum, throwing an exception on failure (probably a new exception type). That function can then be used here, and we can also call it from fdbserver.cpp using the knob's current value after all of the knob arguments would have been processed and set.
Minor style comment - Personally I think cep == "random" is a bit more natural to read than 0 == cep.compare("random").
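The suggested helper could look roughly like this (a sketch with assumed names; the real patch would likely throw a FoundationDB `Error` rather than a standard exception, and would be called from fdbserver.actor.cpp after knobs are set):

```cpp
#include <algorithm>
#include <cctype>
#include <stdexcept>
#include <string>

enum class CacheEvictionType { RANDOM, LRU };

// Hypothetical static helper along the lines suggested in review:
// case-insensitively map the knob's string value to an enum, and fail
// loudly on an unrecognized policy so fdbserver can error out early.
CacheEvictionType evictionPolicyStringToEnum(std::string policy) {
    std::transform(policy.begin(), policy.end(), policy.begin(),
                   [](unsigned char c) { return std::tolower(c); });
    if (policy == "random") return CacheEvictionType::RANDOM;
    if (policy == "lru") return CacheEvictionType::LRU;
    throw std::invalid_argument("unrecognized cache eviction policy: " + policy);
}
```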
@satherton thanks for the input. I have updated the patch with your suggestions.
@alexmiller-apple It is also fine if you wish to run your traces, and I can assist if needed. The trace readers are per-format and produce a stream of 64-bit integer keys, each representing an access. Typically you can capture the key's hash to obfuscate it, using either a text or binary log file. Adding a trace is probably less than a dozen lines of code, so it's pretty easy if your team decides to keep everything internal.
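The capture described above could be sketched as follows: hash each looked-up key to a 64-bit integer (obfuscating the real key) and append it to a binary log, 8 bytes per access. The function name, log format, and use of `std::hash` here are all assumptions for illustration.

```cpp
#include <cstdint>
#include <fstream>
#include <functional>
#include <string>

// Sketch of obfuscated trace capture: one 64-bit hashed key per access,
// appended to a binary stream opened elsewhere with std::ios::binary.
void logAccess(std::ofstream& trace, const std::string& key) {
    uint64_t h = std::hash<std::string>{}(key);
    trace.write(reinterpret_cast<const char*>(&h), sizeof h);
}
```

A simulator that reads the resulting stream of 64-bit integers can then replay the workload against different policies without ever seeing real keys.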
…cache eviction policy not recognized.
satherton left a comment:
These changes look good, but there's one last thing to do: add a call to evictionPolicyStringToEnum() in fdbserver.actor.cpp to validate the knob's current value after knob arguments have been set.
I believe the first opportunity is here: https://github.com/apple/foundationdb/blob/master/fdbserver/fdbserver.actor.cpp#L1428
@satherton updated with your requested changes.

test this please
For what it's worth, this is how we tested this using go-ycsb with a page cache size that is ~10% of the dataset:
Stat output from the fdbserver log (aggregated), using LRU

Attached below is the go-ycsb output for the two runs:
Thought it wouldn't hurt switching to an LRU policy for the page cache of the AsyncFileCached actor.
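The core of an LRU eviction structure like the one this PR proposes can be sketched as a recency-ordered list plus a hash map for O(1) lookup. This is an illustrative standalone sketch, not the actual EvictablePageCache code, which works on the existing page list.

```cpp
#include <cstdint>
#include <list>
#include <unordered_map>

// Minimal LRU index over page IDs: the list is ordered by recency
// (front = most recently used) and the map gives O(1) access to each
// page's position in the list.
class LRUPageIndex {
    std::list<int64_t> order;
    std::unordered_map<int64_t, std::list<int64_t>::iterator> pos;
public:
    void access(int64_t pageID) {
        auto it = pos.find(pageID);
        if (it != pos.end()) order.erase(it->second);  // unlink old position
        order.push_front(pageID);                      // now most recent
        pos[pageID] = order.begin();
    }
    bool empty() const { return order.empty(); }
    int64_t evictLRU() {  // remove and return the least recently used page
        int64_t victim = order.back();
        order.pop_back();
        pos.erase(victim);
        return victim;
    }
};
```

Compared with random eviction, each access pays for a hash lookup and a list splice, which is the CPU overhead the reviewers above want benchmarked against the hoped-for hit-rate gain.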