
Conversation

@nik-io (Contributor) commented Apr 30, 2019

Thought it wouldn't hurt to switch to an LRU policy for the page cache of the AsyncFileCached actor.

  • Replaced the vector-based data structure with a Boost intrusive doubly-linked list that implements the LRU list (a minimal sketch of the structure follows this list).
  • Added an updateHit API to the EvictablePageCache class to update the list on a cache hit.
  • All operations on the LRU list have constant complexity (push_back, erase, s_iterator_to; see https://www.boost.org/doc/libs/1_67_0/doc/html/boost/intrusive/list.html), so this patch should not introduce additional overhead w.r.t. the previous eviction policy.
  • Preliminary tests on a skewed read workload show a modest improvement in hit rate of 10% or less (measured with hit/miss counters that are not part of this patch), and the new policy was never worse than the previous one.
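For reference, here is a minimal sketch of the structure just described; updateHit is the API name introduced by this patch, while the surrounding class and helper names (LruSketch, add, pickVictim) are illustrative only and not the actual FoundationDB code:

    #include <boost/intrusive/list.hpp>

    // Each page carries its list hook directly, so linking/unlinking allocates nothing.
    struct Page : boost::intrusive::list_base_hook<> {
        // page payload ...
    };

    using LruList = boost::intrusive::list<Page>;

    struct LruSketch {
        LruList lru; // front = least recently used, back = most recently used

        void add(Page& p) { lru.push_back(p); }       // O(1): new pages start at the MRU end

        void updateHit(Page& p) {                     // O(1): called on a cache hit
            lru.erase(LruList::s_iterator_to(p));     // unlink in place
            lru.push_back(p);                         // relink at the MRU end
        }

        Page* pickVictim() {                          // O(1): evict the coldest page
            if (lru.empty()) return nullptr;
            Page* victim = &lru.front();
            lru.pop_front();
            return victim;
        }
    };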

@mpilman (Contributor) commented Apr 30, 2019

I would strongly suggest that we benchmark this extensively.

In my experience, random eviction often works much better than one might expect, and LRU adds non-trivial overhead (and CPU is a big problem for us).

At Snowflake we run with very large caches in production. The random eviction policy works great there.

Can you safe-guard this with a knob for now? We have to be able to disable and enable this dynamically until we've confirmed that this doesn't introduce any performance regressions for us.

@nik-io (Contributor, Author) commented May 1, 2019

I will safe-guard this with a knob and update the pull request, no problem.

A couple of notes:

  • As far as I understand it, if you are using the SQLite-based storage engine ("ssd-2"), then the SQLite page cache (starting at sqlite3.amalgamation.c:2026) is also used on top of the AsyncFileCache page cache. The SQLite page cache uses an eviction scheme that has at least an LRU component to it (sqlite3.amalgamation.c:4068).
  • Re CPU overhead: this patch replaces random accesses into an std::vector (followed by dereferencing the vector elements, which are page pointers) with the same number of random memory accesses through linked-list pointers, or fewer, since it evaluates the oldest pages first and those are the least likely to be in use at the moment. All of the Boost intrusive list operations used have constant complexity, so I don't see how this would be more CPU intensive than the existing eviction algorithm (a rough sketch of that algorithm follows this list). If anything, I expect it to be slightly less CPU intensive, since it avoids the (implicit) std::vector resize operations, the calls to randomInt, and the swap-and-erase from the vector in the EvictablePage destructor.
  • I'd be happy to run whatever benchmarks are deemed necessary for acceptance. We typically run go-ycsb skewed and uniform, as well as some modified native fdb tests with a mixed read/write ratio (https://forums.foundationdb.org/t/cluster-tuning-cookbook/520/26). So far we haven't seen a large bottom-line performance difference with this patch, but looking at the page hits and misses (measured with hit/miss counters that are not part of this patch), LRU was consistently better than the random replacement policy.
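For comparison, a rough sketch of the vector-based random eviction pattern referred to above (this is not the actual FoundationDB implementation; the indexInPages back-pointer and the helper names are invented for illustration):

    #include <cstdlib>
    #include <utility>
    #include <vector>

    struct Page {
        int indexInPages = -1;                       // back-pointer so removal stays O(1)
        bool evictable() const { return true; }      // e.g. no outstanding reads/writes
    };

    struct RandomEvictionSketch {
        std::vector<Page*> pages;

        // On cache pressure: probe a random slot and evict it if possible.
        void tryEvictOne() {
            if (pages.empty()) return;
            int i = std::rand() % (int)pages.size(); // stands in for the engine's randomInt()
            if (pages[i]->evictable()) {
                // write back / free pages[i] ...
            }
        }

        // On page destruction: swap-and-pop keeps the vector dense in O(1).
        void remove(Page& p) {
            int i = p.indexInPages;
            std::swap(pages[i], pages.back());
            pages[i]->indexInPages = i;
            pages.pop_back();
        }
    };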

@mpilman (Contributor) commented May 1, 2019

Awesome, thanks a lot for the knob-guard.

I think the SQLite cache is turned off (or it is a very small cache), but I am not sure anymore (I would need to hunt this down in the code; @satherton might have a better idea here).

For performance: after looking at the boost stuff I tend to agree that it doesn't seem very likely that this is much more expensive than the current caching strategy. If this patch can improve cache hit rates, it will have a lot of value (for us a higher cache hit rate might save a lot of money as we run on EBS and EBS IOPS are not cheap). @kaomakino has agreed to run some benchmarks on this as soon as he has time.

If you have already done this (but not in this patch): would it be possible to report the cache hit rate? I was planning to do this anyway, as we are very interested in this number, but if you have it already it would be great to include it in this patch (or a separate patch). Basically I would like to have something like N2_cache_hits and N2_cache_misses in ProcessMetrics. Calculating this from the currently reported metrics is a bit awkward...
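(With two such counters the calculation becomes trivial: over any reporting interval, hit rate = Δhits / (Δhits + Δmisses), taking Δ as the change in each cumulative counter between two samples.)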

@ajbeamon (Contributor) commented May 1, 2019

something like N2_cache_hits and N2_cache_misses

The format for these detail names has changed recently; these should probably end up as CacheHits and CacheMisses.

I'll be curious if some of the existing metrics become redundant in this exercise. I don't remember exactly how they work, but I think we have something roughly like counters for how many reads are done in the cache and logically at the filesystem. If it's the case that misses == file reads and hits + misses == cache reads, then the right thing may be to rework all of the metrics surrounding file reads into something more sensible. Otherwise, all of these partially overlapping metrics may end up being even more confusing.

That said, my descriptions above are vague recollections, and if it turns out the old metrics provide some useful information that can't be gleaned from the proposed new set, it may make sense to keep all of them.

@alexmiller-apple (Contributor) commented May 2, 2019

We are typically running go-ycsb skewed and uniform

Do be aware that go-ycsb is only one process, so you'll be bottlenecked on the network thread, and that it does one operation per transaction, so you'll actually just be benchmarking GRV performance rather than read performance.

@mpilman (Contributor) commented May 2, 2019

I'll be curious if some of the existing metrics become redundant in this exercise.

I think there is something, but last time it took me 30 minutes to figure out how to calculate cache miss rate from the ProcessMetrics. This is imho not great 😆 Cache hit rate is imho important enough to be in there. It also makes usage with Wavefront/Grafana (or whatever other tool you are using) much easier.

@nik-io (Contributor, Author) commented May 2, 2019

We are typically running go-ycsb skewed and uniform

Do be aware that go-ycsb is only one process, so you'll be bottlenecked on the network thread, and that it does one operation per transaction, so you'll actually just be benchmarking GRV performance rather than read performance.

Indeed, in our testing we spawn several multithreaded go-ycsb client processes in parallel because we have found (just as you suggest) that this is the only way to maximize read performance. A better way, I guess, would be to modify go-ycsb to use a separate socket per thread (which I assume can be done with a separate fdbopen per thread).

@nik-io (Contributor, Author) commented May 2, 2019

@satherton: I'll safe guard this with a knob and also add CacheHits and CacheMisses stats (as @mpilman and @ajbeamon suggested), and update the pull request as soon as I have it ready.

- Added a knob to control page cache eviction policy (0=random default, 1=lru)
- Added page cache hit, miss, and eviction stats
@ajbeamon (Contributor) commented May 2, 2019

I think there is something, but last time it took me 30 minutes to figure out how to calculate cache miss rate from the ProcessMetrics. This is imho not great 😆 Cache hit rate is imho important enough to be in there. It also makes usage with Wavefront/Grafana (or whatever other tool you are using) much easier.

Agreed, I've wanted more easily interpretable cache statistics too. I just want to avoid having metrics called 'CacheReads' and 'CacheHits', where from the name it may be unclear what the difference is. If CacheReads becomes redundant in this process, I'd be all for getting rid of it.

@ajbeamon (Contributor) commented May 2, 2019

Perhaps an interesting experiment would be to leave them all in and see what the numbers look like. That could provide some empirical evidence whether they are actually just combinations of each other.

@nik-io (Contributor, Author) commented May 2, 2019

@satherton I've updated the pull request:

  • Safe-guarded the cache eviction change with a knob (defaults to the original random policy).
  • Added page cache statistics (hits and misses, but also evictions). I'm not 100% sure that Int64MetricHandle was the way to go here, but that's what I've pushed. If the stats should be implemented some other way, please let me know and I'll amend.

@mpilman (Contributor) commented May 2, 2019

@Nikolas-Ioannou Thanks - this looks really useful. I hope we can soon take a look at this and run our benchmarks with both caches.

@ajbeamon

That could provide some empirical evidence whether they are actually just combinations of each other.

Does this mean you don't quite know what some of these metrics mean?

I think we should document all metrics (StorageMetrics, ProcessMetrics, MachineMetrics, etc.). I will create a ticket for that. This is imho very much necessary...

@ajbeamon (Contributor) commented May 2, 2019

Does this mean you don't quite know what some of these metrics mean?

I think we should document all metrics (StorageMetrics, ProcessMetrics, MachineMetrics, etc.). I will create a ticket for that. This is imho very much necessary...

I have a rough idea what they do, but I haven't carefully evaluated exactly how they relate to each other. It could be that there are some minor overlaps or gaps at the edges that I'm not aware of, so a quick test might be able to help calibrate my expectations.

Documenting metrics sounds valuable; these are probably some of the most important trace events to be able to work with.

@ben-manes

If you can capture a workload trace from the cache's perspective (logging of the hash of the key lookups), then I can run a simulation to show the hit rate across various policies (Random, LRU, ARC, LIRS, etc). It will also give a sense of whether the workload is frequency or recency biased, or mixed.

YCSB and Zipf are very good for load testing with a more realistic distribution, so one can obtain a good estimate of scalability. However, Zipf is too idealistic for judging a caching policy, beyond the basic check of whether it captures some frequency characteristics. A real trace will have other behaviors to account for, like analytical scans (which favor MRU) or a job queue (which favors LRU).

@satherton (Contributor) left a comment:

Looks good, but I really would prefer a string policy identifier, as it seems likely we will have more options in the future (at which point putting the eviction policy behind an interface might be wise too).

flow/Knobs.h Outdated
int64_t SIM_PAGE_CACHE_64K;
int64_t BUGGIFY_SIM_PAGE_CACHE_4K;
int64_t BUGGIFY_SIM_PAGE_CACHE_64K;
int CACHE_EVICTION_POLICY; // 0 RANDOM (default), 1 LRU
Contributor:

We actually do support std::string knobs, and I think cache eviction policy is a good use for one. We could add other policies in the future and strings would be the most user-friendly way to specify them. A member in AsyncFileCached of type CacheEvictionType can then be initialized from the knob's string value.

Contributor Author:

Sounds good! I will change the knob to be a string, add a member (evictionType) and initialize it based on the knob's string value, and update the patch.

Contributor Author:

@satherton updated with the knob being a string.

@nik-io (Contributor, Author) commented May 3, 2019

If you can capture a workload trace from the cache's perspective (logging of the hash of the key lookups), then I can run a simulation to show the hit rate across various policies (Random, LRU, ARC, LIRS, etc). It will also give a sense of whether the workload is frequency or recency biased, or mixed....

I'm afraid we don't have a workload trace at the moment, as we are currently in the adoption stage and generate workloads with YCSB.

@alexmiller-apple (Contributor):

@fdb-build, ok to test

But the CMake macOS builder will fail because it's currently just broken.

@alexmiller-apple (Contributor):

@ben-manes, I appreciate the offer to run simulations against traces, and I've started the conversations internally to see how close to production an environment we would be allowed to produce such traces from.

explicit EvictablePageCache(int pageSize, int64_t maxSize) : pageSize(pageSize), maxPages(maxSize / pageSize) {
    std::string cep = FLOW_KNOBS->CACHE_EVICTION_POLICY;
    std::transform(cep.begin(), cep.end(), cep.begin(), ::tolower);
    if (0 == cep.compare("random"))
Contributor:

An exception should be thrown if the eviction type is not recognized, but also it would be good to error out as soon as possible from fdbserver if the knob is set incorrectly.

To that end, I suggest making a public static helper function to convert a policy string to an enum, throwing an exception on failure (probably a new exception type). That function can then be used here, and we can also call it from fdbserver.cpp using the knob's current value after all of the knob arguments have been processed and set.

Minor style comment - Personally I think cep == "random" is a bit more natural to read than 0 == cep.compare("random").
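To make the suggestion concrete, here is a minimal sketch of such a helper; the names evictionPolicyStringToEnum and CacheEvictionType come from this thread, while std::invalid_argument merely stands in for whatever new FDB error type the real patch throws:

    #include <algorithm>
    #include <stdexcept>
    #include <string>

    enum class CacheEvictionType { RANDOM, LRU };

    // Convert the knob string to an enum; throw on an unrecognized policy so the
    // caller (here, or fdbserver right after knob processing) can fail fast.
    CacheEvictionType evictionPolicyStringToEnum(std::string policy) {
        std::transform(policy.begin(), policy.end(), policy.begin(), ::tolower);
        if (policy == "random") return CacheEvictionType::RANDOM;
        if (policy == "lru") return CacheEvictionType::LRU;
        throw std::invalid_argument("unknown cache eviction policy: " + policy);
    }

fdbserver can then call this on FLOW_KNOBS->CACHE_EVICTION_POLICY once all --knob_ arguments have been applied, which covers the fail-fast part of the request.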

Contributor Author:

@satherton thanks for the input. I have updated the patch with your suggestions.

@ben-manes

@alexmiller-apple It is also fine if you wish to run your traces yourselves, and I can assist if needed. The trace readers are per-format and produce a stream of 64-bit integer keys, each representing an access. Typically you can capture the key's hash to obfuscate it, using either a text or binary log file. Adding a trace is probably less than a dozen lines of code, so it's pretty easy if your team decides to keep everything internal.
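A capture hook of that size could look roughly like the following sketch (logCacheAccess is an invented name, and one integer per line is just one plausible text encoding of the 64-bit key stream described above):

    #include <cstdint>
    #include <cstdio>
    #include <functional>
    #include <string>

    // Log one obfuscated 64-bit key per cache lookup so an external simulator can
    // replay the access stream without seeing the real keys.
    void logCacheAccess(std::FILE* traceFile, const std::string& pageKey) {
        const uint64_t hashed = std::hash<std::string>{}(pageKey);
        std::fprintf(traceFile, "%llu\n", static_cast<unsigned long long>(hashed));
    }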

@satherton (Contributor) left a comment:

These changes look good, but there is one last thing to do: add a call to evictionPolicyStringToEnum() in fdbserver.actor.cpp to validate the knob's current value after knob arguments have been set.

I believe the first opportunity is here: https://github.com/apple/foundationdb/blob/master/fdbserver/fdbserver.actor.cpp#L1428

@nik-io (Contributor, Author) commented May 7, 2019

These changes look good, but there is one last thing to do: add a call to evictionPolicyStringToEnum() in fdbserver.actor.cpp to validate the knob's current value after knob arguments have been set.

I believe the first opportunity is here: https://github.com/apple/foundationdb/blob/master/fdbserver/fdbserver.actor.cpp#L1428

@satherton updated with your requested changes.

@AlvinMooreSr (Contributor):

test this please

@nik-io (Contributor, Author) commented May 8, 2019

test this please

For what it's worth, this is how we tested this using go-ycsb with a page cache size that is ~10% of the dataset:

  1. spawn fdbserver:
    ./bin/fdbserver --datadir /mnt/fdb-nvme/foundationdb/4500 --listen_address public --logdir /mnt/fdb-nvme/foundationdb --public_address auto:4500 --knob_cache_eviction_policy lru --knob_page_cache_4k $((10*1024*1024)) -L .
  2. load go-ycsb with 1M 100B values
    ./bin/go-ycsb load foundationdb -P workloads/workloadRO-small -p threadCount=128 -p measurement.interval=1
  3. stop fdbserver, then start one with random eviction:
    ./bin/fdbserver --datadir /mnt/fdb-nvme/foundationdb/4500 --listen_address public --logdir /mnt/fdb-nvme/foundationdb --public_address auto:4500 --knob_cache_eviction_policy random --knob_page_cache_4k $((10*1024*1024)) -L .
  4. run go-ycsb read test with skewed workload
    ./bin/go-ycsb run foundationdb -P workloads/workloadRO-small -p threadCount=128 -p measurement.interval=1
  5. Repeat steps 3 and 4 with LRU eviction.

Stat output from the fdbserver log, aggregated using the following shell loop:

    for i in Hits Misses Evictions; do cat trace.127.0.0.1.4500.1557322824.Eu8gSz.1.xml.1Mkeys.100B.1Mreads.10MiBcache.skewed.lru_eviction.read-only.out | grep Cache${i} | tr ' ' '\n' | grep Cache${i} | tr -d '"' | awk -F '=' 'BEGIN{sum=0}{sum+=$2}END{print sum}'; done
Random
CacheHits=2543175
CacheMisses=1393667
CacheEvictions=1391158

LRU
CacheHits=2681335
CacheMisses=1255413
CacheEvictions=1252908
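For reference, those counters correspond to a hit rate of roughly 2543175 / (2543175 + 1393667) ≈ 64.6% with random eviction versus 2681335 / (2681335 + 1255413) ≈ 68.1% with LRU, i.e. about 10% fewer misses with LRU on this run.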

Attached below is the go-ycsb output for the two runs:
go-ycsb-output.zip

@ryanworl (Collaborator) commented May 8, 2019

"test this please" is a command to the CI system, @Nikolas-Ioannou.

@satherton satherton merged commit 5a8c974 into apple:master May 17, 2019
@nik-io nik-io deleted the feature-pagecache-lru branch May 20, 2019 14:51
@nik-io nik-io restored the feature-pagecache-lru branch September 2, 2020 07:36