Create 96-byte fast allocator for storage queue PTree nodes #1336
Conversation
ajbeamon left a comment
There are some spots (3?) where we log metrics about each fast allocator. We should add this allocator to those as well.
ajbeamon left a comment
I took a peek at some running clusters to see how much memory this might account for. It looks like we top out at about 1GB memory allocated in 128 byte pools, which would mean that we could save up to 250MB per process in these clusters if all of that memory was being used for this purpose. It looks like our average use in this range is a good bit less (the cluster with the highest average came in under 600MB).
This will come at the cost of potentially unused memory staying resident in an additional pool; based on similarly sized pools in these clusters, that could be up to a few hundred MB, although on average it's also a good bit less.
Have you run any tests that measure the difference in memory used with and without this pool over some extended period? What did it look like?
fdbclient/VersionedMap.h
Outdated
	}

-	static const int overheadPerItem = 128*4;
+	static const int overheadPerItem = 96*4;
There's not really any documentation for how this overhead is getting calculated, but I assume you have some idea since you're changing it. If so, would you mind adding a comment to explain what this is based on? As an aside, if it's based on the fast allocated size of PTree objects, it seems like that wouldn't be fixed given that some of the types involved are templated. It looks like we only use this variable in one place where the types would be fixed, so maybe that's how we'd get away with it?
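For readers unfamiliar with the pattern under discussion, here is a minimal sketch of a fixed-size free-list pool in the spirit of FoundationDB's fast allocator. The names and structure are illustrative, not the actual fdb implementation; the point is that each pool hands out slots of exactly one size, so an object that fits in 96 bytes wastes 32 bytes per allocation if it is forced into the next pool up at 128.

```cpp
#include <cstddef>
#include <vector>

// Illustrative fixed-size pool (not FoundationDB's actual FastAllocator).
// All slots in a pool are SlotSize bytes; freed slots go on a LIFO free list.
template <size_t SlotSize>
class FixedPool {
    std::vector<void*> freeSlots; // LIFO free list of returned slots
    std::vector<char*> blocks;    // backing storage, carved into slots
    static const size_t slotsPerBlock = 64;

public:
    void* allocate() {
        if (freeSlots.empty()) {
            // Grab a fresh block and carve it into fixed-size slots.
            char* block = new char[SlotSize * slotsPerBlock];
            blocks.push_back(block);
            for (size_t i = 0; i < slotsPerBlock; ++i)
                freeSlots.push_back(block + i * SlotSize);
        }
        void* p = freeSlots.back();
        freeSlots.pop_back();
        return p;
    }

    void release(void* p) { freeSlots.push_back(p); }

    ~FixedPool() {
        for (char* b : blocks)
            delete[] b;
    }
};
```

A FastAllocated-style mixin would then route a class's `operator new`/`operator delete` to the pool matching its rounded-up size, which is how adding a 96-byte pool lets PTree nodes stop consuming 128-byte slots.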
You are missing calls to TRACEALLOCATOR and DETAILALLOCATORMEMUSAGE
@ajbeamon: We don't have performance test results yet, which is why I left the review in draft mode. @kaomakino is planning to run some tests on this branch to see if this is a worthwhile optimization.
Since storage queue nodes account for a large portion of memory usage, we can save space by only allocating 96 bytes instead of 128 bytes for each node.
Force-pushed from 759ad5a to 4293a3e
This change has now been benchmarked with an insert-only workload and showed an increase in throughput, due to a smaller storage queue and thus a less aggressive ratekeeper.
If the benefit of this is just that the storage queue is smaller, does the performance increase imply that the bottleneck in this case was that the input rate in 5 seconds is large enough to fill the queue? Or I guess it could potentially be longer, if you have a larger durability lag. If that is the bottleneck, are you seeing this as a sustained improvement, or only during short bursts? I think the expectation is that the disk would be limiting rather than the size of the queue, but if you are running into queue size issues, an increase in the queue size may also be warranted. I don't intend any of this to mean that we shouldn't make this change, by the way; I'm just interested in what we might expect the effects to be in various scenarios.
Mainly the benefit is just that the storage queue is smaller, so bursts of writes will not result in throttling as quickly. In the long run, with a steady workload, writing to disk is the bottleneck, so we're not seeing throughput increase in this case. |