IncrementalIDB - MegaChunking #874
Merged
techfort merged 11 commits into techfort:master on Jan 22, 2021
Contributor, Author
@techfort We've been running this internally for a while and have found no issues so far
Owner
@radex this looks fantastic, i'm merging and sometime today i'll get round to doing a new release. I should really automate this release crap
(WIP, please don't review the code yet)
Yes, it's me again, with yet another pull request full of strange, complicated code — and another promise that it's worth it for performance 🙃
I think I'm approaching the limits of what IndexedDB can do performance-wise, but it's important for my use case to squeeze all that's possible out of it ;)
TL;DR: It loads the database 22% faster ;)
I made a picture to explain the problem that this PR is trying to solve:
IndexedDB is implemented (in all browsers as far as I can tell, but certainly in Chrome and Safari) with a multi-process architecture, and the cross-process communication is not very efficient. This can be seen above - waiting for IDB to fetch data from disk takes relatively little time, and most of the time is spent waiting for the XPC dance to complete transferring data -- and clearly, it's not very well tuned, as the CPU usage in the browser process is very low.
So the goal is to:
This is what I achieved:
This achieves a 22% improvement on my benchmark, and likely more free performance for apps that didn't opt to manually tune IncrementalIDB by supplying `serializeChunk`/`deserializeChunk`.

Instead of calling `IDBObjectStore.getAll()`, I'm fetching multiple `megachunks` (chunks of chunks 🙃) - currently 20 requests using adjacent `IDBKeyRange`s. AFAICT, the IDB process in both Safari and Chrome does the first phase (actual disk/db work) sequentially, so there's no win here, but the XPC is more efficient for some reason. I guess since the IDB process sends more messages to the browser process, there are fewer gaps in processing them on the browser side, so CPU utilization stays higher.

In a further improvement (I call this `megachunk interleaving`), I only request the first half of the megachunks initially, and then in the `onsuccess` of each one I request the (i+n/2)th megachunk. This reduces the initial wait for IDB to almost nothing, and improves concurrency, as the IDB process is kept busy while JS is processing the first half of its work. (I also moved most of the chunk processing - `JSON.parse` and the optional `deserializeChunk` - from the end of the process to much earlier, in each megachunk's `onsuccess`, so that the main and IDB processes can be kept busy at the same time… I think this should also reduce GC pressure a little bit, but I haven't yet figured out a good technique for measuring that, since it's very noisy.)

I'm almost out of ideas for further improvements for now, and the law of diminishing returns is catching up to me, so it'll probably be the last PR in the series for a while...
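For readers who want a concrete picture of the megachunk interleaving described above, here is a rough sketch. It is not the code from this PR: `megachunkRanges`, `loadInterleaved` and `idbRequester` are hypothetical names, chunk keys are assumed to be plain integers, and error handling is left out. The only real IndexedDB calls used are `IDBObjectStore.getAll()` with an `IDBKeyRange.bound()` range.

```javascript
// Hypothetical helper: split chunk keys 0..chunkCount-1 into adjacent,
// non-overlapping inclusive [lower, upper] ranges. It may return fewer than
// `megachunkCount` ranges when there are only a few chunks.
function megachunkRanges(chunkCount, megachunkCount) {
  const ranges = [];
  const size = Math.ceil(chunkCount / megachunkCount);
  for (let lower = 0; lower < chunkCount; lower += size) {
    ranges.push([lower, Math.min(lower + size, chunkCount) - 1]);
  }
  return ranges;
}

// Hypothetical scheduler: request the first half of the n megachunks up
// front; when megachunk i of that half arrives, immediately request its
// partner i + half, then process the data that just arrived.
function loadInterleaved(n, requestMegachunk, processMegachunk, onDone) {
  const results = new Array(n);
  const half = Math.ceil(n / 2);
  let remaining = n;
  if (n === 0) {
    onDone(results);
    return;
  }
  function request(i) {
    requestMegachunk(i, (rawChunks) => {
      // Queue the partner first so the IDB side has work to do while the
      // main thread is busy parsing this megachunk.
      if (i < half && i + half < n) request(i + half);
      results[i] = processMegachunk(rawChunks);
      remaining -= 1;
      if (remaining === 0) onDone(results);
    });
  }
  for (let i = 0; i < half; i += 1) request(i);
}

// Assumed browser wiring (only runs where IndexedDB exists): one getAll()
// per megachunk, restricted to that megachunk's key range.
function idbRequester(store, ranges) {
  return (i, onSuccess) => {
    const [lower, upper] = ranges[i];
    const request = store.getAll(IDBKeyRange.bound(lower, upper));
    request.onsuccess = () => onSuccess(request.result);
  };
}
```

In a browser this might be driven as `loadInterleaved(ranges.length, idbRequester(store, ranges), (raw) => raw.map((s) => JSON.parse(s)), onDone)`, with `ranges` coming from `megachunkRanges(chunkCount, 20)` to match the 20 requests mentioned above. Passing the request function in as a parameter keeps the scheduling logic separate from IndexedDB, so the ordering can be checked without a browser.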
PS. In case you were wondering about using IDBCursor to maximize concurrency opportunities - I tried that multiple times, and it doesn't work. I tried interleaving multiple IDBCursors, and I got to nearly the same performance as interleaved megachunking, but still slower. There are just too many useless pauses on the main thread...