rustdoc-search: shard the search result descriptions #122614

notriddle · 2024-03-17T03:01:39Z

Preview

This makes no visual changes to rustdoc search. It's a pure perf improvement.

old

Preview: http://notriddle.com/rustdoc-html-demo-10/doc/std/index.html?search=vec

WebPageTest Comparison with before branch on a sort of worst case (searching vec, winds up downloading most of the shards anyway): https://www.webpagetest.org/video/compare.php?tests=240317_AiDc61_2EM,240317_AiDcM0_2EN

Waterfall diagram:

Preview: http://notriddle.com/rustdoc-html-demo-10/doc2/std/index.html?search=vec

WebPageTest Comparison with before branch on a sort of worst case (searching vec, winds up downloading most of the shards anyway): https://www.webpagetest.org/video/compare.php?tests=240322_BiDcCH_13R,240322_AiDcJY_104

Description

r? @GuillaumeGomez

The descriptions are, on almost all crates¹, the majority of the size of the search index, even though they aren't really used for searching. This makes it relatively easy to separate them into their own files.

Additionally, this PR pulls out information about whether there's a description into a bitmap. This allows us to sort, truncate, then download.

This PR also bumps us to ES8. Out of the browsers we support, all of them support async functions according to caniuse.

https://caniuse.com/async-functions

Detailed sizes

$ cat test.sh 
set -ex
cp ../search-index*.js search-index.js
awk 'FNR==NR {a++;next} FNR<a-3' search-index.js{,} | awk 'NR>1 {gsub(/\],\\$/,""); gsub(/^\["[^"]+",/,""); print} {next}' | sed -E "s:\\\\':':g" > search-index.json
jq -c '.t' search-index.json > t.json
jq -c '.n' search-index.json > n.json
jq -c '.q' search-index.json > q.json
jq -c '.D' search-index.json > D.json
jq -c '.e' search-index.json > e.json
jq -c '.i' search-index.json > i.json
jq -c '.f' search-index.json > f.json
jq -c '.c' search-index.json > c.json
jq -c '.p' search-index.json > p.json
jq -c '.a' search-index.json > a.json
du -hs t.json n.json q.json D.json e.json i.json f.json c.json p.json a.json
$ bash test.sh 
+ cp ../search-index1.78.0.js search-index.js
+ awk 'FNR==NR {a++;next} FNR<a-3' search-index.js search-index.js
+ awk 'NR>1 {gsub(/\],\\$/,""); gsub(/^\["[^"]+",/,""); print} {next}'
+ sed -E 's:\\'\'':'\'':g'
+ jq -c .t search-index.json
+ jq -c .n search-index.json
+ jq -c .q search-index.json
+ jq -c .D search-index.json
+ jq -c .e search-index.json
+ jq -c .i search-index.json
+ jq -c .f search-index.json
+ jq -c .c search-index.json
+ jq -c .p search-index.json
+ jq -c .a search-index.json
+ du -hs t.json n.json q.json D.json e.json i.json f.json c.json p.json a.json
64K     t.json
800K    n.json
8.0K    q.json
4.0K    D.json
16K     e.json
192K    i.json
544K    f.json
4.0K    c.json
36K     p.json
20K     a.json

These are, roughly, the size of each section in the standard library (this tool actually excludes libtest, for parsing-json-with-awk reasons, but libtest is tiny so it's probably not important).

t = item type, like "struct", "free fn", or "type alias". Since one byte is used for every item, this implies that there are approximately 64 thousand items in the standard library.

n = name, and that's now the largest section of the search index with the descriptions removed from it

q = parent module path, stored parallel to the items within

D = the size of each description shard, stored as vlq hex numbers

e = empty description bit flags, stored as a roaring bitmap

i = parent type index as a link into p, stored as decimal json numbers; used only for associated types; might want to switch to vlq hex, since that's shorter, but that would be a separate pr

f = function signature, stored as lists of lists that index into p

c = deprecation flag, stored as a roaring bitmap

p = parent type, stored separately and linked into from i and f

a = alias, as [[key, value]] pairs

Search performance

http://notriddle.com/rustdoc-html-demo-11/perf-shard/index.html

For example, in stm32f4:

before after

Testing T -> U ... in_args = 0, returned = 0, others = 200
wall time = 617

Testing T, U ... in_args = 0, returned = 0, others = 200
wall time = 198

Testing T -> T ... in_args = 0, returned = 0, others = 200
wall time = 282

Testing crc32 ... in_args = 0, returned = 0, others = 0
wall time = 426

Testing spi::pac ... in_args = 0, returned = 0, others = 0
wall time = 673

Testing T -> U ... in_args = 0, returned = 0, others = 200
wall time = 716

Testing T, U ... in_args = 0, returned = 0, others = 200
wall time = 207

Testing T -> T ... in_args = 0, returned = 0, others = 200
wall time = 289

Testing crc32 ... in_args = 0, returned = 0, others = 0
wall time = 418

Testing spi::pac ... in_args = 0, returned = 0, others = 0
wall time = 687

user: 005.345 s
sys:  002.955 s
wall: 006.899 s
child_RSS_high:     583664 KiB
group_mem_high:     557876 KiB

user: 004.652 s
sys:  000.565 s
wall: 003.865 s
child_RSS_high:     538696 KiB
group_mem_high:     511724 KiB

This perf tester is janky and unscientific enough that the apparent differences might just be noise. If it's not an order of magnitude, it's probably not real.

Future possibilities

Currently, results are not shown until the descriptions are downloaded. Theoretically, the description-less results could be shown. But actually doing that, and making sure it works properly, would require extra work (we have to be careful to avoid layout jumps).
More than just descriptions can be sharded this way. But we have to be careful to make sure the size wins are worth the round trips. Ideally, data that’s needed only for display should be sharded while data needed for search isn’t.
Full text search also needs this kind of infrastructure. A good implementation might store a compressed bloom filter in the search index, then download the full keyword in shards. But, we have to be careful not just of the amount readers have to download, but also of the amount that publishers have to store.

https://microsoft.github.io/windows-docs-rs/, a crate with
44MiB of pure names and no descriptions for them, is an outlier
and should not be counted. But this PR should improve it, by replacing a long line of empty strings with a compressed bitmap with a single Run section. Just not very much. ↩

The descriptions are, on almost all crates[^1], the majority of the size of the search index, even though they aren't really used for searching. This makes it relatively easy to separate them into their own files. This commit also bumps us to ES8. Out of the browsers we support, all of them support async functions according to caniuse. https://caniuse.com/async-functions [^1]: <https://microsoft.github.io/windows-docs-rs/>, a crate with 44MiB of pure names and no descriptions for them, is an outlier and should not be counted.

src/librustdoc/html/static/js/search.js

GuillaumeGomez · 2024-03-17T10:13:03Z

Do you have the time difference between storing all docs in one file versus max 1MB per shard?

notriddle · 2024-03-17T16:47:55Z

Do you have the time difference between storing all docs in one file versus max 1MB per shard?

It depends on what you search for, unfortunately. Using a simulated 2G connection:
searching hashmap on nightly goes from 9s (on nightly) to 8s (on this branch).
but searching vec on nightly goes from 9s (on nightly, because nothing changed) to 10s (on this branch).

It also depends on the sort of data you're searching. The standard library is bigger than most crates, but it's not truly huge. I picked 1MiB after toying around in the Firefox Dev Tools simulated network speeds and finding things got slower with smaller shards.

src/librustdoc/html/static/js/search.js

src/tools/rustdoc-js/tester.js

GuillaumeGomez · 2024-03-19T13:20:32Z

I think it's a very good improvement, thanks for that! Apart from my comments, what else remains to be done here?

notriddle · 2024-03-19T15:17:55Z

The biggest problem right now is that it searches, then immediately downloads the descriptions. Then it sorts, deduplicates, and truncates.

I'd like to do it the other way around. First search, then sort, deduplicate, and truncate, then finally download descriptions. This way, I don't download descriptions for items that aren't shown and can display the results while their descriptions are still downloading.

To make that work, I need to store a flag for each item, in the search index, saying if it has a description or not. I've been reading up on compressed bitmaps to serve this role.

GuillaumeGomez · 2024-03-19T15:21:50Z

Wouldn't it be easier to have an optional index to the position of the item's documentation? Then based on that index, you know which file to download. (Or maybe it's exactly what you said and I completely misunderstood you.)

This adds a bit more data than "pure sharding" by including information about which items have no description at all. This way, it can sort the results, then truncate, then finally download the description. With the "e" bitmap: 2380KiB Without the "e" bitmap: 2364KiB

rustbot · 2024-03-22T02:37:35Z

Some changes occurred in HTML/CSS/JS.

cc @GuillaumeGomez, @jsha

These commits modify the Cargo.lock file. Unintentional changes to Cargo.lock can be introduced when switching branches and rebasing PRs.

If this was unintentional then you should revert the changes before this PR is merged.
Otherwise, you can ignore this comment.

notriddle · 2024-03-22T03:03:59Z

I’ve got it working so it can download the descriptions after sorting.

This makes it easy to have higher parallelism and cut the shard size down. It downloads less data now, even when searching for “vec.”

src/librustdoc/html/render/search_index.rs

GuillaumeGomez · 2024-03-22T20:06:57Z

src/librustdoc/html/render/search_index.rs

+///
+/// The `index` is a JSON-encoded list of names and other information.
+///
+/// The desc has newlined descriptions, split up by size into 128KiB shards.


Would be nice if you added some explanations why 128KiB was picked and not another.

There is no single, optimal size for these shards, because it depends on
configuration values that we can't predict or control, such as the version
of HTTP used (HTTP/1.1 would work better with larger files, while HTTP/2
and 3 are more agnostic), transport compression (gzip, zstd, etc), whether
the search query is going to produce a large number of results or a small
number, the bandwidth delay product of the network...

Gzipping some standard library descriptions to guess what transport
compression will do, the compressed file sizes can be as small as 4.9KiB
or as large as 18KiB (ignoring the final 1.9KiB shard of leftovers).
A "reasonable" range for files is for them to be bigger than 1KiB,
since that's about the amount of data that can be transferred in a
single TCP packet, and 64KiB, the maximum amount of data that
TCP can transfer in a single round trip without extensions.

src/librustdoc/html/render/search_index.rs

src/librustdoc/html/static/js/main.js

src/librustdoc/html/static/js/search.js

notriddle · 2024-03-23T00:07:08Z

I've done most of what you asked for. The only ones I couldn't easily address are:

#122614 (comment) - I don't understand the question. Is more documentation needed on that struct? I used it to avoid returning (String, String), which would be really hard to swallow.

#122614 (comment) - I added a comment, and spelled out directly, how that promise works.

#122614 (comment) - I added a comment, but, really, the constant was mostly obtained by testing with WPT and trying to not make things slower (that earlier comment about how searching for vec takes nine seconds? now it takes 8; it really does manage to avoid making things worse).

notriddle · 2024-03-25T20:30:25Z

Search performance metrics: http://notriddle.com/rustdoc-html-demo-11/perf-shard/index.html

It's close enough that any apparent performance difference might just be noise (and is certainly worth it if we can reduce the amount of data you have to download).

src/librustdoc/html/render/search_index/encode.rs

GuillaumeGomez

Looks good to me. r=me with or without the suggestions I made.

Co-authored-by: Guillaume Gomez <[email protected]>

notriddle · 2024-04-02T16:12:35Z

@bors r=GuillaumeGomez

bors · 2024-04-02T16:12:37Z

📌 Commit a272007 has been approved by GuillaumeGomez

It is now in the queue for this repository.

…llaumeGomez Rollup of 4 pull requests Successful merges: - rust-lang#122614 (rustdoc-search: shard the search result descriptions) - rust-lang#123338 (Update to new browser-ui-test version) - rust-lang#123366 (Minor by_move_body impl cleanups) - rust-lang#123371 (Remove dangling `.mir.stderr` and `.thir.stderr` test files) r? `@ghost` `@rustbot` modify labels: rollup

Rollup merge of rust-lang#122614 - notriddle:notriddle/search-desc, r=GuillaumeGomez rustdoc-search: shard the search result descriptions ## Preview This makes no visual changes to rustdoc search. It's a pure perf improvement. <details><summary>old</summary> Preview: <http://notriddle.com/rustdoc-html-demo-10/doc/std/index.html?search=vec> WebPageTest Comparison with before branch on a sort of worst case (searching `vec`, winds up downloading most of the shards anyway): <https://www.webpagetest.org/video/compare.php?tests=240317_AiDc61_2EM,240317_AiDcM0_2EN> Waterfall diagram: ![image](https://github.com/rust-lang/rust/assets/1593513/39548f0c-7ad6-411b-abf8-f6668ff4da18) </details> Preview: <http://notriddle.com/rustdoc-html-demo-10/doc2/std/index.html?search=vec> WebPageTest Comparison with before branch on a sort of worst case (searching `vec`, winds up downloading most of the shards anyway): <https://www.webpagetest.org/video/compare.php?tests=240322_BiDcCH_13R,240322_AiDcJY_104> ![image](https://github.com/rust-lang/rust/assets/1593513/4be1f9ff-c3ff-4b96-8f5b-b264df2e662d) ## Description r? `@GuillaumeGomez` The descriptions are, on almost all crates[^1], the majority of the size of the search index, even though they aren't really used for searching. This makes it relatively easy to separate them into their own files. Additionally, this PR pulls out information about whether there's a description into a bitmap. This allows us to sort, truncate, *then* download. This PR also bumps us to ES8. Out of the browsers we support, all of them support async functions according to caniuse. https://caniuse.com/async-functions [^1]: <https://microsoft.github.io/windows-docs-rs/>, a crate with 44MiB of pure names and no descriptions for them, is an outlier and should not be counted. But this PR should improve it, by replacing a long line of empty strings with a compressed bitmap with a single Run section. Just not very much. ## Detailed sizes ```console $ cat test.sh set -ex cp ../search-index*.js search-index.js awk 'FNR==NR {a++;next} FNR<a-3' search-index.js{,} | awk 'NR>1 {gsub(/\],\\$/,""); gsub(/^\["[^"]+",/,""); print} {next}' | sed -E "s:\\\\':':g" > search-index.json jq -c '.t' search-index.json > t.json jq -c '.n' search-index.json > n.json jq -c '.q' search-index.json > q.json jq -c '.D' search-index.json > D.json jq -c '.e' search-index.json > e.json jq -c '.i' search-index.json > i.json jq -c '.f' search-index.json > f.json jq -c '.c' search-index.json > c.json jq -c '.p' search-index.json > p.json jq -c '.a' search-index.json > a.json du -hs t.json n.json q.json D.json e.json i.json f.json c.json p.json a.json $ bash test.sh + cp ../search-index1.78.0.js search-index.js + awk 'FNR==NR {a++;next} FNR<a-3' search-index.js search-index.js + awk 'NR>1 {gsub(/\],\\$/,""); gsub(/^\["[^"]+",/,""); print} {next}' + sed -E 's:\\'\'':'\'':g' + jq -c .t search-index.json + jq -c .n search-index.json + jq -c .q search-index.json + jq -c .D search-index.json + jq -c .e search-index.json + jq -c .i search-index.json + jq -c .f search-index.json + jq -c .c search-index.json + jq -c .p search-index.json + jq -c .a search-index.json + du -hs t.json n.json q.json D.json e.json i.json f.json c.json p.json a.json 64K t.json 800K n.json 8.0K q.json 4.0K D.json 16K e.json 192K i.json 544K f.json 4.0K c.json 36K p.json 20K a.json ``` These are, roughly, the size of each section in the standard library (this tool actually excludes libtest, for parsing-json-with-awk reasons, but libtest is tiny so it's probably not important). t = item type, like "struct", "free fn", or "type alias". Since one byte is used for every item, this implies that there are approximately 64 thousand items in the standard library. n = name, and that's now the largest section of the search index with the descriptions removed from it q = parent *module* path, stored parallel to the items within D = the size of each description shard, stored as vlq hex numbers e = empty description bit flags, stored as a roaring bitmap i = parent *type* index as a link into `p`, stored as decimal json numbers; used only for associated types; might want to switch to vlq hex, since that's shorter, but that would be a separate pr f = function signature, stored as lists of lists that index into `p` c = deprecation flag, stored as a roaring bitmap p = parent *type*, stored separately and linked into from `i` and `f` a = alias, as [[key, value]] pairs ## Search performance http://notriddle.com/rustdoc-html-demo-11/perf-shard/index.html For example, in stm32f4: <table><thead><tr><th>before<th>after</tr></thead> <tbody><tr><td> ``` Testing T -> U ... in_args = 0, returned = 0, others = 200 wall time = 617 Testing T, U ... in_args = 0, returned = 0, others = 200 wall time = 198 Testing T -> T ... in_args = 0, returned = 0, others = 200 wall time = 282 Testing crc32 ... in_args = 0, returned = 0, others = 0 wall time = 426 Testing spi::pac ... in_args = 0, returned = 0, others = 0 wall time = 673 ``` </td><td> ``` Testing T -> U ... in_args = 0, returned = 0, others = 200 wall time = 716 Testing T, U ... in_args = 0, returned = 0, others = 200 wall time = 207 Testing T -> T ... in_args = 0, returned = 0, others = 200 wall time = 289 Testing crc32 ... in_args = 0, returned = 0, others = 0 wall time = 418 Testing spi::pac ... in_args = 0, returned = 0, others = 0 wall time = 687 ``` </td></tr><tr><td> ``` user: 005.345 s sys: 002.955 s wall: 006.899 s child_RSS_high: 583664 KiB group_mem_high: 557876 KiB ``` </td><td> ``` user: 004.652 s sys: 000.565 s wall: 003.865 s child_RSS_high: 538696 KiB group_mem_high: 511724 KiB ``` </td></tr> </table> This perf tester is janky and unscientific enough that the apparent differences might just be noise. If it's not an order of magnitude, it's probably not real. ## Future possibilities * Currently, results are not shown until the descriptions are downloaded. Theoretically, the description-less results could be shown. But actually doing that, and making sure it works properly, would require extra work (we have to be careful to avoid layout jumps). * More than just descriptions can be sharded this way. But we have to be careful to make sure the size wins are worth the round trips. Ideally, data that’s needed only for display should be sharded while data needed for search isn’t. * [Full text search](https://internals.rust-lang.org/t/full-text-search-for-rustdoc-and-doc-rs/20427) also needs this kind of infrastructure. A good implementation might store a compressed bloom filter in the search index, then download the full keyword in shards. But, we have to be careful not just of the amount readers have to download, but also of the amount that [publishers](https://gist.github.com/notriddle/c289e77f3ed469d1c0238d1d135d49e1) have to store.

rustdoc: clean up formatting

351890d

rustbot assigned GuillaumeGomez Mar 17, 2024

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. labels Mar 17, 2024

This comment has been minimized.

Sign in to view

notriddle force-pushed the notriddle/search-desc branch from b731ed6 to 5b44bfd Compare March 17, 2024 05:07

rustbot added A-testsuite Area: The testsuite used to check the correctness of rustc T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue. labels Mar 17, 2024

This comment has been minimized.

Sign in to view

GuillaumeGomez reviewed Mar 17, 2024

View reviewed changes

src/librustdoc/html/static/js/search.js Outdated Show resolved Hide resolved

Fix style errors

2e368bf

notriddle force-pushed the notriddle/search-desc branch from 4835b8e to 2e368bf Compare March 17, 2024 20:59

Use promise.all to load sorted results in parallel

e860b9c

GuillaumeGomez reviewed Mar 19, 2024

View reviewed changes

src/librustdoc/html/static/js/search.js Outdated Show resolved Hide resolved

GuillaumeGomez reviewed Mar 19, 2024

View reviewed changes

src/tools/rustdoc-js/tester.js Outdated Show resolved Hide resolved

notriddle force-pushed the notriddle/search-desc branch from d5332d0 to 28db4cc Compare March 22, 2024 00:57

notriddle marked this pull request as ready for review March 22, 2024 02:37