Merged
Conversation
Signed-off-by: Lorenz Bauer <[email protected]>
The most common use case of a Spec is to look up a type by its name. For this purpose we maintain a map[essentialName][]TypeID. This requires allocating a string for each named type, which causes a very large overhead when parsing BTF. In reality, only a very small number of the named types will ever be looked up. The intuition here is that a couple of structs in the kernel contain most of the interesting information, for example struct sk_buff. Move as much of the cost of looking up a type by name to the actual lookup. Instead of spending a lot of time constructing an index up front we only maintaing an index going from the hash of a name to a type ID. 1. We can compute the hash on a byte slice and therefore avoid allocating a string. 2. Storing the index as a (hash, id) tuple allows us to store it in a slice. Lookups are just a binary search into the index. 3. Hash collisions do not introduce additional complexity because types can already share the same name. At the same time the common case of a 1:1 mapping from name to type is fast. Signed-off-by: Lorenz Bauer <[email protected]>
dylandreimerink
approved these changes
May 2, 2025
Member
dylandreimerink
left a comment
There was a problem hiding this comment.
🚀 cool approach with the fuzzyStringIndex. Can't find anything wrong with this.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Most of the time in parsing vmlinux BTF is spent in constructing the essentialName -> TypeID index.
Replace the hash table with a "fuzzy" index, which doesn't require allocating strings. The trade-off is that lookups now become more expensive, but that is a fine trade-off to make.