Add CO-RE support for kernel modules#1300
Conversation
|
It looks like some examples are failing because tracefs is not mounted. I could make |
lmb
left a comment
There was a problem hiding this comment.
Nice, I was worried that this would be a lot more complicated!
I still have my old questions re semantics and corner cases:
- As mentioned, do we have to deal with module reload / unload? It would invalidate the list of probeable functions for sure. It might change kmod BTF as well.
- Using the fn name to figure out the kmod seems a bit error prone.
- You propose that CO-RE for a probe attached to a kprobe in a kernel module will look at
set of vmlinux types+set of containing kmod types. What about a kprobe on a vmlinux function which needs access to kmod types? I'm thinking of data behindvoid *ctxand similar. (There is also a related case of probe on kmod which needs access to other kmod types, but that is probably rare.)
For the implementation, I was thinking that instead of introducing mergedSpec we'd operate on []*Spec. ProgramOptions.KernelTypes would become []*Spec as well. CORERelocate would iterate the []*Spec and try to find a valid set of relocations for each target. The logic to find the best possible match would have to be adjusted accordingly:
Lines 276 to 294 in a8be855
Since the type IDs of kernel modules conflict with each other, you are only ever going to be dealing with vmlinux plus one optional kmod BTF. libbpf makes this same assumption. |
With the changes you requested, you could accomplish this by setting |
I don't see a way to know if a module has been unloaded and then loaded with different contents, other than maybe comparing sizes from |
b02c8ed to
d01b1d5
Compare
lmb
left a comment
There was a problem hiding this comment.
Okay, went over the PR again. Here is what I would propose, let me know what you think.
- Skip
ProgramSpec.KernelModule. Instead,ProgramOptions.KernelTypesbecomes[]*btf.Specas discussed. IfKernelTypesis nil, the library tries to find the appropriate BTF to use, via means of parsing kallsyms / available_filter_functions and doing a lookup of ProgramSpec.AttachTo. Most of the time this will be just vmlinux. Some of the time this will be vmlinux + kmod via LoadKernelModuleSpec. - kallsyms / available_filter_functions: which one is more appropriate to use? I'd also prefer to not export
KernelModulefor now, since it ties us into an API. Probably best to stick this intointernal/kallsyms. Need to figure out whether this needs to be cached, and if yes how to invalidate the cache when the set of loaded modules changes. Something inprocthat lists loaded modules maybe? - Merge
FlushKernelModuleCacheintobtf.FlushKernelSpec(). There should be a single function to clean up cached info, and in a way this is all related to btf stuff anyways. We could rename the function if need be. - CORERelocate will work against
[]*Specas discussed. I think it makes sense to reject target type ID lookups for kmods as long as there is no canonical way to refer to these IDs from the kernel side. This might need some finessing in CORERelocate, I can probably take a look at that once the first implementation is in.
lmb
left a comment
There was a problem hiding this comment.
Okay, went over the PR again. Here is what I would propose, let me know what you think.
- Skip
ProgramSpec.KernelModule. Instead,ProgramOptions.KernelTypesbecomes[]*btf.Specas discussed. IfKernelTypesis nil, the library tries to find the appropriate BTF to use, via means of parsing kallsyms / available_filter_functions and doing a lookup of ProgramSpec.AttachTo. Most of the time this will be just vmlinux. Some of the time this will be vmlinux + kmod via LoadKernelModuleSpec. - kallsyms / available_filter_functions: which one is more appropriate to use? I'd also prefer to not export
KernelModulefor now, since it ties us into an API. Probably best to stick this intointernal/kallsyms. Need to figure out whether this needs to be cached, and if yes how to invalidate the cache when the set of loaded modules changes. Something inprocthat lists loaded modules maybe? - Merge
FlushKernelModuleCacheintobtf.FlushKernelSpec(). There should be a single function to clean up cached info, and in a way this is all related to btf stuff anyways. We could rename the function if need be. - CORERelocate will work against
[]*Specas discussed. I think it makes sense to reject target type ID lookups for kmods as long as there is no canonical way to refer to these IDs from the kernel side. This might need some finessing in CORERelocate, I can probably take a look at that once the first implementation is in.
It was hard to find a conclusive comparison between the two based on the kernel source, but |
In the case of btfhub-supported kernels, a user will need a way to know if/which module a function lies within. That way we can lookup the appropriate BTF to pass into the Since the library would already have this functionality, why not export it? Otherwise every user of btfhub, kmods, and |
|
This moves the loading of kernel/kmod BTF up from |
|
I made some changes that are close to what you want, but had to fill in some gaps. I would like to see |
494679f to
a20efa4
Compare
|
@lmb can you take another look? |
|
Ugh. Since we need different One idea would be
Another idea is a callback function |
59a51a1 to
5a8c94d
Compare
eiffel-fl
left a comment
There was a problem hiding this comment.
Hi!
Thank you for this contribution!
We are interested by using it in Inspektor Gadget.
So far, I only have one comment but I will taker another look later to get a broader understanding.
Best regards.
lmb
left a comment
There was a problem hiding this comment.
Going back to semantics: your current implementation means that a type defined in a kernel module can "shadow" a vmlinux type if it's better according to our scoring. I'm on the fence whether it'd be better to always prefer vmlinux if there is a matching target, no matter how good a kmod type is. Do you have an opinion on that?
btf/core.go
Outdated
|
|
||
| targets := target.namedTypes[newEssentialName(localTypeName)] | ||
| fixups, err := coreCalculateFixups(group.relos, target, targets, bo) | ||
| fixups, err := coreCalculateFixups(group.relos, &mergeTarget, mergeTarget.NamedTypesIterate(newEssentialName(localTypeName)), bo) |
There was a problem hiding this comment.
Merging specs up front is going to be more complicated with lazy copy going in. How about the following pseudo go:
for localType, group := range relosByType {
var bestFixup
for _, target := range targets {
fixup, err := coreCalculateFixup(...)
if fixup > bestFixup {
bestFixup = fixup
// other checks
}
}
if bestFixup == nil {
// generate poison all
}
}- Instead of merging the specs, we add a loop here which iterates over
targetsand tries to find a valid fixup. - We factor out the code to do scoring and call if from here in addition to coreCalculateFixups
Lines 276 to 294 in f95957d
- We move generating the poison all fixup to after the loop over targets in this function:
Lines 298 to 309 in f95957d
There was a problem hiding this comment.
I'm not actually merging them, just storing pointers to each Spec and iterating through them with wrapper functions. I updated to handle your changes in main. Let me know what you think.
There was a problem hiding this comment.
Okay, works for me. I'm not a big fan of mergedSpec, it seems too special purpose for me. Can you move it to core.go so that it's close to where it'll be used?
Ah, that is annoying :( Why do you want to avoid multiple NewCollection calls?
That could work. I'd probably use
This is because you don't want to load all kmod by default? What if you only parsed kmod for all loaded modules instead?
Hmm, I don't follow, sorry. We could still stick the map in ProgramOptions I think? Not sure how we would make use of it when loading
I'm a bit wary of callbacks. If we add this it begs the question whether we'd use this for vmlinux as well? Does it deprecate |
We have a bunch of eBPF programs in the same object file. That is currently a single collection. Anything that needs kernel modules would need to be extracted out and dealt with separately, probably as individual program loads. That makes shared map handling way more complicated. It isn't impossible, but is a smell about the API maybe being wrong. |
If you mean modules already loaded in the kernel, that can still be 100 or more. That is a lot of CPU and memory used for probably no reason. Otherwise, we have to loop through the kprobe functions, determine which modules are needed, load those BTF. The library when loading each kprobe would also need to do that lookup of function to kmod, to load the correct BTF from the map. |
We could put it in |
How would that happen? kmod BTF is generated using the vmlinux BTF so that it doesn't include anything already in vmlinux BTF. |
|
@lmb updated using a map for the module types. |
|
Sorry for the long radio silence. I've been busy hashing out cilium/cilium work.
I don't see the problem to be honest.
By using distinct types with the same name in multiple compilation units. Try grepping for Which ring_buffer do we prefer? The kmod one? Or the kernel? Which one of the kernel ones?
Does that mean we can't do better? ;P
We can't figure out which modules we need up front, but we could do something like the following:
These are the correct semantics to me (rather than just vmlinux + kmod) but I understand that the immediate need is simpler. I still want to keep the door open for this later on though.
That's interesting! Can you show me an example or tell me how to reproduce that?
Thanks for giving your view point and giving me confirmation that I'm not completely off xD |
lmb
left a comment
There was a problem hiding this comment.
Looks fine, please make the changes to CORERelocate as we discussed. If targets is nil it can just return an error, which is what it was doing before.
I'm a bit concerned about the dead lock potential between kernel and kmod loading, I'd love there to only be a single lock. Oh well.
Finally, the bit I'm still missing is how you're going to use this with external kmod BTF?
- You don't want to parse all loaded modules, since that might be in the 100s and could still be too slow.
- There is no feedback mechanism for you to figure out which modules the lib wants.
I see two solutions: replace the map with the callback you proposed or expose programKernelModule on ProgramSpec. The latter seems fine, and would also allow dropping the export of ebpf.KernelModule. WDYT?
btf/core.go
Outdated
|
|
||
| targets := target.namedTypes[newEssentialName(localTypeName)] | ||
| fixups, err := coreCalculateFixups(group.relos, target, targets, bo) | ||
| fixups, err := coreCalculateFixups(group.relos, &mergeTarget, mergeTarget.NamedTypesIterate(newEssentialName(localTypeName)), bo) |
There was a problem hiding this comment.
Okay, works for me. I'm not a big fan of mergedSpec, it seems too special purpose for me. Can you move it to core.go so that it's close to where it'll be used?
You can see how I've implemented it at the moment: DataDog/ebpf-manager@8b77c9a |
This is not what it was doing. It tries to auto-load the kernel spec: Lines 173 to 179 in b3432fb |
I meant this: Lines 169 to 170 in 4267cbc I'm looking at merging this but realised that you merged main into your branch. Can you please undo that and rebase + squash on top of main? |
fd2e4dd to
ce7b452
Compare
Done. Let me know if there are any other changes you'd like me to make. |
Signed-off-by: Bryce Kahle <[email protected]>
ce7b452 to
5fd9ae5
Compare
Signed-off-by: Lorenz Bauer <[email protected]>
No changes to the source code. Signed-off-by: Lorenz Bauer <[email protected]>
Signed-off-by: Lorenz Bauer <[email protected]>
This allows dropping the fallback parameter and allows further refactoring. Signed-off-by: Lorenz Bauer <[email protected]>
Kernel and module BTF are currently kept in two separate global variables, with separate locking. This creates a deadlock hazard. Refactor the code to use a single lock, and get rid of a lot of indirection which is not necessary anymore because Spec.Copy is cheap. Signed-off-by: Lorenz Bauer <[email protected]>
The function is already invoked via btf.FlushKernelSpec(). Signed-off-by: Lorenz Bauer <[email protected]>
Signed-off-by: Lorenz Bauer <[email protected]>
5ad459e to
215a22a
Compare
lmb
left a comment
There was a problem hiding this comment.
@brycekahle I ended up making the changes myself.
- Don't export FlushKallsysm separately.
- Don't pass kmodName to CORERelocate.
- Remove the deadlock risk from having two locks protecting vmlinux and modules
I feel like these came up during the review before, but I'm not sure what happened.
Sorry it took so long to get this over the finish line!
|
Going to merge this tomorrow morning UK time. |
eiffel-fl
left a comment
There was a problem hiding this comment.
Hi!
Thank you for the initial work and the polish!
I do not have a broad knowledge about it but I have one question with regard to locking and found a nit.
Best regards.
| kernelBTF.Lock() | ||
| defer kernelBTF.Unlock() |
There was a problem hiding this comment.
I am not sure to understand the locking here.
Particularly, I would just hold the write lock while writing kernelBTF.kernel, but I surely miss something here. Can you please shed some light?
There was a problem hiding this comment.
Yeah, the code is a bit hard to follow. Reason we do it this way is that dropping the read lock and acquiring the write lock are not atomic. Another goroutine might have preempted us and populated kernelBTF.kernel.
| base, err := LoadKernelSpec() | ||
| if err != nil { | ||
| return nil, fmt.Errorf("load kernel spec: %w", err) | ||
| } |
There was a problem hiding this comment.
This snippet can be moved right before you need it, i.e. line 86.
There was a problem hiding this comment.
That would cause a deadlock: we already hold kernelBTF.Lock at that point, so acquiring it in LoadKernelSpec will block indefinitely.
re #705
KernelModule() (string, error)toProgramSpec. This function attempts to determine the kernel module for kprobe/fentry programs, by parsing/proc/kallsyms.KernelModuleTypes map[string]*btf.SpectoProgramOptions, so users can provide their own kmod BTF.The usage of a
*Specin parts of CO-RE had to be abstracted to avoid copying/merging of[]Typeslices.This does work (tested with
nf_conntrackkmod and__nf_conntrack_hash_insertkprobe), but I imagine you have thoughts about how to adapt this approach.