bpf2go: ergonomic enums by mejedi · Pull Request #1636 · cilium/ebpf

mejedi · 2024-12-20T13:00:24Z

Constant names emitted for enum elements end up quite unwieldy. They are
prefixed with enum name because in C struct, union and enum names are in
a separate namespace, allowing for overlaps with identifiers, e.g:

enum E { E };

While such overlaps are possible in theory, people usually don't do
it. If a typicall C naming convention is followed, we get

enum something {
    SOMETHING_FOO, SOMETHING_BAR
};

generating <STEM>SomethingSOMETHING_FOO and <STEM>SomethingSOMETHING_BAR.

In addition to "safe" long names, generate shorter ones if the
respective name is not taken. <STEM>SOMETHING_FOO and
<STEM>SOMETHING_BAR are much nicer to work with.

bpf2go orders generated types by name to ensure the output is stable. Adjust data structures such that we can rely on implicit sorting in text/template, removing sortTypes alltogether. The added reservedNames hash is handy for rejecting "reserved" type names such as "<STEM>Specs", and for the upcoming "ergonomic" enum feature (generate short names for enum members if not yet taken). Signed-off-by: Nick Zavaritsky <[email protected]>

Constant names emitted for enum elements end up quite unwieldy. They are prefixed with enum name because in C struct, union and enum names are in a separate namespace, allowing for overlaps with identifiers, e.g: enum E { E }; While such overlaps are possible in *theory*, people usually don't do it. If a typicall C naming convention is followed, we get enum something { SOMETHING_FOO, SOMETHING_BAR }; generating <STEM>SomethingSOMETHING_FOO and <STEM>SomethingSOMETHING_BAR. In addition to "safe" long names, generate shorter ones if the respective name is not taken. <STEM>SOMETHING_FOO and <STEM>SOMETHING_BAR are much nicer to work with. Signed-off-by: Nick Zavaritsky <[email protected]>

ti-mo

While yes, this makes for shorter identifiers, it also makes it less deterministic who will get the prefix-less one in case of conflicts. The first-declared one? The one appearing first alphabetically? Is it random? This needs to be clearly defined.

I also kind of don't really see the problem, could you give a real-world example of an identifier that gets unwieldy because of this? There's a case to be made for anonymous enums, which I'm not sure we currently support; at least I couldn't find any examples.

I'm also not sure about adding GoFormatter.ShortEnumIdentifier, we already have EnumIdentifier that could probably be extended.

ti-mo · 2025-01-08T12:37:56Z

cmd/bpf2go/gen/output_test.go

 	qt.Assert(t, qt.StringContains(str, "ProgFoo1 *ebpf.Program `ebpf:\"prog_foo_1\"`"))
 }
+
+func TestEnums(t *testing.T) {


Would it be possible to write this test without the range over conflict true/false? I'm not a big fan of treating invariants like a list of test cases, because you inevitably end up sprinkling if <invariant> around anyway.

These are technically separate tests; perhaps you could extract the series of qt.Assert calls into a test() closure within the function to avoid repetition.

ti-mo · 2025-01-08T12:40:16Z

cmd/bpf2go/gen/output_test.go

+	}
+}
+
+func wsSeparated(terms ...string) *regexp.Regexp {


Is this strictly required? IMO it's harder to read wsSeparated("barEnumNameV1", "barEnumName", "=", "1") than simply "barEnumNameV1 barEnumName = 1". Or is the amount of whitespace not deterministic?

is the amount of whitespace not deterministic?

The amount of whitespace depends on the length of other identifiers in the set since formatter inserts extra spaces for alignment. It doesn't improve readability indeed, but makes it easier to update test code in the future.

ti-mo · 2025-01-08T12:59:46Z

cmd/bpf2go/gen/output.go

-	types, err := sortTypes(typeNames)
-	if err != nil {
-		return err
+	typeByName := map[string]btf.Type{}


Use make() for consistency?

cmd/bpf2go/gen/output.go

ti-mo · 2025-01-08T13:53:23Z

cmd/bpf2go/gen/output.go

 	var buf bytes.Buffer
 	if err := commonTemplate.Execute(&buf, &ctx); err != nil {
-		return fmt.Errorf("can't generate types: %s", err)
+		return fmt.Errorf("can't generate types: %v", err)


Why %v over %s? This change feels a little arbitrary.

I actually meant %w.

ti-mo · 2025-01-08T16:11:37Z

cmd/bpf2go/gen/output.go

-		Names:      typeNames,
+		Names:      nameByType,
 		Identifier: args.Identifier,
+		ShortEnumIdentifier: func(_, element string) string {


Could this be made a part of EnumIdentifier instead? Not sure why this deserves its own hook, it's just another step during enum identifier generation.

Not sure why this deserves its own hook, it's just another step during enum identifier generation.

I propose generating BOTH long and short identifiers to maintain backwards compatibility.

cmd/bpf2go/gen/output_test.go

ti-mo · 2025-01-08T16:33:42Z

cmd/bpf2go/gen/output.go

-		// NB: This also deduplicates types.
-		typeNames[typ] = args.Stem + args.Identifier(typ.TypeName())
+	tn := templateName(args.Stem)
+	reservedNames := map[string]struct{}{


At first sight, I'd model this as a property to be specified during construction of the GoFormatter. Everytime the formatter generates an identifier (struct or otherwise), it can check uniqueness and push it to the set of seen names. I think it's a good feature to have in the formatter to prevent it from emitting invalid code, it doesn't need to be b2g-specific.

This way, you also avoid having to close over reservedNames and typeByName to get it into an ShortEnumIdentifier function.

mejedi · 2025-01-08T19:35:04Z

@ti-mo

I also kind of don't really see the problem, could you give a real-world example of an identifier that gets unwieldy because of this?

enum reassm_rc {
  REASSM_RC_SUCCESS,
  REASSM_RC_INTERNAL_ERROR,
  REASSM_RC_TOO_MANY_REASSEMBLIES,
  ...
};

yields

const (
    frReassmRcREASSM_RC_SUCCESS               frReassmRc = 0
    frReassmRcREASSM_RC_INTERNAL_ERROR        frReassmRc = 1
    frReassmRcREASSM_RC_TOO_MANY_REASSEMBLIES frReassmRc = 2
    ....

This code is meant to be reused across several projects, therefore I can't simply have SUCCESS, INTERNAL_ERROR and TOO_MANY_REASSEMBLIES.

mejedi · 2025-01-09T10:33:49Z

@ti-mo

it also makes it less deterministic who will get the prefix-less one in case of conflicts. The first-declared one? The one appearing first alphabetically? Is it random? This needs to be clearly defined.

Let's talk about conflicts. Assuming a C compiler, the only conflict possible is with a type name. E.g.

struct foo {};
enum E {foo};

Enum item name MUST be unique. The following program fails to compile:

enum A {foo};
enum B {foo}; // error: redefinition of enumerator 'foo'

In case of conflict, a type name wins.

In theory, the input object file could have duplicate enum identifiers. It is possible with non-C languages or multiple C compilation units linked together with bpftool. It is a niche use case as of today.

Types are visited in alphabetical order, items in a enum are visited in BTF order which typically matches declaration order.

ti-mo · 2025-02-25T14:11:49Z

@mejedi Thanks for your patience! As we've discussed offline, I'm reluctant to accept the patch as-is since it introduces subtle behaviour that makes things less deterministic and could be surprising for the user. In general, we want to keep changes minimally invasive.

I suggest we head towards a model where you can introduce all the hooks you need to implement niche logic like this in in-house tooling. The GoFormatter already has a few of these points, and you're introducing ShortEnumIdentifier here, so I'm wondering if you could submit another proposal that allows you to implement and maintain the logic you want on your side. bpf2go is also similarly (but not completely) modular to allow consuming it as a library, but I have no idea what's missing there.

Please start a discussion in #ebpf-go-dev to get this moving, that will get a bit more eyes on it than just reviewers.

mejedi requested a review from dylandreimerink as a code owner December 20, 2024 13:00

mejedi force-pushed the ergonomic-enums branch from 0c3db6f to 345d337 Compare December 20, 2024 13:01

mejedi changed the title ~~Ergonomic enums~~ bpf2go: Ergonomic enums Dec 20, 2024

mejedi mentioned this pull request Dec 20, 2024

bpf2go: generate assignment structs and Go types for Variables and VariableSpecs #1610

Merged

mejedi requested a review from ti-mo December 20, 2024 13:05

mejedi force-pushed the ergonomic-enums branch from 345d337 to ad7e33b Compare December 20, 2024 13:12

mejedi force-pushed the ergonomic-enums branch from ad7e33b to d3bc971 Compare December 20, 2024 13:15

mejedi force-pushed the ergonomic-enums branch from d3bc971 to 12b600e Compare December 20, 2024 13:22

mejedi changed the title ~~bpf2go: Ergonomic enums~~ bpf2go: ergonomic enums Dec 20, 2024

ti-mo requested changes Jan 8, 2025

View reviewed changes

ti-mo closed this Feb 25, 2025

Comments

Conversation

mejedi commented Dec 20, 2024

Uh oh!

ti-mo left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mejedi Jan 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mejedi commented Jan 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mejedi commented Jan 9, 2025

Uh oh!

ti-mo commented Feb 25, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

mejedi Jan 8, 2025 •

edited

Loading

mejedi commented Jan 8, 2025 •

edited

Loading