feat: switch to sidecar+UDS extproc by mathetake · Pull Request #629 · envoyproxy/ai-gateway

mathetake · 2025-05-21T00:49:42Z

Commit Message

This commit refactors how the ext proc is deployed internally. Specifically, it inserts the ext proc container as a sidecar container into the Envoy pods created by Envoy Gateway. This is another large refactoring that turned out to be necessary for #599. It uses a mutating webhook to insert the extproc container into Envoy pods.

Making the extproc a sidecar means that we now have a one-to-one mapping between a Gateway and its extproc. This naturally resolves the previously known limitation #509, and users can now attach multiple AIGatewayRoute(s) to one Gateway.
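A rough sketch of the sidecar insertion idea, using simplified stand-in types rather than the real Kubernetes `corev1.Pod`. The container name, image, flag, and socket path below are hypothetical and not taken from this PR. The point it illustrates is that the mutation should be idempotent, so running the webhook twice on the same pod does not add a second sidecar.

```go
package main

import "fmt"

// Container and Pod are simplified stand-ins for the Kubernetes types;
// a real mutating webhook would operate on corev1.Pod instead.
type Container struct {
	Name  string
	Image string
	Args  []string
}

type Pod struct {
	Name       string
	Containers []Container
}

// Hypothetical names, chosen for illustration only.
const (
	sidecarName = "ai-gateway-extproc"
	udsPath     = "/tmp/extproc/run.sock"
)

// injectExtProcSidecar appends the extproc container to the pod unless a
// container with the same name is already present. It reports whether the
// pod was modified.
func injectExtProcSidecar(pod *Pod, image string) bool {
	for _, c := range pod.Containers {
		if c.Name == sidecarName {
			return false // Already injected; keep the mutation idempotent.
		}
	}
	pod.Containers = append(pod.Containers, Container{
		Name:  sidecarName,
		Image: image,
		// Hypothetical flag telling the sidecar which socket to listen on.
		Args: []string{"-listen", "unix://" + udsPath},
	})
	return true
}

func main() {
	pod := &Pod{Name: "envoy-pod", Containers: []Container{{Name: "envoy", Image: "envoy:test"}}}
	fmt.Println("first call modified pod:", injectExtProcSidecar(pod, "extproc:test"))
	fmt.Println("second call modified pod:", injectExtProcSidecar(pod, "extproc:test"))
	fmt.Println("containers:", len(pod.Containers))
}
```

Because the sidecar lives in the same pod as Envoy, each Gateway's Envoy pods get their own extproc, which is the one-to-one mapping described above.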

Implementation note: since volume mounts only work within a single namespace, user-created secrets (like API keys) cannot be mounted by the extproc, as it runs in the "envoy-gateway-system" namespace. To resolve this, the controller now reads those secrets and embeds the credentials into the "extproc secret" (previously known as the "extproc configmap") together with routing, matching, and backend information. That secret is written to the "envoy-gateway-system" namespace, hence it can be mounted by the extproc container.
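The implementation note above can be sketched roughly as follows. This is a minimal illustration with stand-in types, not the controller's actual code: the struct shapes, the `apiKey` data key, the `extproc-config.json` file name, and the JSON encoding are all assumptions made for the example. Only the target namespace comes from the description.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SecretRef identifies a user-created secret by namespace and name.
type SecretRef struct {
	Namespace string
	Name      string
}

// BackendSpec is a hypothetical input: a backend and the secret holding its API key.
type BackendSpec struct {
	Name   string
	KeyRef SecretRef
}

// Backend is a hypothetical entry of the config handed to the extproc.
type Backend struct {
	Name   string `json:"name"`
	APIKey string `json:"apiKey,omitempty"` // Credential embedded by the controller.
}

// ExtProcConfig is a hypothetical, minimal shape of the embedded config.
type ExtProcConfig struct {
	Backends []Backend `json:"backends"`
}

// Secret is a simplified stand-in for corev1.Secret.
type Secret struct {
	Namespace string
	Name      string
	Data      map[string][]byte
}

const extProcNamespace = "envoy-gateway-system"

// buildExtProcSecret reads each backend's API key from the user secrets
// (which may live in any namespace) and embeds it into one secret placed in
// the namespace where the extproc runs, so that it can be mounted there.
func buildExtProcSecret(name string, specs []BackendSpec, userSecrets map[SecretRef]map[string][]byte) (*Secret, error) {
	cfg := ExtProcConfig{}
	for _, s := range specs {
		data, ok := userSecrets[s.KeyRef]
		if !ok {
			return nil, fmt.Errorf("secret %s/%s not found", s.KeyRef.Namespace, s.KeyRef.Name)
		}
		cfg.Backends = append(cfg.Backends, Backend{Name: s.Name, APIKey: string(data["apiKey"])})
	}
	raw, err := json.Marshal(cfg)
	if err != nil {
		return nil, err
	}
	return &Secret{
		Namespace: extProcNamespace,
		Name:      name,
		Data:      map[string][]byte{"extproc-config.json": raw},
	}, nil
}

func main() {
	ref := SecretRef{Namespace: "default", Name: "openai-key"}
	userSecrets := map[SecretRef]map[string][]byte{ref: {"apiKey": []byte("sk-test")}}
	sec, err := buildExtProcSecret("gw-extproc", []BackendSpec{{Name: "openai", KeyRef: ref}}, userSecrets)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s/%s: %s\n", sec.Namespace, sec.Name, sec.Data["extproc-config.json"])
}
```

Moving from a configmap to a secret follows from the same change: once credentials are embedded, the combined object has to be treated as sensitive.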

Related Issues/PRs (if applicable)

Resolves #509
Resolves #621

**Commit Message**

This removes the inference extension related code from the controller to reduce the size of the refactoring PR #629. We need to completely redo the inference extension after #629, so this doesn't mean that we drop support for it.

Signed-off-by: Takeshi Yoneda <[email protected]>

mathetake · 2025-05-21T23:38:34Z

ok all good except that i need to backfill the unit tests on the new gateway reconciler and gateway pod mutators. other than that, ready

mathetake · 2025-05-22T03:43:27Z

api/v1alpha1/api.go

+	// Note: when multiple AIGatewayRoute resources are attached to the same Gateway, and each
+	// AIGatewayRoute has a different resource configuration, the ai-gateway will pick one of them
+	// to configure the resource requirements of the external processor container.
+	//


i want some ideas on this - maybe we might want a "gateway-level" CRD to define things like this

I think we can add the resource requirement to the ai gateway controller flags.

manifests/charts/ai-gateway-helm/templates/admission_webhook.yaml

mathetake · 2025-05-22T17:31:04Z

i think this lacks lots of code comments compared to the usual changes i did ...

mathetake · 2025-05-22T17:46:03Z

i think i need to add one additional end to end test where we apply multiple AIGatewayRoutes

aabchoo

minor comments still reviewing

Makefile

site/docs/api/api.mdx

manifests/charts/ai-gateway-helm/templates/admission_webhook.yaml

mathetake · 2025-05-23T00:56:51Z

i will refactor around how certs are created etc tomorrow, following a nice advice by @arkodg

aabchoo

re:CMD

cmd/aigw/translate.go

aabchoo · 2025-05-23T14:59:06Z

cmd/controller/main.go


 	var slogLevel slog.Level
-	if err = slogLevel.UnmarshalText([]byte(*extProcLogLevelPtr)); err != nil {
+	if err := slogLevel.UnmarshalText([]byte(*extProcLogLevelPtr)); err != nil {


curious: why use := when err is defined a few lines above?
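For context on the question, here is a small self-contained sketch of how the two forms differ in Go. The function names are made up for the example and it does not claim to explain why the PR made this change. With `:=` in the `if` statement, a new `err` is declared that is scoped to the `if` block and shadows the outer one; with `=`, the outer `err` is overwritten.

```go
package main

import (
	"errors"
	"fmt"
)

// mayFail is a hypothetical helper that returns an error on demand.
func mayFail(fail bool) error {
	if fail {
		return errors.New("boom")
	}
	return nil
}

// shadowed uses := in the if statement, so the inner err exists only inside
// the if block and the outer err is left untouched.
func shadowed() error {
	var err error
	if err := mayFail(true); err != nil {
		_ = err // Inner err is non-nil here.
	}
	return err // Outer err is still nil.
}

// reused uses =, so the outer err is overwritten by the call's result.
func reused() error {
	var err error
	if err = mayFail(true); err != nil {
		_ = err
	}
	return err // Outer err now holds the error from mayFail.
}

func main() {
	fmt.Println("shadowed returns nil:", shadowed() == nil)
	fmt.Println("reused returns:", reused())
}
```

The scoped form keeps a local check from leaking its error into later code that inspects the outer variable, at the cost of the shadowing being easy to overlook.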

cmd/extproc/mainlib/main.go

internal/controller/ai_gateway_route.go

internal/controller/ai_service_backend.go

mathetake · 2025-05-23T19:07:05Z

internal/extensionserver/extensionserver.go

 // PostVirtualHostModify allows an extension to modify the virtual hosts in the xDS config.
-//
-// Currently, this replaces the route that has "x-ai-eg-selected-route" pointing to "original_destination_cluster" to route to the original destination cluster.
-func (s *Server) PostVirtualHostModify(_ context.Context, req *egextension.PostVirtualHostModifyRequest) (*egextension.PostVirtualHostModifyResponse, error) {


i should've left a note: this code was not used at all after the InfExt controller code removal in #632, and I didn't want to fix the test for unused code, hence I removed it

cc @aabchoo

mathetake · 2025-05-23T19:09:47Z

internal/extensionserver/extensionserver.go

-		//     httpHeaderName: x-ai-eg-original-dst
-		//     useHttpHeader: true
-		//   type: ORIGINAL_DST
+	if !extProcUDSExist {


note: this UDS cluster creation was necessary to attach the UDS cluster to the upstream filter below. The extension policy is now created per-gateway, not per-route, so we needed to create a uniquely named UDS cluster that can be attached below

mathetake · 2025-05-27T15:32:55Z

ok it seems e2e is flaky... i need to fix

mathetake · 2025-05-27T16:17:49Z

hmm still flaky...

mathetake · 2025-05-27T16:24:50Z

@yuzisun @wengyao04 PTAL

mathetake · 2025-05-27T16:58:16Z

man flake

This was referenced May 21, 2025

feat: add ownedBy and createdAt fields to AIGatewayRoute #620

Merged

controller: removes InfExt controllers #632

Merged

mathetake added 2 commits May 21, 2025 16:13

feat: switch to sidecar+UDS extproc

e5fc81e

Signed-off-by: Takeshi Yoneda <[email protected]>

feat: switch to sidecar+UDS extproc

f973f65

Signed-off-by: Takeshi Yoneda <[email protected]>

mathetake force-pushed the sidecarstyle branch from f5ee329 to f973f65 Compare May 21, 2025 23:15

TODO

4bd6630

Signed-off-by: Takeshi Yoneda <[email protected]>

mathetake added 3 commits May 21, 2025 19:49

done unit tests

e11b39e

Signed-off-by: Takeshi Yoneda <[email protected]>

done

536aa20

Signed-off-by: Takeshi Yoneda <[email protected]>

done

de73c17

Signed-off-by: Takeshi Yoneda <[email protected]>

mathetake marked this pull request as ready for review May 22, 2025 03:39

mathetake requested a review from a team as a code owner May 22, 2025 03:39

mathetake commented May 22, 2025

manifests/charts/ai-gateway-helm/templates/admission_webhook.yaml (outdated, resolved)

mathetake added 4 commits May 22, 2025 10:52

done

73dcd37

Signed-off-by: Takeshi Yoneda <[email protected]>

merge

dde920c

Signed-off-by: Takeshi Yoneda <[email protected]>

de-flake

529d07d

Signed-off-by: Takeshi Yoneda <[email protected]>

more

7158e46

Signed-off-by: Takeshi Yoneda <[email protected]>

aabchoo reviewed May 22, 2025

Makefile (resolved)

site/docs/api/api.mdx (outdated, resolved)

doc fix

fad5a7b

Signed-off-by: Takeshi Yoneda <[email protected]>

yuzisun reviewed May 22, 2025

manifests/charts/ai-gateway-helm/templates/admission_webhook.yaml (resolved)

aabchoo reviewed May 23, 2025

Merge remote-tracking branch 'origin/main' into sidecarstyle

e967a40

aabchoo reviewed May 23, 2025

internal/controller/ai_gateway_route.go (outdated, resolved)

internal/controller/ai_gateway_route.go (outdated, resolved)

internal/controller/ai_service_backend.go (outdated, resolved)

mathetake commented May 23, 2025

mathetake added 2 commits May 27, 2025 08:13

review: nil check on extproc config

334705a

Signed-off-by: Takeshi Yoneda <[email protected]>

remove stale sockets

158eafe

Signed-off-by: Takeshi Yoneda <[email protected]>

mathetake added 3 commits May 27, 2025 08:33

more

c2baec0

Signed-off-by: Takeshi Yoneda <[email protected]>

less flaky

1c06ca8

Signed-off-by: Takeshi Yoneda <[email protected]>

more

d38b910

Signed-off-by: Takeshi Yoneda <[email protected]>

mathetake added 2 commits May 27, 2025 09:20

deflake tests

d85ba87

Signed-off-by: Takeshi Yoneda <[email protected]>

deflake tests

ba5a18a

Signed-off-by: Takeshi Yoneda <[email protected]>

mathetake mentioned this pull request May 27, 2025

Support DynamicLoadBalancing beyond AIE(API inference extension) #604

Closed

mathetake requested review from wengyao04 and yuzisun May 27, 2025 16:24

deflake tests

52f0adc

Signed-off-by: Takeshi Yoneda <[email protected]>

mathetake added 9 commits May 27, 2025 10:03

eject

6f75747

Signed-off-by: Takeshi Yoneda <[email protected]>

dedupe multiple costs

f7d647a

Signed-off-by: Takeshi Yoneda <[email protected]>

dedupe multiple costs

064199d

Signed-off-by: Takeshi Yoneda <[email protected]>

fix unit tests

5822012

Signed-off-by: Takeshi Yoneda <[email protected]>

Merge remote-tracking branch 'origin/main' into sidecarstyle

f877322

# Conflicts: # cmd/aigw/run.go

Merge remote-tracking branch 'origin/main' into sidecarstyle

64fd360

# Conflicts: # internal/controller/ai_gateway_route.go # internal/controller/ai_gateway_route_test.go

deflake tests

2ec84f8

Signed-off-by: Takeshi Yoneda <[email protected]>

deflake tests

053b802

Signed-off-by: Takeshi Yoneda <[email protected]>

deflake tests

392239e

Signed-off-by: Takeshi Yoneda <[email protected]>

yuzisun approved these changes May 28, 2025

yuzisun merged commit 8d9c8e0 into main May 28, 2025
17 checks passed

yuzisun deleted the sidecarstyle branch May 28, 2025 17:19

mathetake mentioned this pull request Jun 23, 2025

Gateway Controller not reconciled on AIGatewayRoute delete #728

Closed

This was referenced Jul 7, 2025

EnvoyExtensionPolicy doesn't get deleted after removing CRDs #836

Closed

Allow users to provide template (schema) for external processor #544

Closed

5 participants
