Conversation
Signed-off-by: Takeshi Yoneda <[email protected]>
Force-pushed from 60b9003 to 86ab274
The implementation is mechanical, so nothing special here. I hope someone approves by the time I wake up tomorrow so I can unblock the implementation part of #43 as well as the rate limit stuff.
Signed-off-by: Takeshi Yoneda <[email protected]>
// schema: OpenAI
// backendRoutingHeaderKey: x-backend-name
// modelNameHeaderKey: x-model-name
// selectedBackendHeaderKey: x-envoy-ai-gateway-selected-backend
Not sure if we want to call it selected-backend; the backend may not necessarily be selected by the gateway. The user can still specify the backend in the request.
Well, the backend will be "calculated" and "selected" following the matching rules of LLMRoute, so the word "selected" makes sense regardless of who actually "selects".
This is for internal use only, and not user-facing, so it should be fine.
This is not true; you might not be able to select the backend just based on the model. For example, there is a case where the same Anthropic models are supported by both Google and AWS. In this case the user needs to set the backend header to determine where to route.
see https://aws.amazon.com/bedrock/claude/ and https://cloud.google.com/solutions/anthropic?hl=en.
Wait a second, this has nothing to do with the model...
Say you have
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: LLMRoute
metadata:
name: some-route
namespace: default
spec:
inputSchema:
schema: OpenAI
rules:
- matches:
- headers:
- type: Exact
name: some-random-header-that-can-be-sent-directly-by-clients
value: foo
backendRefs:
- name: some-backend-name
......and a client sends curl -H "some-random-header-that-can-be-sent-directly-by-clients: foo"; then internally extproc sets the header $selectedBackendHeaderKey: some-backend-name. That's what this does, and it is completely an implementation detail. This package is not a CRD but a configuration of extproc itself.
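To make that concrete, here is a minimal Go sketch of the matching step described above. The `Rule`/`HeaderMatch` types and `selectBackend` are illustrative stand-ins, not the actual extproc code:

```go
package main

import "fmt"

// Hypothetical, simplified versions of the extproc routing config types.
type HeaderMatch struct{ Name, Value string } // corresponds to type: Exact
type Rule struct {
	Headers []HeaderMatch // matches[].headers in the LLMRoute
	Backend string        // backendRefs[0].name
}

// selectBackend mimics the LLMRoute rule matching: the first rule whose
// header matches all succeed wins, and its backend name is what extproc
// writes into the selectedBackendHeaderKey header.
func selectBackend(reqHeaders map[string]string, rules []Rule) (string, bool) {
	for _, r := range rules {
		matched := true
		for _, h := range r.Headers {
			if reqHeaders[h.Name] != h.Value {
				matched = false
				break
			}
		}
		if matched {
			return r.Backend, true
		}
	}
	return "", false
}

func main() {
	rules := []Rule{{
		Headers: []HeaderMatch{{Name: "some-random-header-that-can-be-sent-directly-by-clients", Value: "foo"}},
		Backend: "some-backend-name",
	}}
	req := map[string]string{"some-random-header-that-can-be-sent-directly-by-clients": "foo"}
	if backend, ok := selectBackend(req, rules); ok {
		// extproc would then set the internal header, e.g.:
		fmt.Println("x-envoy-ai-gateway-selected-backend:", backend)
	}
}
```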
So you are saying we are not providing a standard AI gateway backend routing header to users; they MUST define the matching rules on LLMRoute for each backend the user creates. Is that correct?
Yes, exactly, currently. Though we could provide some "canonical-backend-choosing-header-key" defined in the api/v1alpha package and prioritize the header value when present, which effectively ignores LLMRoute.Rules if the header exists. That is another topic we should discuss in another issue/PR if you want to support it. This configuration key is a complete implementation detail regardless of whether or not we do that.
I will raise an issue tomorrow!
That is exactly what I was thinking: create those rules for the user, and if this routing header exists then we ignore the rules on the LLMRoute. I think this can provide a better user experience.
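A minimal Go sketch of the precedence being proposed here. The canonical header name and all type/function names are hypothetical; the real name would be decided in the follow-up issue:

```go
package main

import "fmt"

// Hypothetical type mirroring a simplified single-header-match LLMRoute rule.
type rule struct {
	headerName, headerValue string // single Exact header match
	backend                 string
}

// Hypothetical canonical header name; not decided in this PR.
const canonicalBackendHeader = "x-ai-gateway-backend"

// resolveBackend sketches the proposed precedence: if the client set the
// canonical routing header, use it directly and ignore LLMRoute.Rules;
// otherwise fall back to ordinary rule matching.
func resolveBackend(reqHeaders map[string]string, rules []rule) (string, bool) {
	if b := reqHeaders[canonicalBackendHeader]; b != "" {
		return b, true
	}
	for _, r := range rules {
		if reqHeaders[r.headerName] == r.headerValue {
			return r.backend, true
		}
	}
	return "", false
}

func main() {
	rules := []rule{{headerName: "x-user", headerValue: "foo", backend: "rule-backend"}}
	// The canonical header wins over the rules when present.
	b, _ := resolveBackend(map[string]string{canonicalBackendHeader: "aws-bedrock", "x-user": "foo"}, rules)
	fmt.Println(b)
	// Without it, the LLMRoute rules decide.
	b, _ = resolveBackend(map[string]string{"x-user": "foo"}, rules)
	fmt.Println(b)
}
```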
extprocconfig/extprocconfig.go
// where the input of the external processor is in the OpenAI schema, the model name is populated in the header x-model-name,
// The model name header `x-model-name` is used in the header matching to make the routing decision. **After** the routing decision is made,
// the selected backend name is populated in the header `x-backend-name`. For example, when the model name is `llama3.3333`,
// where the input of the external processor is in the OpenAI schema, the model name is populated in the header x-envoy-ai-gateway-llm-model,
Since the user can configure the model routing header name, we can say "the configured modelNameHeaderKey".
Well, no, users do not choose the model routing header name at this point.
ai-gateway/api/v1alpha1/api.go
Line 206 in cb6b2e0
It would be good to make this part of the LLMRoute resource, but the comment here matches what it is now.
Signed-off-by: Takeshi Yoneda <[email protected]>
Signed-off-by: Takeshi Yoneda <[email protected]>
	return fmt.Errorf("failed to get extension policy: %w", err)
}
pm := egv1a1.BufferedExtProcBodyProcessingMode
port := gwapiv1.PortNumber(1063)
Why specifically 1063? Should this port be configurable?
Yeah, we can make this configurable later.
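One way to do that later, sketched in Go. The `portOrDefault` helper and the optional-field shape are hypothetical, not part of the current API:

```go
package main

import "fmt"

// Hypothetical helper: if the API later grows an optional extproc port field,
// the controller can fall back to today's hardcoded 1063 when it is unset.
func portOrDefault(configured *int32) int32 {
	if configured != nil {
		return *configured
	}
	return 1063 // current hardcoded value in the controller
}

func main() {
	custom := int32(9002)
	fmt.Println(portOrDefault(nil), portOrDefault(&custom)) // prints: 1063 9002
}
```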
Signed-off-by: Takeshi Yoneda <[email protected]>
This comes up in a thread in #71 with @yuzisun, and we might want to remove the LLM prefix. The rationale is that the current functionality has nothing to do with "LLM" but is more about general routing and authn/z with "AI providers" where the input is in the OpenAI format. On the other hand, there "will be" LLM-specific CRDs such as configurations for prompt guard, semantic caching, etc. --------- Signed-off-by: Takeshi Yoneda <[email protected]>
This commit adds the initial implementation of the resource
management controllers. As per the official recommendation of
controller-runtime, the controller interface is implemented per
CRD. Therefore, this adds two typical "controllers", one for
LLMRoute and another for LLMBackend.
In addition, this implements the "configuration sink", which is the
sync mechanism across multiple resource events via a Go channel.
It has the global view over all AI Gateway resources and hence has
the full context required to create the final extproc configuration
as well as the HTTPRoute. The following is a (very) simple diagram
of the relationship among k8s events, controllers, and the configSink.
k8s events --async--> {llmRouteCtrl, llmBackendCtrl, ...} --sinkEvent (sync)--> configSink
The follow-up PR will add the end-to-end tests.