Send ratelimit request to ratelimit service on response flow with hits_addend #29161
Description
Title: Send ratelimit request to ratelimit service on response flow with hits_addend for reporting purpose.
Description:
Context: We use the OpenAI API internally and route requests to OpenAI's servers through Envoy. We want to rate-limit each user on the number of tokens consumed per OpenAI API request. The tokens consumed are the sum of the tokens used in the request and in the response body. OpenAI returns this total (request + response) as part of its API response, so we would like to take that number and send it to the ratelimit sidecar via the ratelimit filter, using the hits_addend field.
To achieve the above, we think three work items need to be completed:
- Update ratelimit filter to support sending hits_addend
- I've started by landing this PR so ratelimit grpc client supports hits_addend.
- Update ratelimit filter so it can be configured to send the request to the sidecar on the response flow.
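To make the idea concrete, here is a rough Python sketch of what the response-flow step would compute. It is only an illustration, not the filter implementation: the `openai_tokens` domain, the `user_id` descriptor key, and the class and function names are hypothetical. The only things taken as given are the `usage.total_tokens` field in the OpenAI response body and the `domain`, `descriptors`, and `hits_addend` fields of the rate limit request.

```python
import json
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Descriptor:
    # Stand-in for a rate limit descriptor: an ordered list of (key, value) entries.
    entries: List[Tuple[str, str]]


@dataclass
class RateLimitRequest:
    # Stand-in for the rate limit request message; only the fields relevant here.
    domain: str
    descriptors: List[Descriptor] = field(default_factory=list)
    hits_addend: int = 1  # a plain request counts as a single hit


def build_response_flow_request(user_id: str, response_body: bytes) -> RateLimitRequest:
    """Build the request that would be sent to the sidecar after the upstream responds."""
    body = json.loads(response_body)
    # OpenAI reports prompt + completion tokens together as usage.total_tokens.
    total_tokens = int(body["usage"]["total_tokens"])
    return RateLimitRequest(
        domain="openai_tokens",  # hypothetical domain name
        descriptors=[Descriptor(entries=[("user_id", user_id)])],  # hypothetical key
        hits_addend=total_tokens,
    )
```

For example, a response whose `usage.total_tokens` is 42 would be reported for that user with `hits_addend=42` rather than as a single hit.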
cc: @sc0ttbeardsley @JuniorHsu @fishcakez
Any thoughts on the problem?
Do you think the proposed solution is a good approach, or are there better ways to achieve this?