
Send ratelimit request to ratelimit service on response flow with hits_addend #29161

@PeterL328

Title: Send ratelimit request to ratelimit service on response flow with hits_addend for reporting purposes.

Description:
Context: We use the OpenAI API internally and route requests to the OpenAI servers through Envoy. We want to rate-limit each user on the number of tokens consumed per OpenAI API request. The tokens consumed are the sum of the tokens in the request and in the response body. The OpenAI API returns this total (request + response) as part of its response, so we would like to take that number and send it to the ratelimit sidecar through the ratelimit filter, using the hits_addend field.
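As a rough illustration of the data involved, the sketch below reads the token total from an OpenAI-style JSON response body and places it in the hits_addend of a rate limit request. It assumes the response carries a `usage.total_tokens` field. The `RateLimitRequestSketch` class is a plain-Python stand-in that mirrors the `domain`, `descriptors`, and `hits_addend` fields of the rate limit request; it is not the real protobuf class, and the `openai_tokens` domain and `user_id` descriptor key are made-up names.

```python
import json
from dataclasses import dataclass, field


@dataclass
class RateLimitRequestSketch:
    """Plain-Python stand-in for a rate limit request (not the real proto class)."""

    domain: str
    descriptors: list = field(default_factory=list)
    hits_addend: int = 1


def total_tokens_from_response(body: bytes) -> int:
    """Read usage.total_tokens from an OpenAI-style JSON response body."""
    payload = json.loads(body)
    return int(payload["usage"]["total_tokens"])


def build_report(user_id: str, body: bytes) -> RateLimitRequestSketch:
    """Build a per-user report whose hits_addend is the token total."""
    return RateLimitRequestSketch(
        domain="openai_tokens",  # hypothetical domain name
        descriptors=[{"key": "user_id", "value": user_id}],  # hypothetical key
        hits_addend=total_tokens_from_response(body),
    )


# Example response body with request and response tokens already summed.
sample_body = json.dumps(
    {"usage": {"prompt_tokens": 12, "completion_tokens": 30, "total_tokens": 42}}
).encode()
report = build_report("alice", sample_body)
```

The point is only that the addend is not known until the response arrives, which is why the report has to happen on the response flow.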

We think three work items need to be completed to achieve the above:

  1. Update the ratelimit filter to support sending hits_addend.
     • I've started by landing this PR so that the ratelimit gRPC client supports hits_addend.
  2. Update the ratelimit filter so that it can be configured to send a request to the sidecar on the response flow.
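The intended flow can be sketched with a toy model. Everything here is hypothetical: `FakeRateLimitSidecar` is an in-memory stand-in for the ratelimit service, and `ResponseFlowReporter` stands in for a filter configured to report on the response path. How the real filter would be configured, and what it would do with the sidecar's answer, is not decided in this issue.

```python
from collections import defaultdict


class FakeRateLimitSidecar:
    """In-memory stand-in for the ratelimit service (hypothetical)."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.used = defaultdict(int)

    def should_rate_limit(self, user_id: str, hits_addend: int) -> str:
        # Add the reported hits to the user's counter, then compare to the limit.
        self.used[user_id] += hits_addend
        return "OVER_LIMIT" if self.used[user_id] > self.limit else "OK"


class ResponseFlowReporter:
    """Stand-in for a filter that reports usage once the response is known."""

    def __init__(self, sidecar: FakeRateLimitSidecar) -> None:
        self.sidecar = sidecar

    def on_response(self, user_id: str, total_tokens: int) -> str:
        # The token total is only available here, after the upstream replied.
        return self.sidecar.should_rate_limit(user_id, hits_addend=total_tokens)


sidecar = FakeRateLimitSidecar(limit=100)
reporter = ResponseFlowReporter(sidecar)
first = reporter.on_response("alice", 60)   # counter: 60
second = reporter.on_response("alice", 60)  # counter: 120, above the limit
```

In this model the report is for accounting only: the response that pushed the user over the limit has already been produced, so enforcement would apply to later requests.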

cc: @sc0ttbeardsley @JuniorHsu @fishcakez

Any thoughts on the problem?
Do you think the proposed solution is a good approach, or is there a better way to achieve this?

Labels: area/ratelimit, enhancement, stale
