[feat] optimize aiTokenLimiterPlugin for streaming tokens #6055
Conversation
Pull Request Overview
This PR optimizes the AI token limiter plugin to better handle streaming tokens by implementing real-time token counting during streaming responses instead of waiting for the complete response. The changes add support for Server-Sent Events (SSE) parsing and include usage information in streaming requests.
- Enhanced token counting for streaming AI responses with real-time processing
- Added support for the `writeAndFlushWith` method to handle streaming data properly
- Modified OpenAI configuration to include usage information in streaming requests
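The core idea behind real-time token counting is to scan each SSE chunk as it streams through and remember the most recent `completion_tokens` value, rather than buffering the full response. A minimal, self-contained sketch of that approach (class and method names here are illustrative, not the plugin's actual API):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SseTokenCounter {

    // Precompiled once, as the review below also recommends; Pattern is thread-safe.
    private static final Pattern COMPLETION_TOKENS_PATTERN =
            Pattern.compile("\"completion_tokens\"\\s*:\\s*(\\d+)");

    private long latest = 0L;

    /** Scan one SSE chunk and remember the most recent completion_tokens value seen. */
    public long accept(final String sseChunk) {
        Matcher m = COMPLETION_TOKENS_PATTERN.matcher(sseChunk);
        while (m.find()) {
            latest = Long.parseLong(m.group(1));
        }
        return latest;
    }

    public static void main(String[] args) {
        SseTokenCounter counter = new SseTokenCounter();
        // Intermediate delta chunks carry no usage data and leave the count unchanged.
        counter.accept("data: {\"choices\":[{\"delta\":{\"content\":\"Hi\"}}]}");
        // The final chunk (when stream_options.include_usage is set) carries usage.
        long tokens = counter.accept(
                "data: {\"usage\":{\"prompt_tokens\":9,\"completion_tokens\":12}}");
        System.out.println(tokens);
    }
}
```

With this shape, the limiter can record usage as soon as the final usage-bearing chunk arrives, without waiting for the downstream writer to finish.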
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| AiTokenLimiterPlugin.java | Implements streaming token counting with SSE parsing and real-time usage tracking |
| OpenAI.java | Adds stream_options configuration to include usage data in streaming requests |
| Constants.java | Defines new constants for streaming options and usage inclusion |
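For context on the OpenAI.java change: per the OpenAI Chat Completions API, setting `stream_options.include_usage` makes the server append one final SSE chunk containing a `usage` object, which is what the plugin's SSE parser reads. A request body enabling it would look roughly like this (model name illustrative):

```json
{
  "model": "gpt-4o",
  "stream": true,
  "stream_options": {
    "include_usage": true
  }
}
```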
Comments suppressed due to low confidence (1)
shenyu-plugin/shenyu-plugin-ai/shenyu-plugin-ai-token-limiter/src/main/java/org/apache/shenyu/plugin/ai/token/limiter/AiTokenLimiterPlugin.java:310
- [nitpick] The variable name 'last' is ambiguous. Consider renaming it to 'lastCompletionTokens' or 'finalTokenCount' to clarify its purpose.
```java
long last = 0L;
```
```diff
  int offset = 0;
  int len = inBytes.length;
- if (!headerSkipped.get()) {
+ if (!headerSkipped.getAndSet(true)) {
```
Copilot AI · Jul 17, 2025
Using getAndSet(true) in a conditional check can lead to race conditions. Consider using a more explicit atomic operation or synchronization mechanism to ensure thread safety.
```diff
- if (!headerSkipped.getAndSet(true)) {
+ if (headerSkipped.compareAndSet(false, true)) {
```
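Worth noting: both `getAndSet(true)` and `compareAndSet(false, true)` are atomic on `AtomicBoolean`, so the suggestion is mostly about making the "flip only if still false" intent explicit. Either form guarantees the guarded block runs exactly once under contention. A small sketch (class and field names hypothetical) demonstrating the one-time guard:

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class OneTimeGuard {

    private final AtomicBoolean headerSkipped = new AtomicBoolean(false);

    // Counts how many times the guarded block actually ran (for demonstration).
    private final AtomicInteger runs = new AtomicInteger(0);

    /** Runs the header-skipping work at most once, even when called concurrently. */
    public void skipHeaderOnce() {
        if (headerSkipped.compareAndSet(false, true)) {
            runs.incrementAndGet(); // the one-time work goes here
        }
    }

    public int runCount() {
        return runs.get();
    }

    public static void main(String[] args) throws InterruptedException {
        OneTimeGuard guard = new OneTimeGuard();
        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(guard::skipHeaderOnce);
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println(guard.runCount());
    }
}
```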
```java
        } catch (Exception e) {
            LOG.error("parse ai resp error", e);
        }
    }
    writer.write(ByteBuffer.wrap(processedBytes));
});
} catch (Exception e) {
    LOG.error("read dataBuffer error", e);
```
Copilot AI · Jul 17, 2025
Catching generic Exception is too broad. Consider catching specific exceptions like JsonProcessingException or IOException to handle different error scenarios appropriately.
```diff
-        } catch (Exception e) {
-            LOG.error("parse ai resp error", e);
-        }
+        } catch (JsonProcessingException e) {
+            LOG.error("JSON processing error while parsing AI response", e);
+        } catch (IOException e) {
+            LOG.error("IO error while parsing AI response", e);
+        }
     }
     writer.write(ByteBuffer.wrap(processedBytes));
 });
-} catch (Exception e) {
-    LOG.error("read dataBuffer error", e);
+} catch (IOException e) {
+    LOG.error("IO error while reading dataBuffer", e);
```
```java
private long extractUsageTokensFromSse(final String sse) {
    Pattern p = Pattern.compile("\"completion_tokens\"\\s*:\\s*(\\d+)");
    Matcher m = p.matcher(sse);
```
Copilot AI · Jul 17, 2025
The regex pattern is compiled on every method call. Consider making the Pattern a static final field to improve performance.
```diff
-private long extractUsageTokensFromSse(final String sse) {
-    Pattern p = Pattern.compile("\"completion_tokens\"\\s*:\\s*(\\d+)");
-    Matcher m = p.matcher(sse);
+private static final Pattern COMPLETION_TOKENS_PATTERN =
+        Pattern.compile("\"completion_tokens\"\\s*:\\s*(\\d+)");
+
+private long extractUsageTokensFromSse(final String sse) {
+    Matcher m = COMPLETION_TOKENS_PATTERN.matcher(sse);
```
```java
        streamingUsageRecorded.set(true);
    }
} catch (Exception e) {
    LOG.error("parse ai resp error", e);
```
Copilot AI · Jul 17, 2025
The error message 'parse ai resp error' is unclear and uses abbreviations. Consider a more descriptive message like 'Failed to parse AI response JSON payload'.
```diff
- LOG.error("parse ai resp error", e);
+ LOG.error("Failed to parse AI response JSON payload", e);
```
* fix: optimize aiTokenLimiterPlugin for streaming tokens
* chore: java format
* chore: code review by copilot
optimize aiTokenLimiterPlugin for streaming tokens
Make sure that:
- `./mvnw clean install -Dmaven.javadoc.skip=true` passes.