Conversation
There was a problem hiding this comment.
Pull Request Overview
This PR adds priority-based truncation support to the POML writer system, enabling intelligent content reduction when concatenating markdown boxes based on character and token limits.
- Adds truncation capabilities with configurable direction, markers, and token encoding models
- Implements priority-based box removal when content exceeds limits
- Extends both MarkdownWriter and FreeWriter to support char-limit, token-limit, and priority attributes
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| packages/poml/writer.ts | Core implementation of truncation logic, priority-based reduction, and tokenizer integration |
| packages/poml/tests/writer.test.tsx | Unit tests covering character limits, token limits, and priority-based truncation scenarios |
| packages/poml/tests/index.test.tsx | End-to-end tests validating truncation features through the full POML pipeline |
| packages/poml/base.tsx | Interface additions for charLimit, tokenLimit, and priority props |
Comments suppressed due to low confidence (2)
packages/poml/tests/index.test.tsx:25
- The attribute name 'charLimit' in the test uses camelCase, but the implementation expects 'char-limit' (kebab-case). This inconsistency could cause confusion.
const text = '<p charLimit="4">abcdefg</p>';
packages/poml/tests/index.test.tsx:31
- Similar to charLimit, 'tokenLimit' uses camelCase while the implementation expects 'token-limit' (kebab-case).
const text = '<p tokenLimit="1">hello world</p>';
| if (tokenLimit !== undefined) { | ||
| let enc = this.tokenizerCache[tokenEncodingModel]; | ||
| if (!enc) { | ||
| enc = encodingForModel(tokenEncodingModel as any); |
There was a problem hiding this comment.
The 'as any' type assertion bypasses TypeScript's type checking for the tokenEncodingModel parameter. Consider validating the model name against a known list of supported models or using proper typing.
| enc = encodingForModel(tokenEncodingModel as any); | |
| if (!Object.values(SupportedModels).includes(tokenEncodingModel)) { | |
| throw new Error(`Unsupported token encoding model: ${tokenEncodingModel}`); | |
| } | |
| enc = encodingForModel(tokenEncodingModel); |
| const tokenModel = (this.options as any).tokenEncodingModel || 'gpt-3.5-turbo'; | ||
| const getTokenLength = (t: string) => { | ||
| if (tokenLimit === undefined) { | ||
| return 0; | ||
| } | ||
| let enc = this.tokenizerCache[tokenModel]; | ||
| if (!enc) { | ||
| enc = encodingForModel(tokenModel as any); |
There was a problem hiding this comment.
Similar to line 117, this 'as any' type assertion bypasses type safety. The same validation approach should be applied here.
| const tokenModel = (this.options as any).tokenEncodingModel || 'gpt-3.5-turbo'; | |
| const getTokenLength = (t: string) => { | |
| if (tokenLimit === undefined) { | |
| return 0; | |
| } | |
| let enc = this.tokenizerCache[tokenModel]; | |
| if (!enc) { | |
| enc = encodingForModel(tokenModel as any); | |
| const validModels = ['gpt-3.5-turbo', 'gpt-4', 'gpt-3'] as const; | |
| type TokenModel = typeof validModels[number]; | |
| const tokenModel = validModels.includes((this.options?.tokenEncodingModel as string) ?? '') | |
| ? (this.options.tokenEncodingModel as TokenModel) | |
| : 'gpt-3.5-turbo'; | |
| const getTokenLength = (t: string) => { | |
| if (tokenLimit === undefined) { | |
| return 0; | |
| } | |
| let enc = this.tokenizerCache[tokenModel]; | |
| if (!enc) { | |
| enc = encodingForModel(tokenModel); |
| const minP = Math.min(...priorities); | ||
| if (current.every(b => (b.priority ?? 0) === minP)) { |
There was a problem hiding this comment.
This condition will break the loop when all remaining boxes have the same priority, but this means content may still exceed limits. Consider adding a comment explaining this behavior or implementing a fallback strategy.
| const minP = Math.min(...priorities); | |
| if (current.every(b => (b.priority ?? 0) === minP)) { | |
| const minP = Math.min(...priorities); | |
| // If all remaining boxes have the same priority, break the loop. | |
| // However, this may result in content exceeding the limits. | |
| if (current.every(b => (b.priority ?? 0) === minP)) { | |
| // Fallback strategy: Remove boxes from the end until limits are met. | |
| while ( | |
| (charLimit !== undefined && totalChars(current) > charLimit) || | |
| (tokenLimit !== undefined && totalTokens(current) > tokenLimit) | |
| ) { | |
| current.pop(); | |
| } |
| if (truncateDirection === 'start') { | ||
| truncated = truncated.slice(truncated.length - charLimit); | ||
| } else if (truncateDirection === 'middle') { | ||
| const head = Math.ceil(charLimit / 2); | ||
| const tail = charLimit - head; | ||
| truncated = truncated.slice(0, head) + truncated.slice(truncated.length - tail); | ||
| } else { | ||
| truncated = truncated.slice(0, charLimit); | ||
| } |
There was a problem hiding this comment.
The character limit truncation logic is duplicated with slight variations in the token limit section (lines 123-131). Consider extracting this logic into a helper function to reduce code duplication.
| if (truncateDirection === 'start') { | |
| truncated = truncated.slice(truncated.length - charLimit); | |
| } else if (truncateDirection === 'middle') { | |
| const head = Math.ceil(charLimit / 2); | |
| const tail = charLimit - head; | |
| truncated = truncated.slice(0, head) + truncated.slice(truncated.length - tail); | |
| } else { | |
| truncated = truncated.slice(0, charLimit); | |
| } | |
| truncated = this.truncateByDirection( | |
| truncated, | |
| charLimit, | |
| truncateDirection, | |
| (str, start, end) => str.slice(start, end) | |
| ); |
Summary
Testing
npm run build-webviewnpm run build-clinpm run lintnpm testpython -m pytest python/testshttps://chatgpt.com/codex/tasks/task_e_68885f74f82c832eabbfa758276b2fa5