Add priority-based truncation#63

Merged
ultmaster merged 1 commit into main from codex/add-charlimit-and-tokenlimit-to-propsbase on Jul 29, 2025

Conversation

@ultmaster (Contributor)

Summary

  • support priority-driven trimming when concatenating markdown boxes
  • propagate truncation options from elements and reduce boxes by char/token limits
  • update writer tests to cover token limits and priority-based trimming
  • extend end-to-end tests for token limit cases

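The priority-driven trimming described above can be sketched roughly as follows. This is an illustrative sketch only, not the actual `writer.ts` API: the `Box` shape and `trimByPriority` name are hypothetical, and the real implementation also handles token limits and truncation markers.

```typescript
// Hypothetical sketch: boxes with the lowest priority are dropped first
// until the concatenated text fits within the character limit.
interface Box {
  text: string;
  priority?: number; // higher = kept longer; defaults to 0
}

function trimByPriority(boxes: Box[], charLimit: number): Box[] {
  let current = [...boxes];
  const totalChars = (bs: Box[]) => bs.reduce((n, b) => n + b.text.length, 0);
  while (totalChars(current) > charLimit) {
    const minP = Math.min(...current.map(b => b.priority ?? 0));
    // Once every remaining box shares the same priority, stop rather than
    // arbitrarily choosing among equals (content may still exceed the limit).
    if (current.every(b => (b.priority ?? 0) === minP)) {
      break;
    }
    current = current.filter(b => (b.priority ?? 0) !== minP);
  }
  return current;
}
```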
Testing

  • npm run build-webview
  • npm run build-cli
  • npm run lint
  • npm test
  • python -m pytest python/tests

https://chatgpt.com/codex/tasks/task_e_68885f74f82c832eabbfa758276b2fa5

Copilot AI review requested due to automatic review settings July 29, 2025 08:23

Copilot AI left a comment

Pull Request Overview

This PR adds priority-based truncation support to the POML writer system, enabling intelligent content reduction when concatenating markdown boxes based on character and token limits.

  • Adds truncation capabilities with configurable direction, markers, and token encoding models
  • Implements priority-based box removal when content exceeds limits
  • Extends both MarkdownWriter and FreeWriter to support char-limit, token-limit, and priority attributes

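The char-limit truncation with a configurable direction can be illustrated with a small standalone sketch. The `truncateChars` name is hypothetical; the actual writer additionally supports token limits, truncation markers, and token encoding models.

```typescript
// Illustrative sketch of direction-based character truncation.
type Direction = 'start' | 'middle' | 'end';

function truncateChars(text: string, charLimit: number, direction: Direction = 'end'): string {
  if (text.length <= charLimit) {
    return text;
  }
  if (direction === 'start') {
    // Keep the tail of the string.
    return text.slice(text.length - charLimit);
  }
  if (direction === 'middle') {
    // Keep both ends, dropping the middle.
    const head = Math.ceil(charLimit / 2);
    const tail = charLimit - head;
    return text.slice(0, head) + text.slice(text.length - tail);
  }
  // Default ('end'): keep the head of the string.
  return text.slice(0, charLimit);
}
```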
Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| packages/poml/writer.ts | Core implementation of truncation logic, priority-based reduction, and tokenizer integration |
| packages/poml/tests/writer.test.tsx | Unit tests covering character limits, token limits, and priority-based truncation scenarios |
| packages/poml/tests/index.test.tsx | End-to-end tests validating truncation features through the full POML pipeline |
| packages/poml/base.tsx | Interface additions for charLimit, tokenLimit, and priority props |

Comments suppressed due to low confidence (2)

packages/poml/tests/index.test.tsx:25

  • The attribute name `charLimit` in the test uses camelCase, but the implementation expects `char-limit` (kebab-case). This inconsistency could cause confusion.

    ```tsx
    const text = '<p charLimit="4">abcdefg</p>';
    ```

packages/poml/tests/index.test.tsx:31

  • Similar to `charLimit`, `tokenLimit` uses camelCase while the implementation expects `token-limit` (kebab-case).

    ```tsx
    const text = '<p tokenLimit="1">hello world</p>';
    ```

```ts
if (tokenLimit !== undefined) {
  let enc = this.tokenizerCache[tokenEncodingModel];
  if (!enc) {
    enc = encodingForModel(tokenEncodingModel as any);
```

Copilot AI Jul 29, 2025

The 'as any' type assertion bypasses TypeScript's type checking for the tokenEncodingModel parameter. Consider validating the model name against a known list of supported models or using proper typing.

Suggested change:

```diff
-enc = encodingForModel(tokenEncodingModel as any);
+if (!Object.values(SupportedModels).includes(tokenEncodingModel)) {
+  throw new Error(`Unsupported token encoding model: ${tokenEncodingModel}`);
+}
+enc = encodingForModel(tokenEncodingModel);
```

Comment on lines +769 to +776
```ts
const tokenModel = (this.options as any).tokenEncodingModel || 'gpt-3.5-turbo';
const getTokenLength = (t: string) => {
  if (tokenLimit === undefined) {
    return 0;
  }
  let enc = this.tokenizerCache[tokenModel];
  if (!enc) {
    enc = encodingForModel(tokenModel as any);
```

Copilot AI Jul 29, 2025

Similar to line 117, this 'as any' type assertion bypasses type safety. The same validation approach should be applied here.

Suggested change:

```diff
-const tokenModel = (this.options as any).tokenEncodingModel || 'gpt-3.5-turbo';
-const getTokenLength = (t: string) => {
-  if (tokenLimit === undefined) {
-    return 0;
-  }
-  let enc = this.tokenizerCache[tokenModel];
-  if (!enc) {
-    enc = encodingForModel(tokenModel as any);
+const validModels = ['gpt-3.5-turbo', 'gpt-4', 'gpt-3'] as const;
+type TokenModel = typeof validModels[number];
+const tokenModel = validModels.includes((this.options?.tokenEncodingModel as string) ?? '')
+  ? (this.options.tokenEncodingModel as TokenModel)
+  : 'gpt-3.5-turbo';
+const getTokenLength = (t: string) => {
+  if (tokenLimit === undefined) {
+    return 0;
+  }
+  let enc = this.tokenizerCache[tokenModel];
+  if (!enc) {
+    enc = encodingForModel(tokenModel);
```

Comment on lines +793 to +794
```ts
const minP = Math.min(...priorities);
if (current.every(b => (b.priority ?? 0) === minP)) {
```

Copilot AI Jul 29, 2025

This condition will break the loop when all remaining boxes have the same priority, but this means content may still exceed limits. Consider adding a comment explaining this behavior or implementing a fallback strategy.

Suggested change:

```diff
-const minP = Math.min(...priorities);
-if (current.every(b => (b.priority ?? 0) === minP)) {
+const minP = Math.min(...priorities);
+// If all remaining boxes have the same priority, break the loop.
+// However, this may result in content exceeding the limits.
+if (current.every(b => (b.priority ?? 0) === minP)) {
+  // Fallback strategy: Remove boxes from the end until limits are met.
+  while (
+    (charLimit !== undefined && totalChars(current) > charLimit) ||
+    (tokenLimit !== undefined && totalTokens(current) > tokenLimit)
+  ) {
+    current.pop();
+  }
```

Comment on lines +103 to +111
```ts
if (truncateDirection === 'start') {
  truncated = truncated.slice(truncated.length - charLimit);
} else if (truncateDirection === 'middle') {
  const head = Math.ceil(charLimit / 2);
  const tail = charLimit - head;
  truncated = truncated.slice(0, head) + truncated.slice(truncated.length - tail);
} else {
  truncated = truncated.slice(0, charLimit);
}
```

Copilot AI Jul 29, 2025

The character limit truncation logic is duplicated with slight variations in the token limit section (lines 123-131). Consider extracting this logic into a helper function to reduce code duplication.

Suggested change:

```diff
-if (truncateDirection === 'start') {
-  truncated = truncated.slice(truncated.length - charLimit);
-} else if (truncateDirection === 'middle') {
-  const head = Math.ceil(charLimit / 2);
-  const tail = charLimit - head;
-  truncated = truncated.slice(0, head) + truncated.slice(truncated.length - tail);
-} else {
-  truncated = truncated.slice(0, charLimit);
-}
+truncated = this.truncateByDirection(
+  truncated,
+  charLimit,
+  truncateDirection,
+  (str, start, end) => str.slice(start, end)
+);
```

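A shared helper along the lines Copilot suggests could be made generic over the unit being sliced, so the char-limit and token-limit paths reuse one implementation. This is only a sketch under that assumption; `truncateByDirection` is the reviewer's hypothetical name, and this variant adds a `concat` parameter so it also works on token arrays, which differs from the signature in the suggestion above.

```typescript
// Hypothetical generic helper: truncates any length-bearing value (a string
// of characters or an array of tokens) according to the truncation direction.
type Slicer<T> = (value: T, start: number, end?: number) => T;

function truncateByDirection<T extends { length: number }>(
  value: T,
  limit: number,
  direction: 'start' | 'middle' | 'end',
  slice: Slicer<T>,
  concat: (a: T, b: T) => T,
): T {
  if (value.length <= limit) {
    return value;
  }
  if (direction === 'start') {
    return slice(value, value.length - limit);
  }
  if (direction === 'middle') {
    const head = Math.ceil(limit / 2);
    const tail = limit - head;
    return concat(slice(value, 0, head), slice(value, value.length - tail));
  }
  return slice(value, 0, limit);
}
```

For strings the callers pass `(s, a, b) => s.slice(a, b)` and `(a, b) => a + b`; for token arrays (`number[]`), `(t, a, b) => t.slice(a, b)` and `(a, b) => [...a, ...b]`.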
Copilot uses AI. Check for mistakes.
ultmaster merged commit a91edf2 into main on Jul 29, 2025

3 checks passed

ultmaster deleted the codex/add-charlimit-and-tokenlimit-to-propsbase branch on August 27, 2025 00:53