Commit e8ae024

feat!: remove optional length marker option [#N] in favor of [N]
1 parent 1c74f09 commit e8ae024

16 files changed, +38 -109 lines changed
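`feat!` marks this commit as a breaking change. TOON text produced earlier with `lengthMarker: '#'` or `--length-marker` contains headers such as `tags[#3]`. The updated `parseBracketSegment` no longer strips a leading `#`, so the new decoder will likely reject that input with an `Invalid array length` error.

The helper below is a minimal migration sketch, not part of `@toon-format/toon`. `stripLegacyLengthMarkers` is a hypothetical name. It assumes:

- each array header starts its line, after indentation and an optional `- ` list marker;
- the header's key, if present, is unquoted.

```typescript
// Hypothetical migration helper (not part of @toon-format/toon).
// Rewrites legacy `[#N]` array headers into the plain `[N]` form.
// Assumes a header starts its line (after indentation and an optional "- "
// list marker) and that its key, if any, is an unquoted identifier-like key.
const LEGACY_HEADER = /^(\s*(?:- )?[\w.]*)\[#(\d+)([|\t]?)\]/

function stripLegacyLengthMarkers(toon: string): string {
  return toon
    .split('\n')
    .map(line =>
      line.replace(
        LEGACY_HEADER,
        // Keep the prefix, the count and any tab/pipe delimiter; drop the "#".
        (_match: string, prefix: string, count: string, delimiter: string) =>
          `${prefix}[${count}${delimiter}]`,
      ),
    )
    .join('\n')
}

// Usage: migrate output that was encoded with `lengthMarker: '#'`.
const legacy = 'tags[#3|]: reading|gaming|coding\nitems[#2|]{sku|qty|price}:\n  A1|2|9.99'
console.log(stripLegacyLengthMarkers(legacy))
// tags[3|]: reading|gaming|coding
// items[2|]{sku|qty|price}:
//   A1|2|9.99
```

Values that merely contain `[#1]` must be quoted in TOON, so the anchored pattern leaves them untouched. Quoted keys are not rewritten and would need manual review.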

README.md

Lines changed: 9 additions & 42 deletions
@@ -4,7 +4,7 @@
 
 [![CI](https://github.com/toon-format/toon/actions/workflows/ci.yml/badge.svg)](https://github.com/toon-format/toon/actions)
 [![npm version](https://img.shields.io/npm/v/@toon-format/toon.svg)](https://www.npmjs.com/package/@toon-format/toon)
-[![SPEC v1.5](https://img.shields.io/badge/spec-v1.5-lightgray)](https://github.com/toon-format/spec)
+[![SPEC v2.0](https://img.shields.io/badge/spec-v2.0-lightgray)](https://github.com/toon-format/spec)
 [![npm downloads (total)](https://img.shields.io/npm/dt/@toon-format/toon.svg)](https://www.npmjs.com/package/@toon-format/toon)
 [![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](./LICENSE)
 
@@ -67,7 +67,7 @@ TOON excels with uniform arrays of objects, but there are cases where other form
 
 - **Deeply nested or non-uniform structures** (tabular eligibility ≈ 0%): JSON-compact often uses fewer tokens. Example: complex configuration objects with many nested levels.
 - **Semi-uniform arrays** (~40–60% tabular eligibility): Token savings diminish. Prefer JSON if your pipelines already rely on it.
-- **Flat CSV use-cases**: CSV is smaller than TOON for pure tabular data. TOON adds minimal overhead (~5-10%) to provide structure (length markers, field headers, delimiter scoping) that improves LLM reliability.
+- **Flat CSV use-cases**: CSV is smaller than TOON for pure tabular data. TOON adds minimal overhead (~5-10%) to provide structure (array length declarations, field headers, delimiter scoping) that improves LLM reliability.
 
 See [benchmarks](#benchmarks) for concrete comparisons across different data structures.
 
@@ -80,7 +80,7 @@ See [benchmarks](#benchmarks) for concrete comparisons across different data str
 - 🍱 **Minimal syntax:** removes redundant punctuation (braces, brackets, most quotes)
 - 📐 **Indentation-based structure:** like YAML, uses whitespace instead of braces
 - 🧺 **Tabular arrays:** declare keys once, stream data as rows
-- 🔗 **Optional key folding (spec v1.5):** collapses single-key wrapper chains into dotted paths (e.g., `data.metadata.items`) to reduce indentation and tokens
+- 🔗 **Optional key folding:** collapses single-key wrapper chains into dotted paths (e.g., `data.metadata.items`) to reduce indentation and tokens
 
 [^1]: For flat tabular data, CSV is more compact. TOON adds minimal overhead to provide explicit structure and validation that improves LLM reliability.
 
@@ -734,7 +734,6 @@ cat data.toon | npx @toon-format/cli --decode
 | `-d, --decode` | Force decode mode (overrides auto-detection) |
 | `--delimiter <char>` | Array delimiter: `,` (comma), `\t` (tab), `\|` (pipe) |
 | `--indent <number>` | Indentation size (default: `2`) |
-| `--length-marker` | Add `#` prefix to array lengths (e.g., `items[#3]`) |
 | `--stats` | Show token count estimates and savings (encode only) |
 | `--no-strict` | Disable strict validation when decoding |
 | `--key-folding <mode>` | Key folding mode: `off`, `safe` (default: `off`) - collapses nested chains |
@@ -750,13 +749,13 @@ npx @toon-format/cli data.json --stats -o output.toon
 # Tab-separated output (often more token-efficient)
 npx @toon-format/cli data.json --delimiter "\t" -o output.toon
 
-# Pipe-separated with length markers
-npx @toon-format/cli data.json --delimiter "|" --length-marker -o output.toon
+# Pipe-separated output
+npx @toon-format/cli data.json --delimiter "|" -o output.toon
 
 # Lenient decoding (skip validation)
 npx @toon-format/cli data.toon --no-strict -o output.json
 
-# Key folding for nested data (spec v1.5)
+# Key folding for nested data
 npx @toon-format/cli data.json --key-folding safe -o output.toon
 
 # Stdin workflows
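The `--key-folding safe` example above collapses single-key wrapper chains into dotted paths such as `data.metadata.items`. Below is a rough sketch of that idea. It rests on two assumptions:

- `safe` means every folded segment is a plain identifier.
- `flattenDepth` caps how many segments are joined into one key.

`foldKeys` is hypothetical. The library's actual rules, such as collision checks against sibling keys, are not reproduced here.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json }
type JsonObject = { [key: string]: Json }

// In this sketch, only identifier-like keys are eligible for folding.
const IDENTIFIER = /^[A-Za-z_]\w*$/

function isJsonObject(value: Json): value is JsonObject {
  return typeof value === 'object' && value !== null && !Array.isArray(value)
}

// Hypothetical "safe" key folding: follow chains of single-key objects and
// join their keys with dots, stopping after `flattenDepth` segments.
function foldKeys(value: Json, flattenDepth = Number.POSITIVE_INFINITY): Json {
  if (Array.isArray(value))
    return value.map(item => foldKeys(item, flattenDepth))
  if (!isJsonObject(value))
    return value

  const result: JsonObject = {}
  for (const [key, child] of Object.entries(value)) {
    const segments = [key]
    let current: Json = child
    // Descend while the current value wraps exactly one identifier key.
    while (
      segments.length < flattenDepth
      && IDENTIFIER.test(key)
      && isJsonObject(current)
      && Object.keys(current).length === 1
      && IDENTIFIER.test(Object.keys(current)[0])
    ) {
      const nextKey = Object.keys(current)[0]
      segments.push(nextKey)
      current = current[nextKey]
    }
    // Whatever remains below the folded path is processed recursively.
    result[segments.join('.')] = foldKeys(current, flattenDepth)
  }
  return result
}

console.log(JSON.stringify(foldKeys({ data: { metadata: { items: ['a', 'b'] } } })))
// {"data.metadata.items":["a","b"]}
```

With `flattenDepth` set to 0 or 1, the loop never runs. That matches the README's note that such values have no practical effect.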
@@ -1015,7 +1014,6 @@ Converts any JSON-serializable value to TOON format.
 - `options` – Optional encoding options:
   - `indent?: number` – Number of spaces per indentation level (default: `2`)
   - `delimiter?: ',' | '\t' | '|'` – Delimiter for array values and tabular rows (default: `','`)
-  - `lengthMarker?: '#' | false` – Optional marker to prefix array lengths (default: `false`)
   - `keyFolding?: 'off' | 'safe'` – Enable key folding to collapse single-key wrapper chains into dotted paths (default: `'off'`). When `'safe'`, only valid identifier segments are folded
   - `flattenDepth?: number` – Maximum number of segments to fold when `keyFolding` is enabled (default: `Infinity`). Values 0-1 have no practical effect
 
@@ -1098,37 +1096,6 @@ items[2|]{sku|name|qty|price}:
   B2|Gadget|1|14.5
 ```
 
-#### Length Marker Option
-
-The `lengthMarker` option adds an optional hash (`#`) prefix to array lengths to emphasize that the bracketed value represents a count, not an index:
-
-```ts
-const data = {
-  tags: ['reading', 'gaming', 'coding'],
-  items: [
-    { sku: 'A1', qty: 2, price: 9.99 },
-    { sku: 'B2', qty: 1, price: 14.5 },
-  ],
-}
-
-console.log(
-  encode(data, { lengthMarker: '#' })
-)
-// tags[#3]: reading,gaming,coding
-// items[#2]{sku,qty,price}:
-//   A1,2,9.99
-//   B2,1,14.5
-
-// Custom delimiter with length marker
-console.log(
-  encode(data, { lengthMarker: '#', delimiter: '|' })
-)
-// tags[#3|]: reading|gaming|coding
-// items[#2|]{sku|qty|price}:
-//   A1|2|9.99
-//   B2|1|14.5
-```
-
 ### `decode(input: string, options?: DecodeOptions): JsonValue`
 
 Converts a TOON-formatted string back to JavaScript values.
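After this change, the data from the removed example should encode without the `#`. The sketch below reproduces that output with two hypothetical helpers, `formatHeaderSketch` and `encodeFlat`. They are modeled on the post-change `formatHeader(length, { key, fields, delimiter })` calls in `encoders.ts`. They handle only a flat object of primitive arrays and uniform object arrays, and they skip quoting and escaping.

```typescript
type Primitive = string | number | boolean | null
type Row = Record<string, Primitive>

// Hypothetical header formatter mirroring the post-change call shape:
// key[N<delim>]{f1<delim>f2}:  (comma is the default, so it is not written)
function formatHeaderSketch(
  length: number,
  opts: { key?: string, fields?: string[], delimiter: string },
): string {
  const delimiterSuffix = opts.delimiter === ',' ? '' : opts.delimiter
  const fields = opts.fields ? `{${opts.fields.join(opts.delimiter)}}` : ''
  return `${opts.key ?? ''}[${length}${delimiterSuffix}]${fields}:`
}

// Encodes a flat object whose values are primitive arrays or uniform object arrays.
function encodeFlat(data: Record<string, Primitive[] | Row[]>, delimiter = ','): string {
  const lines: string[] = []
  for (const [key, values] of Object.entries(data)) {
    const first = values[0]
    if (typeof first === 'object' && first !== null) {
      // Tabular form: one header with field names, then one indented row per object.
      const rows = values as Row[]
      const fields = Object.keys(first)
      lines.push(formatHeaderSketch(rows.length, { key, fields, delimiter }))
      for (const row of rows)
        lines.push(`  ${fields.map(field => String(row[field])).join(delimiter)}`)
    }
    else {
      // Inline form: only add a space after the colon when there are values.
      const items = values as Primitive[]
      const header = formatHeaderSketch(items.length, { key, delimiter })
      lines.push(items.length === 0 ? header : `${header} ${items.map(String).join(delimiter)}`)
    }
  }
  return lines.join('\n')
}

const data = {
  tags: ['reading', 'gaming', 'coding'],
  items: [
    { sku: 'A1', qty: 2, price: 9.99 },
    { sku: 'B2', qty: 1, price: 14.5 },
  ],
}

console.log(encodeFlat(data))
// tags[3]: reading,gaming,coding
// items[2]{sku,qty,price}:
//   A1,2,9.99
//   B2,1,14.5

console.log(encodeFlat(data, '|'))
// tags[3|]: reading|gaming|coding
// items[2|]{sku|qty|price}:
//   A1|2|9.99
//   B2|1|14.5
```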
@@ -1179,7 +1146,7 @@ By default, the decoder validates input strictly:
 - Format familiarity and structure matter as much as token count. TOON's tabular format requires arrays of objects with identical keys and primitive values only. When this doesn't hold (due to mixed types, non-uniform objects, or nested structures), TOON switches to list format where JSON can be more efficient at scale.
 - **TOON excels at:** Uniform arrays of objects (same fields, primitive values), especially large datasets with consistent structure.
 - **JSON is better for:** Non-uniform data, deeply nested structures, and objects with varying field sets.
-- **CSV is more compact for:** Flat, uniform tables without nesting. TOON adds structure (`[N]` length markers, delimiter scoping, deterministic quoting) that improves LLM reliability with minimal token overhead.
+- **CSV is more compact for:** Flat, uniform tables without nesting. TOON adds structure (`[N]` array lengths, delimiter scoping, deterministic quoting) that improves LLM reliability with minimal token overhead.
 - **Token counts vary by tokenizer and model.** Benchmarks use a GPT-style tokenizer (cl100k/o200k); actual savings will differ with other models (e.g., [SentencePiece](https://github.com/google/sentencepiece)).
 - **TOON is designed for LLM input** where human readability and token efficiency matter. It's **not** a drop-in replacement for JSON in APIs or storage.
 
@@ -1189,7 +1156,7 @@ TOON works best when you show the format instead of describing it. The structure
 
 ### Sending TOON to LLMs (Input)
 
-Wrap your encoded data in a fenced code block (label it \`\`\`toon for clarity). The indentation and headers are usually enough – models treat it like familiar YAML or CSV. The explicit length markers (`[N]`) and field headers (`{field1,field2}`) help the model track structure, especially for large tables.
+Wrap your encoded data in a fenced code block (label it \`\`\`toon for clarity). The indentation and headers are usually enough – models treat it like familiar YAML or CSV. The explicit array lengths (`[N]`) and field headers (`{field1,field2}`) help the model track structure, especially for large tables.
 
 ### Generating TOON from LLMs (Output)
 
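The README paragraph above says that explicit `[N]` array lengths help track structure. The sketch below shows one way a consumer could use them. It compares a tabular header's declared length with the number of rows that follow it, similar in spirit to the decoder's strict validation.

`checkDeclaredLength` is hypothetical and is not the library's decoder. It assumes:

- a single top-level tabular array;
- two-space row indentation;
- no blank lines inside the table.

```typescript
// Hypothetical checker (not the library's decoder). It compares the [N] in a
// top-level tabular header with the number of rows indented beneath it,
// assuming two-space indentation and no blank lines inside the table.
const TABULAR_HEADER = /^[\w.]*\[(\d+)[|\t]?\]\{[^}]*\}:$/

interface LengthCheck {
  declared: number
  actual: number
  ok: boolean
}

function checkDeclaredLength(toon: string): LengthCheck {
  const [headerLine = '', ...rest] = toon.split('\n')
  const match = TABULAR_HEADER.exec(headerLine)
  if (!match)
    throw new Error(`Not a tabular array header: ${headerLine}`)

  const declared = Number(match[1])
  let actual = 0
  // Count consecutive indented lines; the first unindented line ends the table.
  for (const line of rest) {
    if (!line.startsWith('  '))
      break
    actual++
  }
  return { declared, actual, ok: declared === actual }
}

// A model reply that declares 3 rows but only sends 2 is caught:
console.log(checkDeclaredLength('users[3]{id,name}:\n  1,Alice\n  2,Bob'))
// { declared: 3, actual: 2, ok: false }
```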
@@ -1267,7 +1234,7 @@ Task: Return only users with role "user" as TOON. Use the same header. Set [N] t
 ## Other Implementations
 
 > [!NOTE]
-> When implementing TOON in other languages, please follow the [specification](https://github.com/toon-format/spec/blob/main/SPEC.md) (currently v1.5) to ensure compatibility across implementations. The [conformance tests](https://github.com/toon-format/spec/tree/main/tests) provide language-agnostic test fixtures that validate your implementations.
+> When implementing TOON in other languages, please follow the [specification](https://github.com/toon-format/spec/blob/main/SPEC.md) (currently v2.0) to ensure compatibility across implementations. The [conformance tests](https://github.com/toon-format/spec/tree/main/tests) provide language-agnostic test fixtures that validate your implementations.
 
 ### Official Implementations
 

SPEC.md

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ The TOON specification has moved to a dedicated repository: [github.com/toon-for
 
 ## Current Version
 
-**Version 1.4** (2025-11-05)
+**Version 2.0** (2025-11-10)
 
 ## Quick Links
 

packages/cli/README.md

Lines changed: 1 addition & 2 deletions
@@ -62,7 +62,6 @@ cat data.toon | toon --decode
 | `-d, --decode` | Force decode mode (overrides auto-detection) |
 | `--delimiter <char>` | Array delimiter: `,` (comma), `\t` (tab), `\|` (pipe) |
 | `--indent <number>` | Indentation size (default: `2`) |
-| `--length-marker` | Add `#` prefix to array lengths (e.g., `items[#3]`) |
 | `--stats` | Show token count estimates and savings (encode only) |
 | `--no-strict` | Disable strict validation when decoding |
 | `--key-folding <mode>` | Enable key folding: `off`, `safe` (default: `off`) |
@@ -122,7 +121,7 @@ cat large-dataset.json | toon --delimiter "\t" > output.toon
 jq '.results' data.json | toon > filtered.toon
 ```
 
-### Key Folding (spec v1.5)
+### Key Folding (Since v1.5)
 
 Collapse nested wrapper chains to reduce tokens:
 

packages/cli/src/conversion.ts

Lines changed: 0 additions & 2 deletions
@@ -13,7 +13,6 @@ export async function encodeToToon(config: {
   output?: string
   indent: NonNullable<EncodeOptions['indent']>
   delimiter: NonNullable<EncodeOptions['delimiter']>
-  lengthMarker: NonNullable<EncodeOptions['lengthMarker']>
   keyFolding?: NonNullable<EncodeOptions['keyFolding']>
   flattenDepth?: number
   printStats: boolean
@@ -31,7 +30,6 @@ export async function encodeToToon(config: {
   const encodeOptions: EncodeOptions = {
     delimiter: config.delimiter,
     indent: config.indent,
-    lengthMarker: config.lengthMarker,
     keyFolding: config.keyFolding,
     flattenDepth: config.flattenDepth,
   }
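With `lengthMarker` gone, the `encodeOptions` object above forwards only `delimiter`, `indent`, `keyFolding`, and `flattenDepth`. The sketch below shows how these might be resolved against the README's documented defaults: `indent` 2, `delimiter` `','`, `keyFolding` `'off'`, and `flattenDepth` `Infinity`. `EncodeOptionsSketch` and `resolveOptions` are hypothetical stand-ins for the library's `EncodeOptions` and `ResolvedEncodeOptions`, whose exact definitions may differ.

```typescript
// Hypothetical mirror of the encode options that remain after this change.
type Delimiter = ',' | '\t' | '|'

interface EncodeOptionsSketch {
  indent?: number
  delimiter?: Delimiter
  keyFolding?: 'off' | 'safe'
  flattenDepth?: number
}

type ResolvedOptionsSketch = Required<EncodeOptionsSketch>

// Fill in the README defaults; there is no length-marker setting to resolve.
function resolveOptions(options: EncodeOptionsSketch = {}): ResolvedOptionsSketch {
  return {
    indent: options.indent ?? 2,
    delimiter: options.delimiter ?? ',',
    keyFolding: options.keyFolding ?? 'off',
    flattenDepth: options.flattenDepth ?? Number.POSITIVE_INFINITY,
  }
}

// Resolves to indent 2, delimiter '|', keyFolding 'off', flattenDepth Infinity.
console.log(resolveOptions({ delimiter: '|' }))
```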

packages/cli/src/index.ts

Lines changed: 1 addition & 12 deletions
@@ -41,11 +41,6 @@ export const mainCommand: CommandDef<{
     description: string
     default: string
   }
-  lengthMarker: {
-    type: 'boolean'
-    description: string
-    default: false
-  }
   strict: {
     type: 'boolean'
     description: string
@@ -107,11 +102,6 @@ export const mainCommand: CommandDef<{
       description: 'Indentation size',
       default: '2',
     },
-    lengthMarker: {
-      type: 'boolean',
-      description: 'Use length marker (#) for arrays',
-      default: false,
-    },
     strict: {
       type: 'boolean',
       description: 'Enable strict mode for decoding',
@@ -187,10 +177,9 @@ export const mainCommand: CommandDef<{
         output: outputPath,
         delimiter: delimiter as Delimiter,
         indent,
-        lengthMarker: args.lengthMarker === true ? '#' : false,
-        printStats: args.stats === true,
         keyFolding: keyFolding as NonNullable<EncodeOptions['keyFolding']>,
         flattenDepth,
+        printStats: args.stats === true,
       })
     }
     else {

packages/cli/test/index.test.ts

Lines changed: 0 additions & 1 deletion
@@ -44,7 +44,6 @@ describe('toon CLI', () => {
     const expected = encode(data, {
       delimiter: DEFAULT_DELIMITER,
       indent: 2,
-      lengthMarker: false,
     })
 
     expect(output).toBe(expected)

packages/toon/package.json

Lines changed: 1 addition & 1 deletion
@@ -38,6 +38,6 @@
     "test": "vitest"
   },
   "devDependencies": {
-    "@toon-format/spec": "^1.5.2"
+    "@toon-format/spec": "^2.0.0"
   }
 }

packages/toon/src/constants.ts

Lines changed: 0 additions & 1 deletion
@@ -11,7 +11,6 @@ export const COMMA = ','
 export const COLON = ':'
 export const SPACE = ' '
 export const PIPE = '|'
-export const HASH = '#'
 export const DOT = '.'
 
 // #endregion

packages/toon/src/decode/parser.ts

Lines changed: 4 additions & 12 deletions
@@ -1,5 +1,5 @@
 import type { ArrayHeaderInfo, Delimiter, JsonPrimitive } from '../types'
-import { BACKSLASH, CLOSE_BRACE, CLOSE_BRACKET, COLON, DELIMITERS, DOUBLE_QUOTE, FALSE_LITERAL, HASH, NULL_LITERAL, OPEN_BRACE, OPEN_BRACKET, PIPE, TAB, TRUE_LITERAL } from '../constants'
+import { BACKSLASH, CLOSE_BRACE, CLOSE_BRACKET, COLON, DELIMITERS, DOUBLE_QUOTE, FALSE_LITERAL, NULL_LITERAL, OPEN_BRACE, OPEN_BRACKET, PIPE, TAB, TRUE_LITERAL } from '../constants'
 import { isBooleanOrNullLiteral, isNumericLiteral } from '../shared/literal-utils'
 import { findClosingQuote, findUnquotedChar, unescapeString } from '../shared/string-utils'
 
@@ -84,7 +84,7 @@ export function parseArrayHeaderLine(
     return
   }
 
-  const { length, delimiter, hasLengthMarker } = parsedBracket
+  const { length, delimiter } = parsedBracket
 
   // Check for fields segment
   let fields: string[] | undefined
@@ -102,7 +102,6 @@ export function parseArrayHeaderLine(
       length,
       delimiter,
       fields,
-      hasLengthMarker,
     },
     inlineValues: afterColon || undefined,
   }
@@ -111,16 +110,9 @@ export function parseArrayHeaderLine(
 export function parseBracketSegment(
   seg: string,
   defaultDelimiter: Delimiter,
-): { length: number, delimiter: Delimiter, hasLengthMarker: boolean } {
-  let hasLengthMarker = false
+): { length: number, delimiter: Delimiter } {
   let content = seg
 
-  // Check for length marker
-  if (content.startsWith(HASH)) {
-    hasLengthMarker = true
-    content = content.slice(1)
-  }
-
   // Check for delimiter suffix
   let delimiter = defaultDelimiter
   if (content.endsWith(TAB)) {
@@ -137,7 +129,7 @@ export function parseBracketSegment(
     throw new TypeError(`Invalid array length: ${seg}`)
   }
 
-  return { length, delimiter, hasLengthMarker }
+  return { length, delimiter }
 }
 
 // #endregion
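After this change, the bracket segment of a header carries only the count and an optional delimiter suffix. The sketch below models that rule on the `parseBracketSegment(seg, defaultDelimiter)` signature shown above. `parseBracketSketch` is hypothetical. Its digits-only length check is an assumption and may be stricter or looser than the library's own validation.

```typescript
type Delimiter = ',' | '\t' | '|'

// Hypothetical sketch of post-change bracket parsing: "<digits>" optionally
// followed by a tab or pipe that overrides the default delimiter.
function parseBracketSketch(
  seg: string,
  defaultDelimiter: Delimiter,
): { length: number, delimiter: Delimiter } {
  let content = seg
  let delimiter = defaultDelimiter

  // Check for a delimiter suffix.
  if (content.endsWith('\t')) {
    delimiter = '\t'
    content = content.slice(0, -1)
  }
  else if (content.endsWith('|')) {
    delimiter = '|'
    content = content.slice(0, -1)
  }

  // There is no "#" prefix handling any more, so "#3" fails this check.
  if (!/^\d+$/.test(content))
    throw new TypeError(`Invalid array length: ${seg}`)

  return { length: Number(content), delimiter }
}

console.log(parseBracketSketch('2|', ','))
// { length: 2, delimiter: '|' }
```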

packages/toon/src/encode/encoders.ts

Lines changed: 15 additions & 15 deletions
@@ -113,15 +113,15 @@ export function encodeArray(
   options: ResolvedEncodeOptions,
 ): void {
   if (value.length === 0) {
-    const header = formatHeader(0, { key, delimiter: options.delimiter, lengthMarker: options.lengthMarker })
+    const header = formatHeader(0, { key, delimiter: options.delimiter })
     writer.push(depth, header)
     return
   }
 
   // Primitive array
   if (isArrayOfPrimitives(value)) {
-    const formatted = encodeInlineArrayLine(value, options.delimiter, key, options.lengthMarker)
-    writer.push(depth, formatted)
+    const arrayLine = encodeInlineArrayLine(value, options.delimiter, key)
+    writer.push(depth, arrayLine)
     return
   }
 
@@ -161,19 +161,19 @@ export function encodeArrayOfArraysAsListItems(
   depth: Depth,
   options: ResolvedEncodeOptions,
 ): void {
-  const header = formatHeader(values.length, { key: prefix, delimiter: options.delimiter, lengthMarker: options.lengthMarker })
+  const header = formatHeader(values.length, { key: prefix, delimiter: options.delimiter })
   writer.push(depth, header)
 
   for (const arr of values) {
     if (isArrayOfPrimitives(arr)) {
-      const inline = encodeInlineArrayLine(arr, options.delimiter, undefined, options.lengthMarker)
-      writer.pushListItem(depth + 1, inline)
+      const arrayLine = encodeInlineArrayLine(arr, options.delimiter)
+      writer.pushListItem(depth + 1, arrayLine)
     }
   }
 }
 
-export function encodeInlineArrayLine(values: readonly JsonPrimitive[], delimiter: string, prefix?: string, lengthMarker?: '#' | false): string {
-  const header = formatHeader(values.length, { key: prefix, delimiter, lengthMarker })
+export function encodeInlineArrayLine(values: readonly JsonPrimitive[], delimiter: string, prefix?: string): string {
+  const header = formatHeader(values.length, { key: prefix, delimiter })
   const joinedValue = encodeAndJoinPrimitives(values, delimiter)
   // Only add space if there are values
   if (values.length === 0) {
@@ -194,7 +194,7 @@ export function encodeArrayOfObjectsAsTabular(
   depth: Depth,
   options: ResolvedEncodeOptions,
 ): void {
-  const formattedHeader = formatHeader(rows.length, { key: prefix, fields: header, delimiter: options.delimiter, lengthMarker: options.lengthMarker })
+  const formattedHeader = formatHeader(rows.length, { key: prefix, fields: header, delimiter: options.delimiter })
   writer.push(depth, `${formattedHeader}`)
 
   writeTabularRows(rows, header, writer, depth + 1, options)
@@ -265,7 +265,7 @@ export function encodeMixedArrayAsListItems(
   depth: Depth,
   options: ResolvedEncodeOptions,
 ): void {
-  const header = formatHeader(items.length, { key: prefix, delimiter: options.delimiter, lengthMarker: options.lengthMarker })
+  const header = formatHeader(items.length, { key: prefix, delimiter: options.delimiter })
   writer.push(depth, header)
 
   for (const item of items) {
@@ -289,15 +289,15 @@ export function encodeObjectAsListItem(obj: JsonObject, writer: LineWriter, dept
   else if (isJsonArray(firstValue)) {
     if (isArrayOfPrimitives(firstValue)) {
       // Inline format for primitive arrays
-      const formatted = encodeInlineArrayLine(firstValue, options.delimiter, firstKey, options.lengthMarker)
-      writer.pushListItem(depth, formatted)
+      const arrayPropertyLine = encodeInlineArrayLine(firstValue, options.delimiter, firstKey)
+      writer.pushListItem(depth, arrayPropertyLine)
     }
     else if (isArrayOfObjects(firstValue)) {
       // Check if array of objects can use tabular format
       const header = extractTabularHeader(firstValue)
       if (header) {
         // Tabular format for uniform arrays of objects
-        const formattedHeader = formatHeader(firstValue.length, { key: firstKey, fields: header, delimiter: options.delimiter, lengthMarker: options.lengthMarker })
+        const formattedHeader = formatHeader(firstValue.length, { key: firstKey, fields: header, delimiter: options.delimiter })
         writer.pushListItem(depth, formattedHeader)
         writeTabularRows(firstValue, header, writer, depth + 1, options)
       }
@@ -347,8 +347,8 @@ function encodeListItemValue(
     writer.pushListItem(depth, encodePrimitive(value, options.delimiter))
   }
   else if (isJsonArray(value) && isArrayOfPrimitives(value)) {
-    const inline = encodeInlineArrayLine(value, options.delimiter, undefined, options.lengthMarker)
-    writer.pushListItem(depth, inline)
+    const arrayLine = encodeInlineArrayLine(value, options.delimiter)
+    writer.pushListItem(depth, arrayLine)
   }
   else if (isJsonObject(value)) {
     encodeObjectAsListItem(value, writer, depth, options)
