Commit 1c003c6

feat(cli): memory-efficient streaming for encoding
1 parent be8bcfe commit 1c003c6

File tree

7 files changed: +308 −50 lines

README.md

Lines changed: 13 additions & 0 deletions

````diff
@@ -764,6 +764,19 @@ console.log(encode(data))
 // 2,Bob,user
 ```
 
+**Streaming large datasets:**
+
+```ts
+import { encodeLines } from '@toon-format/toon'
+
+const largeData = await fetchThousandsOfRecords()
+
+// Memory-efficient streaming for large data
+for (const line of encodeLines(largeData)) {
+  process.stdout.write(`${line}\n`)
+}
+```
+
 ## Playgrounds
 
 Experiment with TOON format interactively using these community-built tools for token comparison, format conversion, and validation:
````

docs/cli/index.md

Lines changed: 18 additions & 1 deletion

````diff
@@ -1,6 +1,6 @@
 # Command Line Interface
 
-The `@toon-format/cli` package provides a command-line interface for encoding JSON to TOON and decoding TOON back to JSON. Use it for quick conversions without writing code, estimating token savings before sending data to LLMs, or integrating TOON into shell pipelines with tools like curl and jq. It supports stdin/stdout workflows, multiple delimiter options, token statistics, and all encoding/decoding features available in the library.
+The `@toon-format/cli` package provides a command-line interface for encoding JSON to TOON and decoding TOON back to JSON. Use it to analyze token savings before integrating TOON into your application, or to process JSON data through TOON in shell pipelines using stdin/stdout with tools like curl and jq. The CLI supports token statistics, streaming for large datasets, and all encoding options available in the library.
 
 The CLI is built on top of the `@toon-format/toon` TypeScript implementation and adheres to the [latest specification](/reference/spec).
 
@@ -108,6 +108,14 @@ cat data.toon | toon --decode
 
 JSON→TOON conversions use line-by-line encoding internally, which avoids holding the entire TOON document in memory. This makes the CLI efficient for large datasets without requiring additional configuration.
 
+```bash
+# Encode large JSON file with minimal memory usage
+toon huge-dataset.json -o output.toon
+
+# Process millions of records efficiently via stdin
+cat million-records.json | toon > output.toon
+```
+
 ::: info Token Statistics
 When using the `--stats` flag, the CLI builds the full TOON string once to compute accurate token counts. For maximum memory efficiency on very large files, omit `--stats`.
 :::
@@ -139,6 +147,15 @@ toon data.json --stats -o output.toon
 
 This helps you estimate token cost savings before sending data to LLMs.
 
+Example output:
+
+```
+✔ Encoded data.json → output.toon
+
+ℹ Token estimates: ~15,145 (JSON) → ~8,745 (TOON)
+✔ Saved ~6,400 tokens (-42.3%)
+```
+
 ### Alternative Delimiters
 
 TOON supports three delimiters: comma (default), tab, and pipe. Alternative delimiters can provide additional token savings in specific contexts.
````
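The savings figures in the example output are simple arithmetic over the two token estimates. A minimal sketch reproducing them (the counts are the CLI's rough estimates, not exact tokenizer output):

```typescript
// Recompute the savings line from the example output above.
const jsonTokens = 15145
const toonTokens = 8745

const diff = jsonTokens - toonTokens
const percent = ((diff / jsonTokens) * 100).toFixed(1)

console.log(`Saved ~${diff} tokens (-${percent}%)`) // prints: Saved ~6400 tokens (-42.3%)
```

(The CLI formats the counts with thousands separators; this sketch skips that.)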

docs/guide/llm-prompts.md

Lines changed: 23 additions & 0 deletions

````diff
@@ -95,6 +95,29 @@ const toon = encode(data, { delimiter: '\t' })
 
 Tell the model "fields are tab-separated" when using tabs. For more on delimiters, see the [Format Overview](/guide/format-overview#delimiter-options).
 
+## Streaming Large Outputs
+
+When working with large datasets (thousands of records or deeply nested structures), use `encodeLines()` to stream TOON output line-by-line instead of building the full string in memory.
+
+```ts
+import { encodeLines } from '@toon-format/toon'
+
+const largeData = await fetchThousandsOfRecords()
+
+// Stream large dataset without loading full string in memory
+for (const line of encodeLines(largeData, { delimiter: '\t' })) {
+  process.stdout.write(`${line}\n`)
+}
+```
+
+The CLI also supports streaming for memory-efficient JSON-to-TOON conversion:
+
+```bash
+toon large-dataset.json --output output.toon
+```
+
+This streaming approach prevents out-of-memory errors when preparing large context windows for LLMs. For complete details on `encodeLines()`, see the [API reference](/reference/api#encodelines).
+
 ## Tips and Pitfalls
 
 **Show, don't describe.** Don't explain TOON syntax in detail – just show an example. Models learn the pattern from context. A simple code block with 2-5 rows is more effective than paragraphs of explanation.
````
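The "fields are tab-separated" advice above can be sketched as a tiny prompt builder. Both `buildPrompt` and its wording are hypothetical, and the TOON sample is inlined as a literal rather than produced by `encode`:

```typescript
// Hypothetical helper: wrap tab-delimited TOON in a prompt that tells
// the model how fields are separated.
function buildPrompt(toonBlock: string): string {
  return [
    'Fields are tab-separated. Data in TOON format:',
    '```toon',
    toonBlock,
    '```',
  ].join('\n')
}

const sample = 'users[2]{id,name}:\n  1\tAlice\n  2\tBob'
console.log(buildPrompt(sample))
```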

docs/reference/api.md

Lines changed: 4 additions & 4 deletions

````diff
@@ -129,14 +129,14 @@ encode(data, { delimiter: '\t', keyFolding: 'safe' })
 
 ## `encodeLines(value, options?)`
 
-Converts any JSON-serializable value to TOON format as a sequence of lines, without building the full string in memory. Suitable for streaming large outputs to files, HTTP responses, or process stdout.
+**Preferred method for streaming TOON output.** Converts any JSON-serializable value to TOON format as a sequence of lines, without building the full string in memory. Suitable for streaming large outputs to files, HTTP responses, or process stdout.
 
 ```ts
 import { encodeLines } from '@toon-format/toon'
 
-// Stream to stdout
+// Stream to stdout (Node.js)
 for (const line of encodeLines(data)) {
-  console.log(line)
+  process.stdout.write(`${line}\n`)
 }
 
 // Write to file line-by-line
@@ -158,7 +158,7 @@ const lineArray = Array.from(encodeLines(data))
 
 ### Return Value
 
-Returns an `Iterable<string>` that yields TOON lines one at a time. Each yielded string is a single line without a trailing newline character.
+Returns an `Iterable<string>` that yields TOON lines one at a time. **Each yielded string is a single line without a trailing newline character** — you must add `\n` when writing to streams or stdout.
 
 ::: info Relationship to `encode()`
 `encode(value, options)` is equivalent to:
````
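The `// Write to file line-by-line` example is cut off by the hunk boundary above. A self-contained sketch of that pattern using Node's `fs` streams; the generator is a stand-in for the real `encodeLines`, and `\n` is appended explicitly because yielded lines carry no trailing newline:

```typescript
import { createWriteStream } from 'node:fs'
import { once } from 'node:events'

// Stand-in for encodeLines(): yields a few TOON-style lines.
function* fakeEncodeLines(): Generator<string> {
  yield 'users[2]{id,name}:'
  yield '  1,Alice'
  yield '  2,Bob'
}

// Write lines one at a time, appending the newline ourselves.
async function writeToonFile(path: string): Promise<void> {
  const out = createWriteStream(path)
  for (const line of fakeEncodeLines())
    out.write(`${line}\n`)
  out.end()
  await once(out, 'finish') // wait until the file is fully flushed
}
```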

packages/cli/README.md

Lines changed: 23 additions & 6 deletions

````diff
@@ -1,8 +1,8 @@
 # @toon-format/cli
 
-Command-line tool for converting between JSON and TOON formats.
+Command-line tool for converting JSON to TOON and back, with token analysis and streaming support.
 
-[TOON (Token-Oriented Object Notation)](https://toonformat.dev) is a compact, human-readable serialization format designed for passing structured data to Large Language Models with significantly reduced token usage.
+[TOON (Token-Oriented Object Notation)](https://toonformat.dev) is a compact, human-readable encoding of the JSON data model that minimizes tokens for LLM input. The CLI lets you test conversions, analyze token savings, and integrate TOON into shell pipelines with stdin/stdout support—no code required.
 
 ## Installation
 
@@ -79,11 +79,12 @@ toon data.json --stats -o output.toon
 ```
 
 Example output:
+
 ```
-Encoded to TOON
-Input: 15,145 tokens (JSON)
-Output: 8,745 tokens (TOON)
-Saved: 6,400 tokens (42.3% reduction)
+✔ Encoded data.json → output.toon
+
+ℹ Token estimates: ~15,145 (JSON) → ~8,745 (TOON)
+✔ Saved ~6,400 tokens (-42.3%)
 ```
 
 ### Alternative Delimiters
@@ -115,6 +116,21 @@ cat large-dataset.json | toon --delimiter "\t" > output.toon
 jq '.results' data.json | toon > filtered.toon
 ```
 
+### Large Dataset Processing
+
+The CLI streams output line-by-line without building the full string in memory, making it suitable for processing large datasets:
+
+```bash
+# Encode large JSON file with minimal memory usage
+toon huge-dataset.json -o output.toon
+
+# Process millions of records efficiently
+cat million-records.json | toon > output.toon
+```
+
+> [!NOTE]
+> When using `--stats`, the full output string is kept in memory for token counting. Omit `--stats` for maximum memory efficiency with very large datasets.
+
 ### Key Folding (Since v1.5)
 
 Collapse nested wrapper chains to reduce tokens:
@@ -190,6 +206,7 @@ toon data.json --key-folding safe --delimiter "\t" --stats -o output.toon
 - **Pipeline integration** with existing JSON-based workflows
 - **Flexible formatting** with delimiter and indentation options
 - **Key folding** to collapse nested wrappers for additional token savings
+- **Memory-efficient streaming** for processing large datasets without loading everything into memory
 
 ## Related
 
````

packages/cli/src/conversion.ts

Lines changed: 71 additions & 19 deletions

````diff
@@ -1,3 +1,4 @@
+import type { FileHandle } from 'node:fs/promises'
 import type { DecodeOptions, EncodeOptions } from '../../toon/src'
 import type { InputSource } from './types'
 import * as fsp from 'node:fs/promises'
@@ -34,38 +35,42 @@ export async function encodeToToon(config: {
     flattenDepth: config.flattenDepth,
   }
 
-  let toonOutput: string
-
   // When printing stats, we need the full string for token counting
   if (config.printStats) {
-    toonOutput = encode(data, encodeOptions)
-  }
-  else {
-    // Use streaming encoder for non-stats path
-    const lines = Array.from(encodeLines(data, encodeOptions))
-    toonOutput = lines.join('\n')
-  }
+    const toonOutput = encode(data, encodeOptions)
 
-  if (config.output) {
-    await fsp.writeFile(config.output, toonOutput, 'utf-8')
-    const relativeInputPath = formatInputLabel(config.input)
-    const relativeOutputPath = path.relative(process.cwd(), config.output)
-    consola.success(`Encoded \`${relativeInputPath}\` → \`${relativeOutputPath}\``)
-  }
-  else {
-    console.log(toonOutput)
-  }
+    if (config.output) {
+      await fsp.writeFile(config.output, toonOutput, 'utf-8')
+    }
+    else {
+      console.log(toonOutput)
+    }
 
-  if (config.printStats) {
     const jsonTokens = estimateTokenCount(jsonContent)
     const toonTokens = estimateTokenCount(toonOutput)
     const diff = jsonTokens - toonTokens
     const percent = ((diff / jsonTokens) * 100).toFixed(1)
 
+    if (config.output) {
+      const relativeInputPath = formatInputLabel(config.input)
+      const relativeOutputPath = path.relative(process.cwd(), config.output)
+      consola.success(`Encoded \`${relativeInputPath}\` → \`${relativeOutputPath}\``)
+    }
+
     console.log()
     consola.info(`Token estimates: ~${jsonTokens} (JSON) → ~${toonTokens} (TOON)`)
     consola.success(`Saved ~${diff} tokens (-${percent}%)`)
   }
+  else {
+    // Use streaming encoder for memory-efficient output
+    await writeStreamingToon(encodeLines(data, encodeOptions), config.output)
+
+    if (config.output) {
+      const relativeInputPath = formatInputLabel(config.input)
+      const relativeOutputPath = path.relative(process.cwd(), config.output)
+      consola.success(`Encoded \`${relativeInputPath}\` → \`${relativeOutputPath}\``)
+    }
+  }
 }
 
 export async function decodeToJson(config: {
@@ -102,3 +107,50 @@ export async function decodeToJson(config: {
     console.log(jsonOutput)
   }
 }
+
+/**
+ * Writes TOON lines to a file or stdout using streaming approach.
+ * Lines are written one at a time without building the full string in memory.
+ *
+ * @param lines - Iterable of TOON lines (without trailing newlines)
+ * @param outputPath - File path to write to, or undefined for stdout
+ */
+async function writeStreamingToon(
+  lines: Iterable<string>,
+  outputPath?: string,
+): Promise<void> {
+  let isFirst = true
+
+  // Stream to file using fs/promises API
+  if (outputPath) {
+    let fileHandle: FileHandle | undefined
+
+    try {
+      fileHandle = await fsp.open(outputPath, 'w')
+
+      for (const line of lines) {
+        if (!isFirst)
+          await fileHandle.write('\n')
+
+        await fileHandle.write(line)
+        isFirst = false
+      }
+    }
+    finally {
+      await fileHandle?.close()
+    }
+  }
+  // Stream to stdout
+  else {
+    for (const line of lines) {
+      if (!isFirst)
+        process.stdout.write('\n')
+
+      process.stdout.write(line)
+      isFirst = false
+    }
+
+    // Add final newline for stdout
+    process.stdout.write('\n')
+  }
+}
````
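`writeStreamingToon` writes `\n` before every line except the first, so the concatenated output matches `lines.join('\n')` without ever materializing the joined string. A minimal sketch of just that separator pattern, with no I/O:

```typescript
// Yield a '\n' separator before each line except the first; concatenating
// the chunks then equals lines.join('\n').
function* interleaveNewlines(lines: Iterable<string>): Generator<string> {
  let isFirst = true
  for (const line of lines) {
    if (!isFirst)
      yield '\n'
    yield line
    isFirst = false
  }
}

const joined = Array.from(interleaveNewlines(['a[2]:', '  1', '  2'])).join('')
console.log(JSON.stringify(joined)) // prints: "a[2]:\n  1\n  2"
```

Note the trailing newline: the file branch ends without one, while the stdout branch appends a final `\n` so shell pipelines see a complete last line.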
