Commit 9ebad53

feat(cli): stream output for both encoding and decoding

1 parent cfbbb09 commit 9ebad53

File tree: 6 files changed, +486 −24 lines changed

docs/cli/index.md

Lines changed: 17 additions & 3 deletions

@@ -104,20 +104,34 @@ cat data.toon | toon --decode
 
 ## Performance
 
-### Streaming Encoding
+### Streaming Output
 
-JSON→TOON conversions use line-by-line encoding internally, which avoids holding the entire TOON document in memory. This makes the CLI efficient for large datasets without requiring additional configuration.
+Both encoding and decoding operations use streaming output, writing incrementally without building the full output string in memory. This makes the CLI efficient for large datasets without requiring additional configuration.
+
+**JSON → TOON (Encode)**
+- Streams TOON lines to output
+- No full TOON string in memory
+
+**TOON → JSON (Decode)**
+- Streams JSON tokens to output
+- No full JSON string in memory
 
 ```bash
 # Encode large JSON file with minimal memory usage
 toon huge-dataset.json -o output.toon
 
+# Decode large TOON file with minimal memory usage
+toon huge-dataset.toon -o output.json
+
 # Process millions of records efficiently via stdin
 cat million-records.json | toon > output.toon
+cat million-records.toon | toon --decode > output.json
 ```
 
+Peak memory usage scales with data depth, not total size. This allows processing arbitrarily large files as long as individual nested structures fit in memory.
+
 ::: info Token Statistics
-When using the `--stats` flag, the CLI builds the full TOON string once to compute accurate token counts. For maximum memory efficiency on very large files, omit `--stats`.
+When using the `--stats` flag with encode, the CLI builds the full TOON string once to compute accurate token counts. For maximum memory efficiency on very large files, omit `--stats`.
 :::
 
 ## Options
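The "scales with data depth, not total size" claim in the docs above can be illustrated with a small sketch: a recursive generator keeps one stack frame live per nesting level and yields chunks as it walks, so an arbitrarily long array never accumulates in memory. This is a hypothetical stand-in for illustration, not the CLI's actual encoder.

```typescript
// Sketch: a recursive generator yields output chunks one at a time, so peak
// memory is proportional to nesting depth, not to the number of records.
// Hypothetical stand-in for the CLI's internal encoder, for illustration only.
function* walk(value: unknown, depth = 0): Generator<string> {
  const pad = '  '.repeat(depth)
  if (Array.isArray(value)) {
    for (const item of value)
      yield* walk(item, depth + 1)
  }
  else if (value !== null && typeof value === 'object') {
    for (const [key, child] of Object.entries(value as Record<string, unknown>)) {
      yield `${pad}${key}:`
      yield* walk(child, depth + 1)
    }
  }
  else {
    yield `${pad}${JSON.stringify(value)}`
  }
}

// A huge array is consumed one chunk at a time; no full output string exists.
const chunks = walk({ users: [{ id: 1 }, { id: 2 }] })
const first = chunks.next().value // 'users:'
```

Only the generator's call stack (one frame per nesting level) is live at any point, which is exactly why deeply nested structures, not file size, bound memory use.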

packages/cli/README.md

Lines changed: 13 additions & 4 deletions

@@ -118,18 +118,27 @@ jq '.results' data.json | toon > filtered.toon
 
 ### Large Dataset Processing
 
-The CLI streams output line-by-line without building the full string in memory, making it suitable for processing large datasets:
+The CLI uses streaming output for both encoding and decoding, writing incrementally without building the full output string in memory:
 
 ```bash
 # Encode large JSON file with minimal memory usage
 toon huge-dataset.json -o output.toon
 
-# Process millions of records efficiently
+# Decode large TOON file with streaming JSON output
+toon huge-dataset.toon -o output.json
+
+# Process millions of records efficiently via stdin
 cat million-records.json | toon > output.toon
+cat million-records.toon | toon --decode > output.json
 ```
 
+**Memory efficiency:**
+- **Encode (JSON → TOON)**: Streams TOON lines to output without full string in memory
+- **Decode (TOON → JSON)**: Streams JSON tokens to output without full string in memory
+- Peak memory usage scales with data depth, not total size
+
 > [!NOTE]
-> When using `--stats`, the full output string is kept in memory for token counting. Omit `--stats` for maximum memory efficiency with very large datasets.
+> When using `--stats` with encode, the full output string is kept in memory for token counting. Omit `--stats` for maximum memory efficiency with very large datasets.
 
 ### Key Folding (Since v1.5)
 
@@ -206,7 +215,7 @@ toon data.json --key-folding safe --delimiter "\t" --stats -o output.toon
 - **Pipeline integration** with existing JSON-based workflows
 - **Flexible formatting** with delimiter and indentation options
 - **Key folding** to collapse nested wrappers for additional token savings
-- **Memory-efficient streaming** for processing large datasets without loading everything into memory
+- **Memory-efficient streaming** for both encode and decode operations - process large datasets without loading entire outputs into memory
 
 ## Related
 
packages/cli/src/conversion.ts

Lines changed: 34 additions & 7 deletions

@@ -7,6 +7,7 @@ import process from 'node:process'
 import { consola } from 'consola'
 import { estimateTokenCount } from 'tokenx'
 import { decode, encode, encodeLines } from '../../toon/src'
+import { jsonStringifyLines } from './json-stringify-stream'
 import { formatInputLabel, readInput } from './utils'
 
 export async function encodeToToon(config: {
@@ -62,7 +63,6 @@ export async function encodeToToon(config: {
     consola.success(`Saved ~${diff} tokens (-${percent}%)`)
   }
   else {
-    // Use streaming encoder for memory-efficient output
     await writeStreamingToon(encodeLines(data, encodeOptions), config.output)
 
     if (config.output) {
@@ -95,25 +95,52 @@ export async function decodeToJson(config: {
     throw new Error(`Failed to decode TOON: ${error instanceof Error ? error.message : String(error)}`)
   }
 
-  const jsonOutput = JSON.stringify(data, undefined, config.indent)
+  await writeStreamingJson(jsonStringifyLines(data, config.indent), config.output)
 
   if (config.output) {
-    await fsp.writeFile(config.output, jsonOutput, 'utf-8')
     const relativeInputPath = formatInputLabel(config.input)
     const relativeOutputPath = path.relative(process.cwd(), config.output)
     consola.success(`Decoded \`${relativeInputPath}\` → \`${relativeOutputPath}\``)
   }
+}
+
+/**
+ * Writes JSON chunks to a file or stdout using streaming approach.
+ * Chunks are written one at a time without building the full string in memory.
+ */
+async function writeStreamingJson(
+  chunks: Iterable<string>,
+  outputPath?: string,
+): Promise<void> {
+  // Stream to file using fs/promises API
+  if (outputPath) {
+    let fileHandle: FileHandle | undefined
+
+    try {
+      fileHandle = await fsp.open(outputPath, 'w')
+
+      for (const chunk of chunks) {
+        await fileHandle.write(chunk)
+      }
+    }
+    finally {
+      await fileHandle?.close()
+    }
+  }
+  // Stream to stdout
   else {
-    console.log(jsonOutput)
+    for (const chunk of chunks) {
+      process.stdout.write(chunk)
+    }
+
+    // Add final newline for stdout
+    process.stdout.write('\n')
   }
 }
 
 /**
  * Writes TOON lines to a file or stdout using streaming approach.
  * Lines are written one at a time without building the full string in memory.
- *
- * @param lines - Iterable of TOON lines (without trailing newlines)
- * @param outputPath - File path to write to, or undefined for stdout
  */
 async function writeStreamingToon(
   lines: Iterable<string>,
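The file branch of `writeStreamingJson` follows a standard open-write-close pattern: open a handle once, write each chunk as it arrives, close in `finally`. A self-contained sketch of the same pattern using only `node:fs/promises` (function and file names here are illustrative, not the commit's code):

```typescript
import fsp from 'node:fs/promises'
import os from 'node:os'
import path from 'node:path'

// Sketch of the pattern writeStreamingJson uses for its file branch:
// open a handle once, write each chunk as it arrives, close in `finally`.
async function writeChunksToFile(chunks: Iterable<string>, outputPath: string): Promise<void> {
  const fileHandle = await fsp.open(outputPath, 'w')
  try {
    for (const chunk of chunks)
      await fileHandle.write(chunk)
  }
  finally {
    await fileHandle.close()
  }
}

// Usage: chunks land on disk incrementally; nothing accumulates in memory.
const tmp = path.join(os.tmpdir(), 'stream-demo.json')
await writeChunksToFile(['{"ok"', ':', 'true}'], tmp)
console.log(await fsp.readFile(tmp, 'utf-8')) // {"ok":true}
```

Closing the handle in `finally` matters: if a write rejects mid-stream, the descriptor is still released rather than leaked.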
packages/cli/src/json-stringify-stream.ts

Lines changed: 161 additions & 0 deletions

@@ -0,0 +1,161 @@
+/**
+ * Streaming JSON stringifier.
+ *
+ * Yields JSON tokens one at a time, allowing streaming output without holding
+ * the entire JSON string in memory.
+ *
+ * @param value - The value to stringify (must be JSON-serializable)
+ * @param indent - Number of spaces for indentation (0 = compact, >0 = pretty)
+ * @returns Generator that yields JSON string chunks
+ *
+ * @example
+ * ```ts
+ * const data = { name: "Alice", scores: [95, 87, 92] }
+ * for (const chunk of jsonStringifyLines(data, 2)) {
+ *   process.stdout.write(chunk)
+ * }
+ * ```
+ */
+export function* jsonStringifyLines(
+  value: unknown,
+  indent: number = 2,
+): Iterable<string> {
+  yield* stringifyValue(value, 0, indent)
+}
+
+/**
+ * Internal generator for recursive stringification.
+ */
+function* stringifyValue(
+  value: unknown,
+  depth: number,
+  indent: number,
+): Iterable<string> {
+  // Handle null
+  if (value === null) {
+    yield 'null'
+    return
+  }
+
+  const type = typeof value
+
+  // Handle primitives
+  if (type === 'boolean' || type === 'number') {
+    yield JSON.stringify(value)
+    return
+  }
+
+  if (type === 'string') {
+    yield JSON.stringify(value)
+    return
+  }
+
+  // Handle arrays
+  if (Array.isArray(value)) {
+    yield* stringifyArray(value, depth, indent)
+    return
+  }
+
+  // Handle objects
+  if (type === 'object') {
+    yield* stringifyObject(value as Record<string, unknown>, depth, indent)
+    return
+  }
+
+  // Undefined, functions, symbols become null in JSON
+  yield 'null'
+}
+
+/**
+ * Stringify an array with proper formatting.
+ */
+function* stringifyArray(
+  arr: unknown[],
+  depth: number,
+  indent: number,
+): Iterable<string> {
+  if (arr.length === 0) {
+    yield '[]'
+    return
+  }
+
+  yield '['
+
+  if (indent > 0) {
+    // Pretty-printed format
+    for (let i = 0; i < arr.length; i++) {
+      yield '\n'
+      yield ' '.repeat((depth + 1) * indent)
+      yield* stringifyValue(arr[i], depth + 1, indent)
+      if (i < arr.length - 1) {
+        yield ','
+      }
+    }
+    yield '\n'
+    yield ' '.repeat(depth * indent)
+    yield ']'
+  }
+  else {
+    // Compact format
+    for (let i = 0; i < arr.length; i++) {
+      yield* stringifyValue(arr[i], depth + 1, indent)
+      if (i < arr.length - 1) {
+        yield ','
+      }
+    }
+    yield ']'
+  }
+}
+
+/**
+ * Stringify an object with proper formatting.
+ */
+function* stringifyObject(
+  obj: Record<string, unknown>,
+  depth: number,
+  indent: number,
+): Iterable<string> {
+  const keys = Object.keys(obj)
+
+  if (keys.length === 0) {
+    yield '{}'
+    return
+  }
+
+  yield '{'
+
+  if (indent > 0) {
+    // Pretty-printed format
+    for (let i = 0; i < keys.length; i++) {
+      const key = keys[i]!
+      const value = obj[key]
+
+      yield '\n'
+      yield ' '.repeat((depth + 1) * indent)
+      yield JSON.stringify(key)
+      yield ': '
+      yield* stringifyValue(value, depth + 1, indent)
+      if (i < keys.length - 1) {
+        yield ','
+      }
+    }
+    yield '\n'
+    yield ' '.repeat(depth * indent)
+    yield '}'
+  }
+  else {
+    // Compact format
+    for (let i = 0; i < keys.length; i++) {
+      const key = keys[i]!
+      const value = obj[key]
+
+      yield JSON.stringify(key)
+      yield ':'
+      yield* stringifyValue(value, depth + 1, indent)
+      if (i < keys.length - 1) {
+        yield ','
+      }
+    }
+    yield '}'
+  }
+}
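The key property of this stringifier is that joining its chunks reproduces `JSON.stringify` output exactly, in both compact and pretty modes. As a sanity check, here is a condensed, hypothetical restatement of the generator above (not the package's exported API) with an equivalence check:

```typescript
// Condensed restatement of the streaming stringifier above, for a quick
// equivalence check against JSON.stringify. Illustrative only.
function* stringify(value: unknown, depth: number, indent: number): Generator<string> {
  if (Array.isArray(value)) {
    if (value.length === 0) { yield '[]'; return }
    yield '['
    for (let i = 0; i < value.length; i++) {
      if (indent > 0) yield `\n${' '.repeat((depth + 1) * indent)}`
      yield* stringify(value[i], depth + 1, indent)
      if (i < value.length - 1) yield ','
    }
    if (indent > 0) yield `\n${' '.repeat(depth * indent)}`
    yield ']'
  }
  else if (value !== null && typeof value === 'object') {
    const keys = Object.keys(value as object)
    if (keys.length === 0) { yield '{}'; return }
    yield '{'
    for (let i = 0; i < keys.length; i++) {
      if (indent > 0) yield `\n${' '.repeat((depth + 1) * indent)}`
      yield JSON.stringify(keys[i]!) + (indent > 0 ? ': ' : ':')
      yield* stringify((value as Record<string, unknown>)[keys[i]!], depth + 1, indent)
      if (i < keys.length - 1) yield ','
    }
    if (indent > 0) yield `\n${' '.repeat(depth * indent)}`
    yield '}'
  }
  else {
    // null, booleans, numbers, strings; undefined becomes 'null' as in JSON
    yield JSON.stringify(value) ?? 'null'
  }
}

// Joining the chunks reproduces JSON.stringify output, compact and pretty.
const data = { name: 'Alice', scores: [95, 87, 92], meta: null }
console.log([...stringify(data, 0, 0)].join('') === JSON.stringify(data)) // true
console.log([...stringify(data, 0, 2)].join('') === JSON.stringify(data, undefined, 2)) // true
```

Because chunks are consumed as they are produced, the joined string only ever exists on disk or in the consumer, never inside the stringifier itself.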

packages/cli/test/index.test.ts

Lines changed: 16 additions & 10 deletions

@@ -153,15 +153,18 @@ describe('toon CLI', () => {
 
     const cleanup = mockStdin(toonInput)
 
-    const stdout: string[] = []
-    vi.spyOn(console, 'log').mockImplementation((message?: unknown) => {
-      stdout.push(String(message ?? ''))
+    const writeChunks: string[] = []
+    vi.spyOn(process.stdout, 'write').mockImplementation((chunk) => {
+      writeChunks.push(String(chunk))
+      return true
     })
 
     try {
       await runCli({ rawArgs: ['--decode'] })
-      expect(stdout).toHaveLength(1)
-      const result = JSON.parse(stdout?.at(0) ?? '')
+      const fullOutput = writeChunks.join('')
+      // Remove trailing newline before parsing
+      const jsonOutput = fullOutput.endsWith('\n') ? fullOutput.slice(0, -1) : fullOutput
+      const result = JSON.parse(jsonOutput)
       expect(result).toEqual(data)
     }
     finally {
@@ -279,16 +282,19 @@ describe('toon CLI', () => {
     const toonInput = encode(data)
     const cleanup = mockStdin(toonInput)
 
-    const stdout: string[] = []
-    vi.spyOn(console, 'log').mockImplementation((message?: unknown) => {
-      stdout.push(String(message ?? ''))
+    const writeChunks: string[] = []
+    vi.spyOn(process.stdout, 'write').mockImplementation((chunk) => {
+      writeChunks.push(String(chunk))
+      return true
    })
 
     try {
       await runCli({ rawArgs: ['--decode', '--no-strict'] })
 
-      expect(stdout).toHaveLength(1)
-      const result = JSON.parse(stdout?.at(0) ?? '')
+      const fullOutput = writeChunks.join('')
+      // Remove trailing newline before parsing
+      const jsonOutput = fullOutput.endsWith('\n') ? fullOutput.slice(0, -1) : fullOutput
+      const result = JSON.parse(jsonOutput)
       expect(result).toEqual(data)
     }
     finally {
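The tests switch from spying on `console.log` (one call, one string) to spying on `process.stdout.write` and joining many chunks, because streaming output no longer arrives as a single string. The same capture idea can be sketched without vitest by swapping `process.stdout.write` directly (an illustrative helper, not part of the test suite):

```typescript
// Sketch of the tests' capture pattern without vitest: temporarily replace
// process.stdout.write, collect every chunk, then restore the original.
function captureStdout(run: () => void): string {
  const chunks: string[] = []
  const original = process.stdout.write.bind(process.stdout)
  process.stdout.write = ((chunk: string | Uint8Array) => {
    chunks.push(String(chunk))
    return true
  }) as typeof process.stdout.write
  try {
    run()
  }
  finally {
    process.stdout.write = original
  }
  return chunks.join('')
}

// Streaming output arrives as many chunks; joining them restores the document,
// and trimming the final newline yields parseable JSON, as in the tests above.
const out = captureStdout(() => {
  for (const chunk of ['{"a"', ':', '1}', '\n'])
    process.stdout.write(chunk)
})
const parsed = JSON.parse(out.trimEnd())
```

Restoring the original writer in `finally` mirrors the tests' cleanup and keeps later output (including test-runner logs) visible.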
