Commit 6c57a14

feat: streaming decode functionality with event-based parsing (closes #131)
1 parent 9ebad53 commit 6c57a14

File tree

19 files changed: +2167 -378 lines changed

README.md

Lines changed: 40 additions & 0 deletions

@@ -777,6 +777,46 @@ for (const line of encodeLines(largeData)) {
}
```

**Streaming decode:**

```ts
import { decodeFromLines, decodeStreamSync } from '@toon-format/toon'

// 1. Lines → value (build full JSON value)
const value = decodeFromLines([
  'users[2]{id,name}:',
  '  1,Alice',
  '  2,Bob',
])
// { users: [{ id: 1, name: 'Alice' }, { id: 2, name: 'Bob' }] }

// 2. Lines → events (for custom streaming consumers)
const lines = [
  'users[2]{id,name}:',
  '  1,Alice',
  '  2,Bob',
]
for (const event of decodeStreamSync(lines)) {
  // { type: 'startObject' }, { type: 'key', key: 'users' }, ...
}
```

**Async streaming decode:**

```ts
// 3. Async streaming from files or network
import { createReadStream } from 'node:fs'
import { createInterface } from 'node:readline'
import { decodeStream } from '@toon-format/toon'

const fileStream = createReadStream('data.toon', 'utf-8')
const rl = createInterface({ input: fileStream })

for await (const event of decodeStream(rl)) {
  // Process events as they arrive
}
```

## Playgrounds

Experiment with TOON format interactively using these community-built tools for token comparison, format conversion, and validation:

docs/cli/index.md

Lines changed: 14 additions & 8 deletions

@@ -108,19 +108,25 @@ cat data.toon | toon --decode

Both encoding and decoding operations use streaming output, writing incrementally without building the full output string in memory. This makes the CLI efficient for large datasets without requiring additional configuration.

-**JSON → TOON (Encode)**
-- Streams TOON lines to output
-- No full TOON string in memory
+**JSON → TOON (Encode)**:
+
+- Streams TOON lines to output.
+- No full TOON string in memory.

-**TOON → JSON (Decode)**
-- Streams JSON tokens to output
-- No full JSON string in memory
+**TOON → JSON (Decode)**:
+
+- Uses the same event-based streaming decoder as the `decodeStream` API in `@toon-format/toon`.
+- Streams JSON tokens to output.
+- No full JSON string in memory.
+- When `--expand-paths safe` is enabled, falls back to non-streaming decode internally to apply deep-merge expansion before writing JSON.
+
+Process large files with minimal memory usage:

```bash
-# Encode large JSON file with minimal memory usage
+# Encode large JSON file
toon huge-dataset.json -o output.toon

-# Decode large TOON file with minimal memory usage
+# Decode large TOON file
toon huge-dataset.toon -o output.json

# Process millions of records efficiently via stdin

docs/guide/getting-started.md

Lines changed: 2 additions & 0 deletions

@@ -237,3 +237,5 @@ Round-tripping is lossless: `decode(encode(x))` always equals `x` (after normali

## Where to Go Next

Now that you've seen your first TOON document, read the [Format Overview](/guide/format-overview) for complete syntax details (objects, arrays, quoting rules, key folding), then explore [Using TOON with LLMs](/guide/llm-prompts) to see how to use it effectively in prompts. For implementation details, check the [API reference](/reference/api) (TypeScript) or the [specification](/reference/spec) (language-agnostic normative rules).

For large datasets or streaming use-cases, see `encodeLines`, `decodeFromLines`, and `decodeStream` in the [API reference](/reference/api).

docs/guide/llm-prompts.md

Lines changed: 25 additions & 0 deletions

@@ -118,6 +118,31 @@ toon large-dataset.json --output output.toon

This streaming approach prevents out-of-memory errors when preparing large context windows for LLMs. For complete details on `encodeLines()`, see the [API reference](/reference/api#encodelines).

**Consuming streaming LLM outputs:** If your LLM client exposes streaming text and you buffer by lines, you can decode TOON incrementally:

```ts
import { decodeFromLines } from '@toon-format/toon'

// Buffer streaming response into lines
const lines: string[] = []
let buffer = ''

for await (const chunk of modelStream) {
  buffer += chunk
  let index: number

  while ((index = buffer.indexOf('\n')) !== -1) {
    lines.push(buffer.slice(0, index))
    buffer = buffer.slice(index + 1)
  }
}

// Flush any trailing partial line (output without a final newline)
if (buffer.length > 0)
  lines.push(buffer)

// Decode buffered lines
const data = decodeFromLines(lines)
```

For streaming decode APIs, see [`decodeFromLines()`](/reference/api#decodeFromLines-lines-options) and [`decodeStream()`](/reference/api#decodeStream-source-options).

## Tips and Pitfalls

**Show, don't describe.** Don't explain TOON syntax in detail – just show an example. Models learn the pattern from context. A simple code block with 2-5 rows is more effective than paragraphs of explanation.

docs/reference/api.md

Lines changed: 221 additions & 0 deletions

@@ -300,6 +300,227 @@ decode(toon, { expandPaths: 'safe', strict: false })
```
:::

## `decodeFromLines(lines, options?)`

Decodes TOON format from pre-split lines into a JavaScript value. This is a streaming-friendly wrapper around the event-based decoder that builds the full value in memory.

Useful when you already have lines as an array or iterable (e.g., from file streams, readline interfaces, or network responses) and want the standard decode behavior with path expansion support.

### Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| `lines` | `Iterable<string>` | Iterable of TOON lines (without trailing newlines) |
| `options` | `DecodeOptions?` | Optional decoding configuration (see below) |

### Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `indent` | `number` | `2` | Expected number of spaces per indentation level |
| `strict` | `boolean` | `true` | Enable strict validation (array counts, indentation, delimiter consistency) |
| `expandPaths` | `'off'` \| `'safe'` | `'off'` | Enable path expansion to reconstruct dotted keys into nested objects |

### Return Value

Returns a `JsonValue` (the parsed JavaScript value: object, array, or primitive).

### Example

**Basic usage with arrays:**

```ts
import { decodeFromLines } from '@toon-format/toon'

const lines = ['name: Alice', 'age: 30']
const value = decodeFromLines(lines)
// { name: 'Alice', age: 30 }
```

**Collecting lines from Node.js readline:**

```ts
import { createReadStream } from 'node:fs'
import { createInterface } from 'node:readline'
import { decodeFromLines } from '@toon-format/toon'

const rl = createInterface({
  input: createReadStream('data.toon'),
  crlfDelay: Infinity,
})

// readline yields lines asynchronously, so collect them first;
// decodeFromLines expects a synchronous Iterable<string>
const lines: string[] = []
for await (const line of rl)
  lines.push(line)

const value = decodeFromLines(lines)
console.log(value)
```

**With path expansion:**

```ts
const lines = ['user.name: Alice', 'user.age: 30']
const value = decodeFromLines(lines, { expandPaths: 'safe' })
// { user: { name: 'Alice', age: 30 } }
```
## `decodeStreamSync(lines, options?)`

Synchronously decodes TOON lines into a stream of JSON events. This function yields structured events that represent the JSON data model without building the full value tree.

Useful for streaming processing, custom transformations, or memory-efficient parsing of large datasets where you don't need the full value in memory.

::: info Event Streaming
This is a low-level API that returns individual parse events. For most use cases, [`decodeFromLines()`](#decodeFromLines-lines-options) or [`decode()`](#decode-input-options) are more convenient.

Path expansion (`expandPaths: 'safe'`) is **not supported** in streaming mode since it requires the full value tree.
:::

### Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| `lines` | `Iterable<string>` | Iterable of TOON lines (without trailing newlines) |
| `options` | `DecodeStreamOptions?` | Optional streaming decoding configuration (see below) |

### Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `indent` | `number` | `2` | Expected number of spaces per indentation level |
| `strict` | `boolean` | `true` | Enable strict validation (array counts, indentation, delimiter consistency) |

### Return Value

Returns an `Iterable<JsonStreamEvent>` that yields structured events.

### Event Types

Events represent the structure of the JSON data model:

```ts
type JsonStreamEvent
  = | { type: 'startObject' }
    | { type: 'endObject' }
    | { type: 'startArray' }
    | { type: 'endArray' }
    | { type: 'key', key: string }
    | { type: 'primitive', value: JsonPrimitive }

type JsonPrimitive = string | number | boolean | null
```
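To make these event shapes concrete, here is a small self-contained sketch of how a consumer could fold an event stream back into a value — roughly what `decodeFromLines()` layers on top of the event decoder. The `buildValue` helper is hypothetical, not a package export:

```typescript
// Hypothetical helper (not part of @toon-format/toon): fold a
// stream of JsonStreamEvent objects back into a plain JS value.
type JsonPrimitive = string | number | boolean | null
type JsonValue = JsonPrimitive | JsonValue[] | { [key: string]: JsonValue }
type JsonStreamEvent
  = | { type: 'startObject' }
    | { type: 'endObject' }
    | { type: 'startArray' }
    | { type: 'endArray' }
    | { type: 'key', key: string }
    | { type: 'primitive', value: JsonPrimitive }

function buildValue(events: Iterable<JsonStreamEvent>): JsonValue {
  const stack: (JsonValue[] | { [key: string]: JsonValue })[] = []
  let pendingKey: string | null = null
  let result: JsonValue = null

  // Attach a finished value to the current container (or the root)
  const attach = (value: JsonValue): void => {
    const parent = stack[stack.length - 1]
    if (parent === undefined) {
      result = value
    }
    else if (Array.isArray(parent)) {
      parent.push(value)
    }
    else if (pendingKey !== null) {
      parent[pendingKey] = value
      pendingKey = null
    }
  }

  for (const event of events) {
    switch (event.type) {
      case 'startObject': {
        const obj: { [key: string]: JsonValue } = {}
        attach(obj)
        stack.push(obj)
        break
      }
      case 'startArray': {
        const arr: JsonValue[] = []
        attach(arr)
        stack.push(arr)
        break
      }
      case 'endObject':
      case 'endArray':
        stack.pop()
        break
      case 'key':
        pendingKey = event.key
        break
      case 'primitive':
        attach(event.value)
        break
    }
  }
  return result
}

const events: JsonStreamEvent[] = [
  { type: 'startObject' },
  { type: 'key', key: 'name' },
  { type: 'primitive', value: 'Alice' },
  { type: 'key', key: 'age' },
  { type: 'primitive', value: 30 },
  { type: 'endObject' },
]
console.log(buildValue(events)) // { name: 'Alice', age: 30 }
```

Note that the only state this consumer keeps is the stack of open containers and one pending key, which is why event streaming stays memory-efficient for large, flat datasets.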
### Example

**Basic event streaming:**

```ts
import { decodeStreamSync } from '@toon-format/toon'

const lines = ['name: Alice', 'age: 30']

for (const event of decodeStreamSync(lines)) {
  console.log(event)
}

// Output:
// { type: 'startObject' }
// { type: 'key', key: 'name' }
// { type: 'primitive', value: 'Alice' }
// { type: 'key', key: 'age' }
// { type: 'primitive', value: 30 }
// { type: 'endObject' }
```

**Custom processing:**

```ts
import { decodeStreamSync } from '@toon-format/toon'

const lines = ['users[2]{id,name}:', '  1,Alice', '  2,Bob']
let userCount = 0

for (const event of decodeStreamSync(lines)) {
  // The first two endObject events close the row objects;
  // the guard skips the final endObject that closes the root
  if (event.type === 'endObject' && userCount < 2) {
    userCount++
    console.log(`Processed user ${userCount}`)
  }
}
```
## `decodeStream(source, options?)`

Asynchronously decodes TOON lines into a stream of JSON events. This is the async version of [`decodeStreamSync()`](#decodeStreamSync-lines-options), supporting both synchronous and asynchronous iterables.

Useful for processing file streams, network responses, or other async sources where you want to handle data incrementally as it arrives.

### Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| `source` | `AsyncIterable<string>` \| `Iterable<string>` | Async or sync iterable of TOON lines (without trailing newlines) |
| `options` | `DecodeStreamOptions?` | Optional streaming decoding configuration (see below) |

### Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `indent` | `number` | `2` | Expected number of spaces per indentation level |
| `strict` | `boolean` | `true` | Enable strict validation (array counts, indentation, delimiter consistency) |

### Return Value

Returns an `AsyncIterable<JsonStreamEvent>` that yields structured events asynchronously.
### Example

**Streaming from file:**

```ts
import { createReadStream } from 'node:fs'
import { createInterface } from 'node:readline'
import { decodeStream } from '@toon-format/toon'

const fileStream = createReadStream('data.toon', 'utf-8')
const rl = createInterface({ input: fileStream, crlfDelay: Infinity })

for await (const event of decodeStream(rl)) {
  console.log(event)
  // Process events as they arrive
}
```
**Processing events incrementally:**

```ts
import { decodeStream } from '@toon-format/toon'

const lines = getAsyncLineSource() // AsyncIterable<string>

// A `key` event is immediately followed by the events for its value,
// so remember the most recent key and watch for its primitive value
let pendingKey: string | null = null

for await (const event of decodeStream(lines, { strict: true })) {
  if (event.type === 'key') {
    pendingKey = event.key
  }
  else if (event.type === 'primitive' && pendingKey === 'id') {
    console.log('Found ID:', event.value)
    pendingKey = null
  }
  else {
    pendingKey = null
  }
}
```
**Auto-detection of sync/async sources:**

```ts
// Works with sync iterables
const syncLines = ['name: Alice', 'age: 30']
for await (const event of decodeStream(syncLines)) {
  console.log(event)
}

// Works with async iterables
const asyncLines = readLinesFromNetwork()
for await (const event of decodeStream(asyncLines)) {
  console.log(event)
}
```
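`decodeStream()` consumes lines, while file and network sources usually deliver arbitrary text chunks. A minimal adapter can bridge the two; this is an illustrative sketch (the `chunksToLines` helper is an assumption, not a package export), relying only on the standard async-iteration protocol:

```typescript
// Hypothetical adapter (not part of @toon-format/toon): turn an
// async stream of text chunks into an async stream of lines
// (without trailing newlines), suitable for line-based consumers
async function* chunksToLines(
  chunks: AsyncIterable<string> | Iterable<string>,
): AsyncGenerator<string> {
  let buffer = ''
  for await (const chunk of chunks) {
    buffer += chunk
    let index: number
    while ((index = buffer.indexOf('\n')) !== -1) {
      yield buffer.slice(0, index)
      buffer = buffer.slice(index + 1)
    }
  }
  // Flush a trailing partial line (input without a final newline)
  if (buffer.length > 0)
    yield buffer
}

// Chunk boundaries need not align with line boundaries
async function demo(): Promise<void> {
  for await (const line of chunksToLines(['a: 1\nb: ', '2\nc: 3']))
    console.log(line) // prints 'a: 1', then 'b: 2', then 'c: 3'
}
demo()
```

Under this sketch's assumptions, `decodeStream(chunksToLines(source))` would then yield events as chunks arrive.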
## Round-Trip Compatibility

TOON provides lossless round-trips after normalization:

eslint.config.mjs

Lines changed: 7 additions & 2 deletions

@@ -1,10 +1,15 @@
 // @ts-check
 import antfu from '@antfu/eslint-config'
 
-export default antfu().append({
+export default antfu({
+  rules: {
+    'no-cond-assign': 'off',
+  },
+}).append({
   files: ['README.md', 'SPEC.md', '**/docs/**/*'],
   rules: {
-    'yaml/quotes': 'off',
+    'import/no-duplicates': 'off',
     'style/no-tabs': 'off',
+    'yaml/quotes': 'off',
   },
 })

packages/cli/README.md

Lines changed: 2 additions & 1 deletion

@@ -134,8 +134,9 @@ cat million-records.toon | toon --decode > output.json
 
 **Memory efficiency:**
 - **Encode (JSON → TOON)**: Streams TOON lines to output without full string in memory
-- **Decode (TOON → JSON)**: Streams JSON tokens to output without full string in memory
+- **Decode (TOON → JSON)**: Uses the same event-based streaming decoder as the `decodeStream` API in `@toon-format/toon`, streaming JSON tokens to output without full string in memory
 - Peak memory usage scales with data depth, not total size
+- When `--expand-paths safe` is enabled, decode falls back to non-streaming mode internally to apply deep-merge expansion before writing JSON
 
 > [!NOTE]
 > When using `--stats` with encode, the full output string is kept in memory for token counting. Omit `--stats` for maximum memory efficiency with very large datasets.
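The depth-not-size point above can be made concrete with a small self-contained sketch (hypothetical illustration, not code from the CLI): an event-stream consumer's only growing state is a stack of open containers, which is bounded by nesting depth no matter how many rows stream through.

```typescript
// Hypothetical illustration (not part of the CLI): while consuming
// a decode event stream, the only state that grows with the input
// is a stack of open containers, bounded by nesting depth
type DepthEvent
  = | { type: 'start' } // startObject / startArray
    | { type: 'end' }   // endObject / endArray
    | { type: 'leaf' }  // key / primitive

// Simulate events for a flat tabular array with `rows` row objects
function* tabularEvents(rows: number): Generator<DepthEvent> {
  yield { type: 'start' } // root object
  yield { type: 'leaf' }  // key 'users'
  yield { type: 'start' } // users array
  for (let i = 0; i < rows; i++) {
    yield { type: 'start' } // row object
    yield { type: 'leaf' }  // row fields
    yield { type: 'end' }   // close row
  }
  yield { type: 'end' } // close array
  yield { type: 'end' } // close root
}

function maxDepth(events: Iterable<DepthEvent>): number {
  let depth = 0
  let max = 0
  for (const event of events) {
    if (event.type === 'start')
      max = Math.max(max, ++depth)
    else if (event.type === 'end')
      depth--
  }
  return max
}

console.log(maxDepth(tabularEvents(10)))        // 3
console.log(maxDepth(tabularEvents(1_000_000))) // still 3
```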
