Description
Extend sql-pipe beyond CSV to support multiple input and output formats. Users would specify formats with --input-format / --output-format flags, with automatic detection when possible.
This transforms sql-pipe from a CSV-specific tool into a universal data transformation engine — competing with tools like jq, xsv, and mlr for diverse data sources.
Supported formats (proposed)
| Format | Input | Output | Notes |
|---------|-------|--------|-------|
| CSV | ✅ | ✅ | Current default |
| TSV | ✅ | ✅ | Tab-separated, common in bioinformatics |
| JSON | ✅ | ✅ | Array-of-objects and newline-delimited (NDJSON) |
| Parquet | ✅ | ✅ | Columnar format, popular in data engineering |
| XML | ✅ | ✅ | Row-based XML with configurable element names |
Example usage
# JSON input, CSV output (default)
cat data.json | sql-pipe --input-format json "SELECT name, age FROM stdin WHERE age > 30"

# CSV input, JSON output
cat users.csv | sql-pipe --output-format json "SELECT * FROM stdin LIMIT 5"

# Parquet to TSV
cat warehouse.parquet | sql-pipe --input-format parquet --output-format tsv "SELECT product, SUM(qty) FROM stdin GROUP BY product"

# Auto-detect input format from file extension
sql-pipe --input data.json "SELECT * FROM stdin"
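The last example relies on picking a format from the file extension. A minimal Python sketch of that lookup is shown here purely as an illustration; it is not sql-pipe's implementation. The `detect_format` helper, the `EXTENSION_FORMATS` table, and the `.ndjson`/`.jsonl` extensions are assumptions, while the format names come from the proposal.

```python
from pathlib import Path

# Hypothetical extension-to-format table. The format names follow the
# proposal; treating .ndjson and .jsonl as NDJSON is an assumption.
EXTENSION_FORMATS = {
    ".csv": "csv",
    ".tsv": "tsv",
    ".json": "json",
    ".ndjson": "ndjson",
    ".jsonl": "ndjson",
    ".parquet": "parquet",
    ".xml": "xml",
}


def detect_format(path=None, explicit=None, default="csv"):
    """Pick an input format: explicit flag first, then extension, then CSV."""
    if explicit is not None:
        return explicit  # an explicit --input-format always wins
    if path is not None:
        ext = Path(path).suffix.lower()
        return EXTENSION_FORMATS.get(ext, default)
    return default  # reading from stdin: keep the backward-compatible default
```

Falling back to `csv` for stdin and for unknown extensions keeps the CSV→CSV default intact; an explicit flag overriding the extension is one possible precedence, not a decided one.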
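Acceptance Criteria
--input-format / -I flag selects input parser (csv, tsv, json, ndjson, parquet, xml)
--output-format / -O flag selects output formatter (csv, tsv, json, ndjson, parquet, xml)
Auto-detection of input format when reading from files (by extension)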
Default remains CSV→CSV for backward compatibility
JSON output produces an array of objects with column names as keys
NDJSON output produces one JSON object per line
Schema/type detection works correctly for all input formats
Error messages are clear when a format is unsupported or data is malformed
Documentation updated with format examples
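To make the difference between the JSON and NDJSON output criteria concrete, here is a small Python sketch. It assumes the query result is available as a list of column names plus row tuples; the function names `to_json_array` and `to_ndjson` are hypothetical and only illustrate the two shapes.

```python
import json


def to_json_array(columns, rows):
    """JSON output: a single array of objects keyed by column name."""
    return json.dumps([dict(zip(columns, row)) for row in rows])


def to_ndjson(columns, rows):
    """NDJSON output: one JSON object per line, no enclosing array."""
    return "\n".join(json.dumps(dict(zip(columns, row))) for row in rows)


columns = ["name", "age"]
rows = [("Ada", 36), ("Bob", 41)]

print(to_json_array(columns, rows))
# [{"name": "Ada", "age": 36}, {"name": "Bob", "age": 41}]
print(to_ndjson(columns, rows))
# {"name": "Ada", "age": 36}
# {"name": "Bob", "age": 41}
```

NDJSON can be emitted row by row, whereas the array form needs the opening and closing brackets around the whole result.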
Recommended split plan
This is a size:xl epic. When scheduled for a sprint, split into these sub-issues:
1. Format plugin architecture (size:s): refactor the current CSV reader/writer into a pluggable format interface so new formats can be added incrementally. Add --input-format / --output-format flags that default to csv.
2. JSON/NDJSON input and output (size:m): add JSON array-of-objects and newline-delimited JSON support. Builds on #44, "JSON output format (--json)", which adds --json as a standalone flag. JSON input requires flattening nested structures; the strategy is top-level keys only, with nested objects becoming JSON strings.
3. TSV input support (size:xs): TSV output already exists via --tsv; add TSV as an input format option. Trivial since it's just a different delimiter.
4. Parquet input and output (size:l): requires a Zig-compatible Parquet library or C bindings (e.g., Apache Arrow). Investigate feasibility before committing.
5. XML input and output (size:m): row-based XML with configurable root/row element names.
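The first sub-issue (the pluggable format interface) can be sketched as a registry of reader/writer objects keyed by format name. The Python below is an illustration only. sql-pipe itself is not written in Python, and `DelimitedFormat`, `FORMATS`, and `convert` are hypothetical names. It also shows why TSV is a small step once the interface exists: it is the same class with a different delimiter.

```python
import csv
import io


class DelimitedFormat:
    """Reader/writer pair for delimiter-separated text with a header row."""

    def __init__(self, delimiter):
        self.delimiter = delimiter

    def read(self, text):
        """Parse text into (columns, rows)."""
        reader = csv.reader(io.StringIO(text), delimiter=self.delimiter)
        columns = next(reader, [])  # empty input yields no columns
        return columns, [tuple(row) for row in reader]

    def write(self, columns, rows):
        """Serialize (columns, rows) back to text."""
        out = io.StringIO()
        writer = csv.writer(out, delimiter=self.delimiter, lineterminator="\n")
        writer.writerow(columns)
        writer.writerows(rows)
        return out.getvalue()


# Hypothetical registry: new formats are added by registering another object
# that exposes the same read/write methods.
FORMATS = {"csv": DelimitedFormat(","), "tsv": DelimitedFormat("\t")}


def convert(text, input_format="csv", output_format="csv"):
    """Read with one format and write with another (no SQL step in this sketch)."""
    try:
        reader, writer = FORMATS[input_format], FORMATS[output_format]
    except KeyError as exc:
        raise ValueError(f"unsupported format: {exc.args[0]}") from None
    columns, rows = reader.read(text)
    return writer.write(columns, rows)
```

In the real tool the SQL engine would sit between `read` and `write`; the sketch leaves it out to keep the interface itself visible.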
Dependencies
#44 (JSON output format, --json) should be completed first; this issue generalizes --json into --output-format json
Sub-issue ordering: 1 → 2 → 3/4/5 (architecture first, then formats in parallel)
Notes
TSV is trivial (just a different delimiter) and could ship first
JSON/NDJSON requires flattening nested structures into tabular form — define a strategy (e.g., top-level keys only, or dot-notation for nested)
Parquet requires a Zig Parquet reader/writer library or C bindings
Keep the core SQL engine format-agnostic — only the input parser and output formatter change
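The "top-level keys only" flattening strategy from the notes above can be sketched in Python as follows. This is one possible reading, not a settled design: `flatten_record` and `records_to_table` are hypothetical names, and serializing nested arrays the same way as nested objects is an assumption the issue does not spell out.

```python
import json


def flatten_record(record):
    """Keep top-level keys; nested objects (and, by assumption, arrays)
    are stored as JSON strings."""
    return {
        key: json.dumps(value) if isinstance(value, (dict, list)) else value
        for key, value in record.items()
    }


def records_to_table(records):
    """Build (columns, rows) from a list of objects.

    Columns are the union of keys in first-seen order; keys missing from a
    record become None.
    """
    flat = [flatten_record(record) for record in records]
    columns = []
    for record in flat:
        for key in record:
            if key not in columns:
                columns.append(key)
    rows = [tuple(record.get(col) for col in columns) for record in flat]
    return columns, rows


records = [
    {"name": "Ada", "address": {"city": "London"}},
    {"name": "Bob", "age": 41},
]
columns, rows = records_to_table(records)
# columns == ["name", "address", "age"]
# rows[0] == ("Ada", '{"city": "London"}', None)
```

The dot-notation alternative mentioned in the notes would instead produce a column such as `address.city`, at the cost of a schema that depends on how deeply the data is nested.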