You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Part of #69
Depends on #90 (streaming spike — completed)
Description
Switch sql-pipe from an in-memory SQLite database to a file-backed temporary database when handling large inputs. This allows datasets larger than available RAM to be processed without crashing, across all supported input formats (CSV, TSV, NDJSON, JSON, XML).
Two changes are required (both are necessary — neither is sufficient alone):
Open a file-backed temp DB instead of sqlite3_open(":memory:", &db) — this is what actually bounds data memory usage
PRAGMA temp_store = FILE — ensures transient structures (ORDER BY sorts, GROUP BY indices) also spill to disk rather than RAM
All SQL operations continue to work correctly. No changes to query semantics or output.
Acceptance Criteria
--disk flag (or --large) switches the main database from :memory: to a file-backed temp DB
PRAGMA temp_store = FILE is set when --disk is active
Temp files are cleaned up after execution (even on error)
Queries that previously ran out of memory on 1GB+ files now complete successfully
Existing behavior unchanged when flag is not set (default stays :memory:)
Error message if temp directory is not writable
Works for all input formats: CSV, TSV, NDJSON, JSON, XML
Notes
Open the temp DB with an empty string path ("") or an explicit OS temp path — SQLite treats "" as a private temp file that is deleted automatically on close
Do NOT use PRAGMA temp_store_directory — it is deprecated; temp location is controlled via the SQLITE_TMPDIR / TMPDIR environment variable
The flag is binary (in-memory vs disk-backed), not a size threshold — there is no continuous SQLite API to "use disk above X MB"
JSON and XML loaders still buffer the raw input file before parsing — this is a separate concern tracked in future sub-issues; --disk reduces SQLite memory pressure but not parser-level buffering for these two formats
Part of #69
Depends on #90 (streaming spike — completed)
Description
Switch sql-pipe from an in-memory SQLite database to a file-backed temporary database when handling large inputs. This allows datasets larger than available RAM to be processed without crashing, across all supported input formats (CSV, TSV, NDJSON, JSON, XML).
Two changes are required (both are necessary — neither is sufficient alone):
sqlite3_open(":memory:", &db)— this is what actually bounds data memory usagePRAGMA temp_store = FILE— ensures transient structures (ORDER BY sorts, GROUP BY indices) also spill to disk rather than RAMAll SQL operations continue to work correctly. No changes to query semantics or output.
Acceptance Criteria
--diskflag (or--large) switches the main database from:memory:to a file-backed temp DBPRAGMA temp_store = FILEis set when--diskis active:memory:)Notes
"") or an explicit OS temp path — SQLite treats""as a private temp file that is deleted automatically on closePRAGMA temp_store_directory— it is deprecated; temp location is controlled via theSQLITE_TMPDIR/TMPDIRenvironment variable--diskreduces SQLite memory pressure but not parser-level buffering for these two formats