Conversation
Replace sql.js (~691KB WASM+JS) with hyparquet-writer (~1KB) for export. Parquet files can be queried with DuckDB, pandas, Polars, etc. Diagnostics values are stored as JSON column instead of separate table. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Node.js Buffer.buffer returns the underlying ArrayBuffer pool which may not start at offset 0. Use slice() to get the exact byte range. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
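For context, a minimal sketch of the fix this commit describes (the file name and variable names are illustrative, not taken from the repository): slice the Buffer's backing ArrayBuffer by byteOffset and byteLength so only this Buffer's bytes are kept.

```ts
import { readFileSync } from 'node:fs';

// readFileSync returns a Buffer that may be a view into Node's shared
// ArrayBuffer pool, so buf.buffer can be larger than buf and the data
// may start at a non-zero byteOffset.
const buf = readFileSync('export.parquet');

// Copy out exactly the bytes that belong to this Buffer.
const exact = buf.buffer.slice(buf.byteOffset, buf.byteOffset + buf.byteLength);
```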
Pass raw arrays to hyparquet-writer with type:'JSON' instead of pre-stringified values, which caused double JSON encoding. DuckDB now correctly reads values_json as a JSON array. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
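A hedged sketch of the difference described in the commit above, assuming diagnostics values are arrays of key/value objects (the sample data is illustrative; the values_json column name and type: 'JSON' come from this PR):

```ts
import { parquetWriteBuffer } from 'hyparquet-writer';

// Illustrative sample; real data comes from parsed diagnostics messages.
const diagnostics = [
  { values: [{ key: 'voltage', value: '12.1' }] },
  { values: [{ key: 'voltage', value: '11.9' }] },
];

// Bug: pre-stringifying made the JSON encoding happen twice, so readers saw
// a quoted, escaped string instead of an array:
//   { name: 'values_json', data: diagnostics.map(d => JSON.stringify(d.values)), type: 'JSON' }

// Fix: pass the raw arrays and let hyparquet-writer serialize them once.
const buf = parquetWriteBuffer({
  columnData: [
    { name: 'values_json', data: diagnostics.map(d => d.values), type: 'JSON' },
  ],
});
```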
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Pull request overview
This PR replaces the SQLite (sql.js) export path with Parquet export using hyparquet-writer, reducing bundle size and simplifying the diagnostics export schema while keeping exports compatible with common analytics tools.
Changes:
- Swapped SQLite export functions for synchronous Parquet exporters (exportToParquet, exportDiagnosticsToParquet).
- Updated UI export buttons/test IDs and adapted unit + Playwright E2E tests to validate Parquet output via hyparquet (see the round-trip sketch after this list).
- Updated dependencies and docs to reflect Parquet export support.
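As referenced above, the tests write a buffer with hyparquet-writer and read it back with hyparquet to assert on columns and values. This is only a sketch of that round trip; the column names come from this PR, while the reader entry point (parquetReadObjects accepting an ArrayBuffer) is my assumption about hyparquet's API, not code from the repository.

```ts
import { parquetWriteBuffer } from 'hyparquet-writer';
import { parquetReadObjects } from 'hyparquet';

// Write a tiny two-row file in memory (column names as in the PR).
const buf = parquetWriteBuffer({
  columnData: [
    { name: 'timestamp', data: [1700000000.1, 1700000000.2], type: 'DOUBLE' },
    { name: 'severity', data: ['INFO', 'WARN'], type: 'STRING' },
  ],
});

// Read it back and check the decoded rows (assumed hyparquet API).
const rows = await parquetReadObjects({ file: buf });
console.assert(rows.length === 2 && rows[1].severity === 'WARN');
```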
Reviewed changes
Copilot reviewed 7 out of 9 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/types/sqljs.d.ts | Removes custom module typings for sql.js after dropping SQLite export. |
| src/rosbagUtils.ts | Removes SQL.js initialization and implements Parquet export for rosout + diagnostics. |
| src/rosbagUtils.test.ts | Updates export unit tests to validate Parquet columns/values using hyparquet reader. |
| src/App.tsx | Replaces SQLite export option/button with Parquet and updates MIME type + filenames. |
| package.json | Removes sql.js, adds hyparquet-writer (prod) and hyparquet (dev). |
| package-lock.json | Lockfile updates reflecting the dependency swap. |
| e2e/parquet-export.spec.ts | Converts E2E coverage from SQLite validation to Parquet validation. |
| README.md | Updates user-facing docs from SQLite export to Parquet export with DuckDB example. |
| CLAUDE.md | Updates internal repo documentation to mention Parquet export. |
```ts
const buf = parquetWriteBuffer({
  columnData: [
    { name: 'timestamp', data: messages.map(m => m.timestamp), type: 'DOUBLE' },
    { name: 'time_text', data: messages.map(m => formatTimestamp(m.timestamp, timezone)), type: 'STRING' },
    { name: 'node', data: messages.map(m => m.node), type: 'STRING' },
    { name: 'severity', data: messages.map(m => m.severity), type: 'STRING' },
    { name: 'message', data: messages.map(m => m.message), type: 'STRING' },
    { name: 'file', data: messages.map(m => m.file || ''), type: 'STRING' },
    { name: 'line', data: messages.map(m => m.line || 0), type: 'INT32' },
    { name: 'function_name', data: messages.map(m => m.function || ''), type: 'STRING' },
    { name: 'topics_text', data: messages.map(m => (m.topics || []).join(';')), type: 'STRING' },
```
parquetWriteBuffer column arrays are built via multiple messages.map(...) calls (one per column), which does 9 full passes over the dataset and allocates many intermediate arrays. For large exports this can become a noticeable CPU/memory hit; consider a single loop that fills per-column arrays (timestamps, nodes, etc.) once and then passes those arrays into columnData.
Suggested change:

```ts
const length = messages.length;
const timestamps = new Array<number>(length);
const timeTexts = new Array<string>(length);
const nodes = new Array<string>(length);
const severities = new Array<SeverityLevel | string>(length);
const messageTexts = new Array<string>(length);
const files = new Array<string>(length);
const lines = new Array<number>(length);
const functionNames = new Array<string>(length);
const topicsTexts = new Array<string>(length);
for (let i = 0; i < length; i++) {
  const m = messages[i];
  timestamps[i] = m.timestamp;
  timeTexts[i] = formatTimestamp(m.timestamp, timezone);
  nodes[i] = m.node;
  severities[i] = m.severity;
  messageTexts[i] = m.message;
  files[i] = m.file || '';
  lines[i] = m.line || 0;
  functionNames[i] = m.function || '';
  topicsTexts[i] = (m.topics || []).join(';');
}
const buf = parquetWriteBuffer({
  columnData: [
    { name: 'timestamp', data: timestamps, type: 'DOUBLE' },
    { name: 'time_text', data: timeTexts, type: 'STRING' },
    { name: 'node', data: nodes, type: 'STRING' },
    { name: 'severity', data: severities, type: 'STRING' },
    { name: 'message', data: messageTexts, type: 'STRING' },
    { name: 'file', data: files, type: 'STRING' },
    { name: 'line', data: lines, type: 'INT32' },
    { name: 'function_name', data: functionNames, type: 'STRING' },
    { name: 'topics_text', data: topicsTexts, type: 'STRING' },
```
The current declarative style with .map() is intentionally chosen for readability. Given the typical dataset size (thousands to tens of thousands of messages), the overhead of multiple passes is negligible. Keeping as-is.
```ts
  columnData: [
    { name: 'timestamp', data: diagnostics.map(d => d.timestamp), type: 'DOUBLE' },
    { name: 'time_text', data: diagnostics.map(d => formatTimestamp(d.timestamp, timezone)), type: 'STRING' },
    { name: 'name', data: diagnostics.map(d => d.name), type: 'STRING' },
    { name: 'level_code', data: diagnostics.map(d => d.level), type: 'INT32' },
```
Similar to exportToParquet, this builds each Parquet column with a separate diagnostics.map(...), causing repeated iteration and extra allocations. A single pass that pushes values into preallocated arrays for each column will scale better for large diagnostic sets.
Same reasoning as above — prioritizing readability over micro-optimization at this scale.
Summary
- Replace sql.js (~691KB WASM+JS) with hyparquet-writer (~1KB) for export, significantly reducing bundle size
- Diagnostics values are stored in a values_json column (queryable via DuckDB json_extract) instead of a separate table

Changes
- Remove sql.js, add hyparquet-writer (production) and hyparquet (dev)
- exportToSQLite/exportDiagnosticsToSQLite → exportToParquet/exportDiagnosticsToParquet (now synchronous)
- Unit and E2E tests validate Parquet output via the hyparquet reader
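To make "queryable via DuckDB json_extract" concrete, here is a hedged sketch using the duckdb Node package. The column names (name, level_code, values_json) come from this PR; the file name, the severity threshold, and the assumption that values_json holds an array of {key, value} objects are illustrative.

```ts
import duckdb from 'duckdb';

const db = new duckdb.Database(':memory:');

// Query the exported file directly; json_extract pulls fields out of the
// JSON array stored in the values_json column.
db.all(
  `SELECT name,
          level_code,
          json_extract(values_json, '$[0].key')   AS first_key,
          json_extract(values_json, '$[0].value') AS first_value
   FROM read_parquet('diagnostics.parquet')
   WHERE level_code >= 2`,
  (err, rows) => {
    if (err) throw err;
    console.table(rows);
  },
);
```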
Test plan
- npm run lint — zero warnings
- npm run test — 42 unit tests passed
- npm run build — production build successful
- E2E (npx playwright test e2e/parquet-export.spec.ts) — 2 tests passed

🤖 Generated with Claude Code