
Replace SQLite export with Parquet #12

Merged
Tiryoh merged 4 commits into main from feature/support-parquet on Mar 27, 2026

Conversation

@Tiryoh Tiryoh (Owner) commented Mar 27, 2026

Summary

  • Replace sql.js (~691KB WASM+JS) with hyparquet-writer (~1KB) for export, significantly reducing bundle size
  • Parquet files can be queried with DuckDB, pandas, Polars, Spark, and other Parquet-compatible tools
  • Diagnostics export simplified from a 2-table SQLite schema to a single table with a values_json column, queryable via DuckDB's json_extract (see the sketch after this list)
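
To illustrate the single-table layout, here is a minimal TypeScript sketch that reads the exported file with the hyparquet reader (the same reader the updated tests use). The file name is hypothetical, and asyncBufferFromFile / parquetReadObjects are assumed from hyparquet's API; this is not code from the PR itself.

import { asyncBufferFromFile, parquetReadObjects } from 'hyparquet';

// Read every row of the exported diagnostics table (hypothetical file name).
const file = await asyncBufferFromFile('diagnostics.parquet');
const rows = await parquetReadObjects({ file });

for (const row of rows) {
  // values_json replaces the second SQLite table; depending on the reader,
  // a JSON column may come back as a string or an already-parsed value.
  const raw = row.values_json;
  const values = typeof raw === 'string' ? JSON.parse(raw) : raw;
  console.log(row.name, row.level_code, values);
}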

Changes

  • Dependencies: Remove sql.js, add hyparquet-writer (production) and hyparquet (dev)
  • Export functions: exportToSQLite / exportDiagnosticsToSQLite → exportToParquet / exportDiagnosticsToParquet (now synchronous; see the sketch after this list)
  • UI: SQLite button → Parquet button
  • Tests: Unit tests and E2E tests updated to verify Parquet output using hyparquet reader
  • Docs: README and CLAUDE.md updated
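
As a sketch of the new export shape (not the exact repository code): the parquetWriteBuffer call below mirrors the diff reviewed later in this thread, while the download helper, the MIME type, and the trimmed-down column list are illustrative assumptions.

import { parquetWriteBuffer } from 'hyparquet-writer';

interface RosoutMessage { timestamp: number; node: string; message: string; }

// Synchronous now: no WASM module to initialize, unlike sql.js.
function exportToParquet(messages: RosoutMessage[]): ArrayBuffer {
  return parquetWriteBuffer({
    columnData: [
      { name: 'timestamp', data: messages.map(m => m.timestamp), type: 'DOUBLE' },
      { name: 'node', data: messages.map(m => m.node), type: 'STRING' },
      { name: 'message', data: messages.map(m => m.message), type: 'STRING' },
    ],
  });
}

// Illustrative browser-side download; the real wiring lives in App.tsx.
function downloadParquet(buf: ArrayBuffer, filename: string): void {
  const blob = new Blob([buf], { type: 'application/vnd.apache.parquet' });
  const url = URL.createObjectURL(blob);
  const a = document.createElement('a');
  a.href = url;
  a.download = filename;
  a.click();
  URL.revokeObjectURL(url);
}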

Test plan

  • npm run lint — zero warnings
  • npm run test — 42 unit tests passed
  • npm run build — production build successful
  • E2E tests (npx playwright test e2e/parquet-export.spec.ts) — 2 tests passed
  • Manual: load bag file → export Parquet → verify with DuckDB

🤖 Generated with Claude Code

Tiryoh and others added 2 commits March 27, 2026 13:53
Replace sql.js (~691KB WASM+JS) with hyparquet-writer (~1KB) for export.
Parquet files can be queried with DuckDB, pandas, Polars, etc.
Diagnostics values are stored as JSON column instead of separate table.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Node.js Buffer.buffer returns the underlying ArrayBuffer pool, in which
the Buffer's data may not start at offset 0. Use slice() with byteOffset
to extract the exact byte range.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
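
A standalone Node.js illustration of the pitfall this commit fixes (not the repository's code):

import { Buffer } from 'node:buffer';

// Small Buffers are carved out of a shared ArrayBuffer pool, so
// buf.buffer is the whole pool and buf's bytes can start mid-pool.
const buf = Buffer.from('parquet bytes');
console.log(buf.byteOffset, buf.buffer.byteLength); // e.g. 256, 8192

// Wrong: hands over the entire pool, including unrelated bytes.
const wrong = buf.buffer;

// Right: copy out exactly the range this Buffer covers.
const right = buf.buffer.slice(buf.byteOffset, buf.byteOffset + buf.byteLength);
console.log(wrong.byteLength !== right.byteLength); // usually true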

github-actions Bot commented Mar 27, 2026

Test Results

42 tests  ±0   42 ✅ ±0   0s ⏱️ ±0s
 1 suites ±0    0 💤 ±0 
 1 files   ±0    0 ❌ ±0 

Results for commit 05a3596. ± Comparison against base commit 57fc11c.

♻️ This comment has been updated with latest results.


Tiryoh and others added 2 commits March 27, 2026 22:48
Pass raw arrays to hyparquet-writer with type:'JSON' instead of
pre-stringified values, which caused double JSON encoding.
DuckDB now correctly reads values_json as a JSON array.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
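
A sketch of the fix (values_json and type: 'JSON' are from this PR; the sample rows are made up):

import { parquetWriteBuffer } from 'hyparquet-writer';

const values = [
  [{ key: 'voltage', value: '12.1' }],
  [{ key: 'voltage', value: '11.9' }],
];

// Before: pre-stringified input meant the 'JSON' column encoded the string
// a second time, so DuckDB saw a quoted string rather than a JSON array:
//   data: values.map(v => JSON.stringify(v))

// After: pass the raw arrays and let hyparquet-writer serialize them once.
const buf = parquetWriteBuffer({
  columnData: [
    { name: 'values_json', data: values, type: 'JSON' },
  ],
});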

Copilot AI left a comment


Pull request overview

This PR replaces the SQLite (sql.js) export path with Parquet export using hyparquet-writer, reducing bundle size and simplifying the diagnostics export schema while keeping exports compatible with common analytics tools.

Changes:

  • Swapped SQLite export functions for synchronous Parquet exporters (exportToParquet, exportDiagnosticsToParquet).
  • Updated UI export buttons/test IDs and adapted unit + Playwright E2E tests to validate Parquet output via hyparquet.
  • Updated dependencies and docs to reflect Parquet export support.

Reviewed changes

Copilot reviewed 7 out of 9 changed files in this pull request and generated 2 comments.

File summary:
  • src/types/sqljs.d.ts: Removes custom module typings for sql.js after dropping SQLite export.
  • src/rosbagUtils.ts: Removes SQL.js initialization and implements Parquet export for rosout + diagnostics.
  • src/rosbagUtils.test.ts: Updates export unit tests to validate Parquet columns/values using the hyparquet reader.
  • src/App.tsx: Replaces the SQLite export option/button with Parquet and updates MIME type + filenames.
  • package.json: Removes sql.js, adds hyparquet-writer (prod) and hyparquet (dev).
  • package-lock.json: Lockfile updates reflecting the dependency swap.
  • e2e/parquet-export.spec.ts: Converts E2E coverage from SQLite validation to Parquet validation.
  • README.md: Updates user-facing docs from SQLite export to Parquet export with a DuckDB example.
  • CLAUDE.md: Updates internal repo documentation to mention Parquet export.


Comment thread src/rosbagUtils.ts
Comment on lines +447 to +457
const buf = parquetWriteBuffer({
  columnData: [
    { name: 'timestamp', data: messages.map(m => m.timestamp), type: 'DOUBLE' },
    { name: 'time_text', data: messages.map(m => formatTimestamp(m.timestamp, timezone)), type: 'STRING' },
    { name: 'node', data: messages.map(m => m.node), type: 'STRING' },
    { name: 'severity', data: messages.map(m => m.severity), type: 'STRING' },
    { name: 'message', data: messages.map(m => m.message), type: 'STRING' },
    { name: 'file', data: messages.map(m => m.file || ''), type: 'STRING' },
    { name: 'line', data: messages.map(m => m.line || 0), type: 'INT32' },
    { name: 'function_name', data: messages.map(m => m.function || ''), type: 'STRING' },
    { name: 'topics_text', data: messages.map(m => (m.topics || []).join(';')), type: 'STRING' },

Copilot AI Mar 27, 2026


parquetWriteBuffer column arrays are built via multiple messages.map(...) calls (one per column), which does 9 full passes over the dataset and allocates many intermediate arrays. For large exports this can become a noticeable CPU/memory hit; consider a single loop that fills per-column arrays (timestamps, nodes, etc.) once and then passes those arrays into columnData.

Suggested change

-const buf = parquetWriteBuffer({
-  columnData: [
-    { name: 'timestamp', data: messages.map(m => m.timestamp), type: 'DOUBLE' },
-    { name: 'time_text', data: messages.map(m => formatTimestamp(m.timestamp, timezone)), type: 'STRING' },
-    { name: 'node', data: messages.map(m => m.node), type: 'STRING' },
-    { name: 'severity', data: messages.map(m => m.severity), type: 'STRING' },
-    { name: 'message', data: messages.map(m => m.message), type: 'STRING' },
-    { name: 'file', data: messages.map(m => m.file || ''), type: 'STRING' },
-    { name: 'line', data: messages.map(m => m.line || 0), type: 'INT32' },
-    { name: 'function_name', data: messages.map(m => m.function || ''), type: 'STRING' },
-    { name: 'topics_text', data: messages.map(m => (m.topics || []).join(';')), type: 'STRING' },
+const length = messages.length;
+const timestamps = new Array<number>(length);
+const timeTexts = new Array<string>(length);
+const nodes = new Array<string>(length);
+const severities = new Array<SeverityLevel | string>(length);
+const messageTexts = new Array<string>(length);
+const files = new Array<string>(length);
+const lines = new Array<number>(length);
+const functionNames = new Array<string>(length);
+const topicsTexts = new Array<string>(length);
+for (let i = 0; i < length; i++) {
+  const m = messages[i];
+  timestamps[i] = m.timestamp;
+  timeTexts[i] = formatTimestamp(m.timestamp, timezone);
+  nodes[i] = m.node;
+  severities[i] = m.severity;
+  messageTexts[i] = m.message;
+  files[i] = m.file || '';
+  lines[i] = m.line || 0;
+  functionNames[i] = m.function || '';
+  topicsTexts[i] = (m.topics || []).join(';');
+}
+const buf = parquetWriteBuffer({
+  columnData: [
+    { name: 'timestamp', data: timestamps, type: 'DOUBLE' },
+    { name: 'time_text', data: timeTexts, type: 'STRING' },
+    { name: 'node', data: nodes, type: 'STRING' },
+    { name: 'severity', data: severities, type: 'STRING' },
+    { name: 'message', data: messageTexts, type: 'STRING' },
+    { name: 'file', data: files, type: 'STRING' },
+    { name: 'line', data: lines, type: 'INT32' },
+    { name: 'function_name', data: functionNames, type: 'STRING' },
+    { name: 'topics_text', data: topicsTexts, type: 'STRING' },

@Tiryoh Tiryoh (Owner, Author) Mar 27, 2026


The current declarative style with .map() is intentionally chosen for readability. Given the typical dataset size (thousands to tens of thousands of messages), the overhead of multiple passes is negligible. Keeping as-is.

Comment thread src/rosbagUtils.ts
Comment on lines +468 to +472
columnData: [
  { name: 'timestamp', data: diagnostics.map(d => d.timestamp), type: 'DOUBLE' },
  { name: 'time_text', data: diagnostics.map(d => formatTimestamp(d.timestamp, timezone)), type: 'STRING' },
  { name: 'name', data: diagnostics.map(d => d.name), type: 'STRING' },
  { name: 'level_code', data: diagnostics.map(d => d.level), type: 'INT32' },

Copilot AI Mar 27, 2026


Similar to exportToParquet, this builds each Parquet column with a separate diagnostics.map(...), causing repeated iteration and extra allocations. A single pass that pushes values into preallocated arrays for each column will scale better for large diagnostic sets.

@Tiryoh Tiryoh (Owner, Author) Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same reasoning as above — prioritizing readability over micro-optimization at this scale.

@Tiryoh Tiryoh merged commit 9ba90ee into main Mar 27, 2026
9 checks passed
@Tiryoh Tiryoh deleted the feature/support-parquet branch March 27, 2026 14:29