
Replace SQLite export with Parquet #12

Merged
Tiryoh merged 4 commits into main from feature/support-parquet on Mar 27, 2026

Conversation

@Tiryoh Tiryoh (Owner) commented Mar 27, 2026

Summary

  • Replace sql.js (~691KB WASM+JS) with hyparquet-writer (~1KB) for export, significantly reducing bundle size
  • Parquet files can be queried with DuckDB, pandas, Polars, Spark, and other Parquet-compatible tools
  • Diagnostics export simplified from a 2-table SQLite schema to a single table with a values_json column, queryable via DuckDB's json_extract (see the sketch after this list)
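
To illustrate the single-table layout, here is a minimal TypeScript sketch that reads the exported file with the hyparquet reader (the same reader the updated tests use). The file name is hypothetical, and asyncBufferFromFile / parquetReadObjects are assumed from hyparquet's API; this is not code from the PR itself.

import { asyncBufferFromFile, parquetReadObjects } from 'hyparquet';

// Read every row of the exported diagnostics table (hypothetical file name).
const file = await asyncBufferFromFile('diagnostics.parquet');
const rows = await parquetReadObjects({ file });

for (const row of rows) {
  // values_json replaces the second SQLite table; depending on the reader,
  // a JSON column may come back as a string or an already-parsed value.
  const raw = row.values_json;
  const values = typeof raw === 'string' ? JSON.parse(raw) : raw;
  console.log(row.name, row.level_code, values);
}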

Changes

  • Dependencies: Remove sql.js, add hyparquet-writer (production) and hyparquet (dev)
  • Export functions: exportToSQLite / exportDiagnosticsToSQLite → exportToParquet / exportDiagnosticsToParquet (now synchronous; see the sketch after this list)
  • UI: SQLite button → Parquet button
  • Tests: Unit tests and E2E tests updated to verify Parquet output using hyparquet reader
  • Docs: README and CLAUDE.md updated
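
As a sketch of the new export shape (not the exact repository code): the parquetWriteBuffer call below mirrors the diff reviewed later in this thread, while the download helper, the MIME type, and the trimmed-down column list are illustrative assumptions.

import { parquetWriteBuffer } from 'hyparquet-writer';

interface RosoutMessage { timestamp: number; node: string; message: string; }

// Synchronous now: no WASM module to initialize, unlike sql.js.
function exportToParquet(messages: RosoutMessage[]): ArrayBuffer {
  return parquetWriteBuffer({
    columnData: [
      { name: 'timestamp', data: messages.map(m => m.timestamp), type: 'DOUBLE' },
      { name: 'node', data: messages.map(m => m.node), type: 'STRING' },
      { name: 'message', data: messages.map(m => m.message), type: 'STRING' },
    ],
  });
}

// Illustrative browser-side download; the real wiring lives in App.tsx.
function downloadParquet(buf: ArrayBuffer, filename: string): void {
  const blob = new Blob([buf], { type: 'application/vnd.apache.parquet' });
  const url = URL.createObjectURL(blob);
  const a = document.createElement('a');
  a.href = url;
  a.download = filename;
  a.click();
  URL.revokeObjectURL(url);
}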

Test plan

  • npm run lint — zero warnings
  • npm run test — 42 unit tests passed
  • npm run build — production build successful
  • E2E tests (npx playwright test e2e/parquet-export.spec.ts) — 2 tests passed
  • Manual: load bag file → export Parquet → verify with DuckDB

🤖 Generated with Claude Code

Tiryoh and others added 2 commits March 27, 2026 13:53
Replace sql.js (~691KB WASM+JS) with hyparquet-writer (~1KB) for export.
Parquet files can be queried with DuckDB, pandas, Polars, etc.
Diagnostics values are stored as JSON column instead of separate table.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Node.js Buffer.buffer returns the underlying ArrayBuffer pool, in which
the Buffer's data may not start at offset 0. Use slice() with byteOffset
to extract the exact byte range.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
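
A standalone Node.js illustration of the pitfall this commit fixes (not the repository's code):

import { Buffer } from 'node:buffer';

// Small Buffers are carved out of a shared ArrayBuffer pool, so
// buf.buffer is the whole pool and buf's bytes can start mid-pool.
const buf = Buffer.from('parquet bytes');
console.log(buf.byteOffset, buf.buffer.byteLength); // e.g. 256, 8192

// Wrong: hands over the entire pool, including unrelated bytes.
const wrong = buf.buffer;

// Right: copy out exactly the range this Buffer covers.
const right = buf.buffer.slice(buf.byteOffset, buf.byteOffset + buf.byteLength);
console.log(wrong.byteLength !== right.byteLength); // usually true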

github-actions Bot commented Mar 27, 2026

Test Results

42 tests  ±0   42 ✅ ±0   0s ⏱️ ±0s
 1 suites ±0    0 💤 ±0 
 1 files   ±0    0 ❌ ±0 

Results for commit 05a3596. ± Comparison against base commit 57fc11c.

♻️ This comment has been updated with latest results.


Tiryoh and others added 2 commits March 27, 2026 22:48
Pass raw arrays to hyparquet-writer with type:'JSON' instead of
pre-stringified values, which caused double JSON encoding.
DuckDB now correctly reads values_json as a JSON array.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
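
A sketch of the fix (values_json and type: 'JSON' are from this PR; the sample rows are made up):

import { parquetWriteBuffer } from 'hyparquet-writer';

const values = [
  [{ key: 'voltage', value: '12.1' }],
  [{ key: 'voltage', value: '11.9' }],
];

// Before: pre-stringified input meant the 'JSON' column encoded the string
// a second time, so DuckDB saw a quoted string rather than a JSON array:
//   data: values.map(v => JSON.stringify(v))

// After: pass the raw arrays and let hyparquet-writer serialize them once.
const buf = parquetWriteBuffer({
  columnData: [
    { name: 'values_json', data: values, type: 'JSON' },
  ],
});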

Copilot AI left a comment


Pull request overview

This PR replaces the SQLite (sql.js) export path with Parquet export using hyparquet-writer, reducing bundle size and simplifying the diagnostics export schema while keeping exports compatible with common analytics tools.

Changes:

  • Swapped SQLite export functions for synchronous Parquet exporters (exportToParquet, exportDiagnosticsToParquet).
  • Updated UI export buttons/test IDs and adapted unit + Playwright E2E tests to validate Parquet output via hyparquet.
  • Updated dependencies and docs to reflect Parquet export support.

Reviewed changes

Copilot reviewed 7 out of 9 changed files in this pull request and generated 2 comments.

File summary:
  • src/types/sqljs.d.ts: Removes custom module typings for sql.js after dropping SQLite export.
  • src/rosbagUtils.ts: Removes SQL.js initialization and implements Parquet export for rosout + diagnostics.
  • src/rosbagUtils.test.ts: Updates export unit tests to validate Parquet columns/values using the hyparquet reader.
  • src/App.tsx: Replaces the SQLite export option/button with Parquet and updates MIME type + filenames.
  • package.json: Removes sql.js, adds hyparquet-writer (prod) and hyparquet (dev).
  • package-lock.json: Lockfile updates reflecting the dependency swap.
  • e2e/parquet-export.spec.ts: Converts E2E coverage from SQLite validation to Parquet validation.
  • README.md: Updates user-facing docs from SQLite export to Parquet export with a DuckDB example.
  • CLAUDE.md: Updates internal repo documentation to mention Parquet export.


Comment thread src/rosbagUtils.ts
Comment on lines +447 to +457
const buf = parquetWriteBuffer({
  columnData: [
    { name: 'timestamp', data: messages.map(m => m.timestamp), type: 'DOUBLE' },
    { name: 'time_text', data: messages.map(m => formatTimestamp(m.timestamp, timezone)), type: 'STRING' },
    { name: 'node', data: messages.map(m => m.node), type: 'STRING' },
    { name: 'severity', data: messages.map(m => m.severity), type: 'STRING' },
    { name: 'message', data: messages.map(m => m.message), type: 'STRING' },
    { name: 'file', data: messages.map(m => m.file || ''), type: 'STRING' },
    { name: 'line', data: messages.map(m => m.line || 0), type: 'INT32' },
    { name: 'function_name', data: messages.map(m => m.function || ''), type: 'STRING' },
    { name: 'topics_text', data: messages.map(m => (m.topics || []).join(';')), type: 'STRING' },

Copilot AI Mar 27, 2026


parquetWriteBuffer column arrays are built via multiple messages.map(...) calls (one per column), which does 9 full passes over the dataset and allocates many intermediate arrays. For large exports this can become a noticeable CPU/memory hit; consider a single loop that fills per-column arrays (timestamps, nodes, etc.) once and then passes those arrays into columnData.

Suggested change

-const buf = parquetWriteBuffer({
-  columnData: [
-    { name: 'timestamp', data: messages.map(m => m.timestamp), type: 'DOUBLE' },
-    { name: 'time_text', data: messages.map(m => formatTimestamp(m.timestamp, timezone)), type: 'STRING' },
-    { name: 'node', data: messages.map(m => m.node), type: 'STRING' },
-    { name: 'severity', data: messages.map(m => m.severity), type: 'STRING' },
-    { name: 'message', data: messages.map(m => m.message), type: 'STRING' },
-    { name: 'file', data: messages.map(m => m.file || ''), type: 'STRING' },
-    { name: 'line', data: messages.map(m => m.line || 0), type: 'INT32' },
-    { name: 'function_name', data: messages.map(m => m.function || ''), type: 'STRING' },
-    { name: 'topics_text', data: messages.map(m => (m.topics || []).join(';')), type: 'STRING' },
+const length = messages.length;
+const timestamps = new Array<number>(length);
+const timeTexts = new Array<string>(length);
+const nodes = new Array<string>(length);
+const severities = new Array<SeverityLevel | string>(length);
+const messageTexts = new Array<string>(length);
+const files = new Array<string>(length);
+const lines = new Array<number>(length);
+const functionNames = new Array<string>(length);
+const topicsTexts = new Array<string>(length);
+for (let i = 0; i < length; i++) {
+  const m = messages[i];
+  timestamps[i] = m.timestamp;
+  timeTexts[i] = formatTimestamp(m.timestamp, timezone);
+  nodes[i] = m.node;
+  severities[i] = m.severity;
+  messageTexts[i] = m.message;
+  files[i] = m.file || '';
+  lines[i] = m.line || 0;
+  functionNames[i] = m.function || '';
+  topicsTexts[i] = (m.topics || []).join(';');
+}
+const buf = parquetWriteBuffer({
+  columnData: [
+    { name: 'timestamp', data: timestamps, type: 'DOUBLE' },
+    { name: 'time_text', data: timeTexts, type: 'STRING' },
+    { name: 'node', data: nodes, type: 'STRING' },
+    { name: 'severity', data: severities, type: 'STRING' },
+    { name: 'message', data: messageTexts, type: 'STRING' },
+    { name: 'file', data: files, type: 'STRING' },
+    { name: 'line', data: lines, type: 'INT32' },
+    { name: 'function_name', data: functionNames, type: 'STRING' },
+    { name: 'topics_text', data: topicsTexts, type: 'STRING' },

@Tiryoh Tiryoh (Owner, Author) Mar 27, 2026


The current declarative style with .map() is intentionally chosen for readability. Given the typical dataset size (thousands to tens of thousands of messages), the overhead of multiple passes is negligible. Keeping as-is.

Comment thread src/rosbagUtils.ts
Comment on lines +468 to +472
columnData: [
  { name: 'timestamp', data: diagnostics.map(d => d.timestamp), type: 'DOUBLE' },
  { name: 'time_text', data: diagnostics.map(d => formatTimestamp(d.timestamp, timezone)), type: 'STRING' },
  { name: 'name', data: diagnostics.map(d => d.name), type: 'STRING' },
  { name: 'level_code', data: diagnostics.map(d => d.level), type: 'INT32' },

Copilot AI Mar 27, 2026


Similar to exportToParquet, this builds each Parquet column with a separate diagnostics.map(...), causing repeated iteration and extra allocations. A single pass that pushes values into preallocated arrays for each column will scale better for large diagnostic sets.

@Tiryoh Tiryoh (Owner, Author) Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same reasoning as above — prioritizing readability over micro-optimization at this scale.

@Tiryoh Tiryoh merged commit 9ba90ee into main Mar 27, 2026
9 checks passed
@Tiryoh Tiryoh deleted the feature/support-parquet branch March 27, 2026 14:29