
[REVERTED] Support 2GiB+ videos: Encode Blob as arrow Binary#10875

Merged
emilk merged 48 commits into main from emilk/blob-as-binary
Aug 24, 2025

Conversation

@emilk
Member

@emilk emilk commented Aug 12, 2025

This changes the encoding of the component column containing blobs from List<List<u8>> to List<Binary>, i.e. BinaryArray or LargeBinaryArray (both are supported!).

Old data is migrated on load.
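
As a rough illustration of the migration described above, here is a minimal pure-Python sketch of the two layouts. It only models the idea (an offsets buffer plus a flat byte buffer) and is not the actual Rerun or Arrow code; the function names `encode_as_list_of_u8`, `migrate_to_large_binary`, and `get_blob` are hypothetical.

```python
from array import array


def encode_as_list_of_u8(blobs):
    """Model of the old layout: 32-bit offsets into a child array of u8 values.

    Note: typecode "i" is a C int, which is 32 bits on common platforms.
    """
    offsets = array("i", [0])
    values = array("B")
    for blob in blobs:
        values.extend(blob)          # each byte becomes one u8 element
        offsets.append(len(values))  # end offset of this blob
    return offsets, values


def migrate_to_large_binary(offsets, values):
    """Model of the new layout: 64-bit offsets into one contiguous byte buffer."""
    return array("q", list(offsets)), values.tobytes()


def get_blob(offsets, data, i):
    """Read blob i from either layout; both use the same offset scheme."""
    return bytes(data[offsets[i]:offsets[i + 1]])


old_offsets, old_values = encode_as_list_of_u8([b"abc", b"", b"\x00\xff"])
new_offsets, new_data = migrate_to_large_binary(old_offsets, old_values)
print(get_blob(new_offsets, new_data, 0))  # b'abc'
```

In this model the offsets are identical before and after migration; only their integer width and the type of the value buffer change.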

This adds support for blobs larger than 2 GiB, e.g. large video files.

In the viewer we now support both 32-bit and 64-bit offsets for the binary data (BinaryArray vs LargeBinaryArray).
This is technically a type of datatype generics. Sort of. See #9144

  • The SDK code (Rust, Python, C++) always encodes with 64-bit offsets (supporting >2GiB videos).
  • Legacy blobs are converted from 32-bit List to 64-bit Binary.
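
To see why the offset width matters, here is a small sketch of the arithmetic, assuming offsets are stored as signed integers, as they are in Arrow's Binary (32-bit) and LargeBinary (64-bit) arrays. The helper names `offsets_fit_in_i32` and `pack_offset` are made up for this example.

```python
import struct

I32_MAX = 2**31 - 1  # largest value a signed 32-bit offset can hold (just under 2 GiB)


def offsets_fit_in_i32(blob_sizes):
    """The last offset equals the total byte count, so that is what must fit."""
    return sum(blob_sizes) <= I32_MAX


def pack_offset(offset, large):
    """Serialize one offset as little-endian int64 (large) or int32."""
    fmt = "<q" if large else "<i"
    return struct.pack(fmt, offset)


three_gib = 3 * 1024**3
print(offsets_fit_in_i32([three_gib]))          # False
print(len(pack_offset(three_gib, large=True)))  # 8
try:
    pack_offset(three_gib, large=False)
except struct.error as err:
    print("32-bit offset overflow:", err)
```

A 3 GiB video cannot be addressed with 32-bit offsets at all, which is why always writing 64-bit offsets on the SDK side is the simpler choice.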

TODO

@github-actions
Contributor

github-actions bot commented Aug 12, 2025

Web viewer built successfully. If applicable, you should also test it:

  • I have tested the web viewer
Result Commit Link Manifest
527734c https://rerun.io/viewer/pr/10875 +nightly +main

Note: This comment is updated whenever you push a commit.

@github-actions
Contributor

github-actions bot commented Aug 12, 2025

Latest documentation preview deployed successfully.

Result Commit Link
527734c https://landing-84hcg2shc-rerun.vercel.app/docs

Note: This comment is updated whenever you push a commit.

@emilk emilk force-pushed the emilk/blob-as-binary branch from 3e42878 to 5aec85b Compare August 12, 2025 16:25
"attr.rust.tuple_struct"
) {
- data: [ubyte] (order: 100);
+ data: [ubyte] (order: 100, "attr.rerun.override_type": "binary");
Member Author


This is the key change

@emilk emilk force-pushed the emilk/blob-as-binary branch from d18ad87 to 6a2b085 Compare August 13, 2025 11:19
@emilk emilk added the 🪵 Log & send APIs Affects the user-facing API for all languages label Aug 13, 2025
@emilk emilk force-pushed the emilk/blob-as-binary branch from 6a2b085 to ec623a3 Compare August 13, 2025 11:43
@emilk
Member Author

emilk commented Aug 13, 2025

@rerun-bot full-check

@github-actions
Contributor

Member Author


The new arrow UI never shows the contents of blobs, just their size.

@emilk emilk force-pushed the emilk/blob-as-binary branch 3 times, most recently from 629b15b to 14bbb97 Compare August 13, 2025 18:45
@emilk
Member Author

emilk commented Aug 13, 2025

@rerun-bot full-check

@github-actions
Contributor

@emilk emilk marked this pull request as ready for review August 13, 2025 19:01
@emilk emilk added the do-not-merge Do not merge this PR label Aug 13, 2025
@emilk emilk marked this pull request as draft August 14, 2025 10:55
@emilk
Member Author

emilk commented Aug 14, 2025

This is a big PR. I'll split it in two.

@emilk emilk force-pushed the emilk/blob-as-binary branch from 14bbb97 to 875ca13 Compare August 14, 2025 12:22
@emilk emilk changed the base branch from main to emilk/misc-binary-improvements August 14, 2025 12:22
@emilk emilk marked this pull request as ready for review August 14, 2025 12:24
emilk added a commit that referenced this pull request Aug 14, 2025
### Related
* Split out from #10875

See commit messages
Base automatically changed from emilk/misc-binary-improvements to main August 14, 2025 20:46
@emilk emilk force-pushed the emilk/blob-as-binary branch from 875ca13 to 1f47c5e Compare August 14, 2025 20:47
@emilk
Member Author

emilk commented Aug 22, 2025

@rerun-bot full-check

@github-actions
Contributor

@emilk emilk marked this pull request as ready for review August 22, 2025 08:30
@emilk
Member Author

emilk commented Aug 22, 2025

@rerun-bot full-check

@github-actions
Contributor

Started a full build: https://github.com/rerun-io/rerun/actions/runs/17157666898

@emilk emilk merged commit 139dd83 into main Aug 24, 2025
89 of 92 checks passed
@emilk emilk deleted the emilk/blob-as-binary branch August 24, 2025 14:39
emilk added a commit that referenced this pull request Aug 28, 2025
andrea-reale pushed a commit that referenced this pull request Aug 29, 2025
### Related
* Reverts #10875
* Reverts #11005
* Fixes #11028
* Closes #11032
* Re-opens #10929
* Re-opens #10973

### What
Reverts the change from `List<u8>` to `Binary` because of Rerun
Cloud migration woes.

Hopefully temporarily.
andrea-reale pushed a commit that referenced this pull request Sep 2, 2025
### Related
* Reverts #10875
* Reverts #11005
* Fixes #11028
* Closes #11032
* Re-opens #10929
* Re-opens #10973

### What
Reverts the change from `List<u8>` to `Binary` because of Rerun
Cloud migration woes.

Hopefully temporarily.
andrea-reale added a commit that referenced this pull request Sep 2, 2025
(cherry picking from #11039)

### Related
* Reverts #10875
* Reverts #11005
* Fixes #11028
* Closes #11032
* Re-opens #10929
* Re-opens #10973

### What
Reverts the change from `List<u8>` to `Binary` because of Rerun
Cloud migration woes.

Hopefully temporarily.

Co-authored-by: Emil Ernerfeldt <[email protected]>
@emilk emilk changed the title Support 2GiB+ videos: Encode Blob as arrow Binary [REVERTED] Support 2GiB+ videos: Encode Blob as arrow Binary Sep 2, 2025
@emilk emilk added exclude from changelog PRs with this won't show up in CHANGELOG.md and removed include in changelog labels Sep 2, 2025
emilk added a commit that referenced this pull request Sep 4, 2025

Labels

  • 🔩 data model: Sorbet
  • exclude from changelog: PRs with this won't show up in CHANGELOG.md
  • 🪵 Log & send APIs: Affects the user-facing API for all languages

Development

Successfully merging this pull request may close these issues.

  • videos larger than 2.1GB cause rerun to crash
  • Allow logging >2GiB binary blobs in C++

3 participants