LargeBinary Migration Doesn't Update Datafusion Schema #11028
Labels
dataplatform (Rerun Data Platform integration), 🪳 bug (Something isn't working)
Description
Describe the bug
If a dataset was already registered on the cloud with list<list<uint8>> in its schema, accessing that column fails: the registered schema still expects the original type, but we now return list<large_binary>.
To Reproduce
Simpler repro:
- Using dataplatform update to point to local rerun version
- pixi run dev: enter a pixi shell with rerun installed
- pixi run -e examples
import rerun as rr
import os
CATALOG_URL = os.environ["REDAP_URI"]
client = rr.catalog.CatalogClient(CATALOG_URL, token=os.getenv('REDAP_TOKEN'))
dataset: rr.catalog.DatasetEntry = client.get_dataset_entry(name="droid:raw")
df = dataset.dataframe_query_view(index="real_time", contents="/thumbnail/camera/wrist").df()
print(df.schema())
bad_result = df.limit(1)
print(bad_result)
Steps to reproduce the behavior:
- Launch local notebook from cloud repo and point to existing dataset
- Query a column with blob type
>> dataset.dataframe_query_view(index="real_time", contents="/thumbnail/camera/wrist").df().limit(1)
Exception: DataFusion error: Execution("Arrow error: Invalid argument error: column types must match schema types, expected List(Field { name: \"item\", data_type: List(Field { name: \"item\", data_type: UInt8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }) but found List(Field { name: \"item\", data_type: LargeBinary, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }) at column index 4")
Expected behavior
We can still query existing datasets after the format migration, even when the registered schema predates it.
Rerun version
rerun-cli 0.25.0-alpha.1+dev (base map_view nasm native_viewer oss_server release_no_web_viewer) [rustc 1.88.0 (6b00bc388 2025-06-23), LLVM 20.1.5] aarch64-apple-darwin (debug)
Video features: av1 default ffmpeg nasm serde
commit: 241df80