Dataframe: track and exclude fully empty columns #7615

@jleibs

Description

Context:

The Transform3D archetype logs empty arrays for every component in the Transform3D. Because of the combinatorial nature of Transform3D, for most users several of these columns are empty across all rows. This has performance implications (see the need for tracking empty transform components here: #7300).

Proposal:

  • As chunks are inserted into the store, track whether each column contains any data other than NULL or [].
  • Add a field to ComponentColumnDescriptor, such as is_empty, that records this information. Like is_static, it gives users context to decide whether they want to include a column in their selection.
    • This means if you ask for the full schema of the recording, you can see which columns are empty.
  • Add a new QueryExpression param include_empty_columns.
    • If this is set to false, any column which is fully empty will be treated as if it doesn't exist.
    • If you ask for the schema of the VIEW you will not see the empty columns unless this is true.
    • This means these empty columns will not participate in row generation, which should be fine in 99.9% of real use-cases.
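The proposal above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `ToyStore`, `ColumnDescriptor`, `insert_chunk`, and `view_schema` are hypothetical names made up for this sketch, not the real store or ComponentColumnDescriptor API. Only `is_empty` and `include_empty_columns` come from the proposal.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnDescriptor:
    """Hypothetical stand-in for ComponentColumnDescriptor."""
    name: str
    is_empty: bool = True  # stays True until a non-empty cell is inserted


@dataclass
class ToyStore:
    """Hypothetical store used for illustration; not the real chunk store."""
    columns: dict = field(default_factory=dict)

    def insert_chunk(self, chunk):
        # `chunk` maps a column name to a list of cells.
        # Each cell is either None (NULL) or a list of values.
        for name, cells in chunk.items():
            desc = self.columns.setdefault(name, ColumnDescriptor(name))
            # A cell counts as data only if it is neither NULL nor [].
            if any(cell is not None and len(cell) > 0 for cell in cells):
                desc.is_empty = False

    def schema(self):
        # Full schema of the recording: empty columns are listed and flagged.
        return list(self.columns.values())

    def view_schema(self, include_empty_columns=False):
        # Schema of the view: fully empty columns are hidden unless requested.
        return [d for d in self.schema()
                if include_empty_columns or not d.is_empty]


store = ToyStore()
store.insert_chunk({"translation": [[1.0, 2.0, 3.0]], "scale": [[]], "rotation": [None]})
store.insert_chunk({"translation": [[]], "scale": [[]]})

full = {d.name: d.is_empty for d in store.schema()}
view = [d.name for d in store.view_schema(include_empty_columns=False)]
```

Here `full` still lists all three columns with their `is_empty` flags, while `view` contains only `translation`. Tracking the flag at insertion time means the schema can answer the question without scanning chunk data at query time.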

Known edge-case:

  • If you set include_empty_columns=False but then query for that column via Select, you will get a fully NULL column. You will not see the rows where that column was logged as empty, because the column does not participate in row generation.
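A toy model of the edge case, again as a sketch under assumptions: the `query` function, the `(index, column, value)` event tuples, and the `fully_empty` set are hypothetical and only mimic the behavior described above, they are not the dataframe API.

```python
def query(events, fully_empty, select, include_empty_columns=False):
    """Toy row generation over (index, column, value) log events.

    `fully_empty` is the set of columns known to contain only NULL or [].
    """
    ignored = set() if include_empty_columns else set(fully_empty)
    # Only columns that are not ignored contribute rows.
    indices = sorted({idx for idx, col, _ in events if col not in ignored})
    cells = {(idx, col): val for idx, col, val in events if col not in ignored}
    # Selecting an ignored column is allowed, but every cell comes back as None.
    return [(idx, *(cells.get((idx, col)) for col in select)) for idx in indices]


events = [
    (1, "translation", [1.0, 0.0, 0.0]),
    (2, "scale", []),  # logged as empty at index 2
    (3, "translation", [2.0, 0.0, 0.0]),
]
fully_empty = {"scale"}

without_empty = query(events, fully_empty, ["translation", "scale"])
with_empty = query(events, fully_empty, ["translation", "scale"],
                   include_empty_columns=True)
```

With the default, `without_empty` has rows for indices 1 and 3 only, and the `scale` cell is `None` in both. With `include_empty_columns=True`, index 2 reappears and its `scale` cell is `[]`.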

Metadata

Labels

feat-dataframe-api: Everything related to the dataframe API
