Skip to content

Comments

Support RecordBatch with zero columns but non zero row count, add field to RecordBatchOptions (#1536)#1552

Merged
alamb merged 5 commits intoapache:masterfrom
tustvold:empty-record-batch
Apr 12, 2022
Merged

Support RecordBatch with zero columns but non zero row count, add field to RecordBatchOptions (#1536)#1552
alamb merged 5 commits intoapache:masterfrom
tustvold:empty-record-batch

Conversation

@tustvold
Copy link
Contributor

Which issue does this PR close?

Closes #1536

Rationale for this change

See ticket

What changes are included in this PR?

Makes it possible to construct a RecordBatch with no columns, but a non-zero row count. I had a brief search in the codebase and couldn't find anything obvious that this would break, but I guess there is only one way to find out 😄

Are there any user-facing changes?

This technically makes a breaking change to RecordBatchOptions as it wasn't marked as non-exhaustive.

schema: &SchemaRef,
columns: &[ArrayRef],
/// if any validation check fails, otherwise returns the created [`RecordBatch`]
fn try_new_impl(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really sure why we need a separate method for this, but I avoided changing it to keep the diff down


/// Options that control the behaviour used when creating a [`RecordBatch`].
#[derive(Debug)]
#[non_exhaustive]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a new option imo shouldn't be a breaking change, this should give us that

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe it is time to make these fields non pub and add with_match_field_name(&mut self, val: bool) type accessors?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel particularly strongly either way, although have to write and maintain setters and accessors is a bit of a PIA

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

although have to write and maintain setters and accessors is a bit of a PIA

The price for backwards compatibility?

Copy link
Contributor Author

@tustvold tustvold Apr 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

non_exhaustive should make additive changes non-breaking, which should be good enough hopefully 😀

Happy to change if you feel strongly

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel strongly

@alamb alamb added arrow Changes to the arrow crate api-change Changes to the arrow API labels Apr 12, 2022
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good 👍


if columns.iter().any(|c| c.len() != row_count) {
return Err(ArrowError::InvalidArgumentError(
"all columns in a record batch must have the same length".to_string(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error will now also happen when the specified row count doesn't match the array counts (so the wording might be good to update?)

/// ```
pub fn num_rows(&self) -> usize {
self.columns[0].data().len()
self.row_count
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could avoid adding row_count with something like

self
  .columns
  .get(0)
  .map(|col| col.data.len())
  .unwrap_or(0)

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would always return 0 if there are no columns?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see -- I think I was a little confused -- is the idea that the schema also has 0 fields?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, it's basically a way to support projecting no columns and just getting the row count. This is needed for #1537


/// Options that control the behaviour used when creating a [`RecordBatch`].
#[derive(Debug)]
#[non_exhaustive]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe it is time to make these fields non pub and add with_match_field_name(&mut self, val: bool) type accessors?


/// Options that control the behaviour used when creating a [`RecordBatch`].
#[derive(Debug)]
#[non_exhaustive]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

although have to write and maintain setters and accessors is a bit of a PIA

The price for backwards compatibility?

@alamb alamb merged commit c9549bb into apache:master Apr 12, 2022
@alamb alamb changed the title Create RecordBatch With Non-Zero Row Count But No Columns (#1536) Support RecordBatch with No Columns but Non-Zero Row Count (#1536) Apr 15, 2022
@alamb alamb changed the title Support RecordBatch with No Columns but Non-Zero Row Count (#1536) Support RecordBatch with No Columns but Non-Zero Row Count, add field to RecordBatchOptions (#1536) Apr 15, 2022
@alamb alamb changed the title Support RecordBatch with No Columns but Non-Zero Row Count, add field to RecordBatchOptions (#1536) Support RecordBatch with zero columns but non zero row count, add field to RecordBatchOptions (#1536) Apr 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api-change Changes to the arrow API arrow Changes to the arrow crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support RecordBatch with zero columns but non zero row count

2 participants