Support RecordBatch with zero columns but non zero row count, add field to RecordBatchOptions (#1536)#1552
Conversation
| schema: &SchemaRef, | ||
| columns: &[ArrayRef], | ||
| /// if any validation check fails, otherwise returns the created [`RecordBatch`] | ||
| fn try_new_impl( |
There was a problem hiding this comment.
I'm not really sure why we need a separate method for this, but I avoided changing it to keep the diff down
|
|
||
| /// Options that control the behaviour used when creating a [`RecordBatch`]. | ||
| #[derive(Debug)] | ||
| #[non_exhaustive] |
There was a problem hiding this comment.
Adding a new option imo shouldn't be a breaking change, this should give us that
There was a problem hiding this comment.
Or maybe it is time to make these fields non pub and add with_match_field_name(&mut self, val: bool) type accessors?
There was a problem hiding this comment.
I don't feel particularly strongly either way, although have to write and maintain setters and accessors is a bit of a PIA
There was a problem hiding this comment.
although have to write and maintain setters and accessors is a bit of a PIA
The price for backwards compatibility?
There was a problem hiding this comment.
non_exhaustive should make additive changes non-breaking, which should be good enough hopefully 😀
Happy to change if you feel strongly
arrow/src/record_batch.rs
Outdated
|
|
||
| if columns.iter().any(|c| c.len() != row_count) { | ||
| return Err(ArrowError::InvalidArgumentError( | ||
| "all columns in a record batch must have the same length".to_string(), |
There was a problem hiding this comment.
This error will now also happen when the specified row count doesn't match the array counts (so the wording might be good to update?)
| /// ``` | ||
| pub fn num_rows(&self) -> usize { | ||
| self.columns[0].data().len() | ||
| self.row_count |
There was a problem hiding this comment.
Maybe we could avoid adding row_count with something like
self
.columns
.get(0)
.map(|col| col.data.len())
.unwrap_or(0)?
There was a problem hiding this comment.
This would always return 0 if there are no columns?
There was a problem hiding this comment.
I see -- I think I was a little confused -- is the idea that the schema also has 0 fields?
There was a problem hiding this comment.
Yup, it's basically a way to support projecting no columns and just getting the row count. This is needed for #1537
|
|
||
| /// Options that control the behaviour used when creating a [`RecordBatch`]. | ||
| #[derive(Debug)] | ||
| #[non_exhaustive] |
There was a problem hiding this comment.
Or maybe it is time to make these fields non pub and add with_match_field_name(&mut self, val: bool) type accessors?
|
|
||
| /// Options that control the behaviour used when creating a [`RecordBatch`]. | ||
| #[derive(Debug)] | ||
| #[non_exhaustive] |
There was a problem hiding this comment.
although have to write and maintain setters and accessors is a bit of a PIA
The price for backwards compatibility?
RecordBatch with No Columns but Non-Zero Row Count (#1536)
RecordBatch with No Columns but Non-Zero Row Count (#1536)RecordBatch with No Columns but Non-Zero Row Count, add field to RecordBatchOptions (#1536)
RecordBatch with No Columns but Non-Zero Row Count, add field to RecordBatchOptions (#1536)RecordBatch with zero columns but non zero row count, add field to RecordBatchOptions (#1536)
Which issue does this PR close?
Closes #1536
Rationale for this change
See ticket
What changes are included in this PR?
Makes it possible to construct a RecordBatch with no columns, but a non-zero row count. I had a brief search in the codebase and couldn't find anything obvious that this would break, but I guess there is only one way to find out 😄
Are there any user-facing changes?
This technically makes a breaking change to
RecordBatchOptionsas it wasn't marked as non-exhaustive.