ARROW-10121: [C++] Fix emission of new dictionaries in IPC writer#8302
ARROW-10121: [C++] Fix emission of new dictionaries in IPC writer#8302pitrou wants to merge 3 commits intoapache:masterfrom
Conversation
|
@wesm This is a first draft. It is functional (i.e. fixes the underlying issue) but doesn't try to detect deltas. |
cpp/src/arrow/ipc/writer.cc
Outdated
There was a problem hiding this comment.
I assume this will populate last_dictionaries_ entries even when the length of the dictionary is zero? I recall there was some discussion about empty dictionaries (and whether they need to be written at all) at the start of a stream
There was a problem hiding this comment.
Ah, we can skip zero-length dictionaries explicitly, but won't it confuse the IPC reader?
There was a problem hiding this comment.
I'm not sure if skipped dictionaries will work, but they are permitted by the format, so it should probably be investigated in a follow up issue
c414eec to
cd759d0
Compare
|
I still need to add tests for this (I'll do that once #8309 is merged). Detecting and writing delta dictionaries is not implemented, but that's a feature, not a bug, so it may also wait for another PR. |
cd759d0 to
3e5e6df
Compare
When a dictionary changes from the previous batch, we should re-emit it.
3e5e6df to
4e5dac1
Compare
|
I can take a final look this morning. |
cpp/src/arrow/ipc/writer.cc
Outdated
There was a problem hiding this comment.
I'm not sure if skipped dictionaries will work, but they are permitted by the format, so it should probably be investigated in a follow up issue
When a dictionary changes from the previous batch, it is emitted again in the IPC stream.
If this happens when writing the IPC file format, an error is returned.