Add option to FlightDataEncoder to always resend batch dictionaries#4896
Add option to FlightDataEncoder to always resend batch dictionaries#4896alamb merged 4 commits intoapache:masterfrom
FlightDataEncoder to always resend batch dictionaries#4896Conversation
|
Thank you for this PR @alexwilcoxson-rel -- I hope to find time to review it properly in the next few days |
There was a problem hiding this comment.
Thank you @alexwilcoxson-rel -- I think this is a really nice improvement and it could be merged as is.
I was thinking to a future when we might have proper support for resending dictionaries and trying to minimize the API impact as well as adding a nice place for more documentation. What do you think about using an enum instead of a bool like this:
So instead of
pub struct FlightDataEncoder {
...
send_dictionaries: bool,
...
}Something like
pub struct FlightDataEncoder {
...
dictionary_handling: DictionaryHandling,
...
}
/// Defines how a [`FlightDataEncoder`] encodes [`DictionaryArray`]s
pub enum DictionaryHandling {
/// Expands to the underlying type (default). This likely sends more data over the network
/// but requires less memory (dictionaries are not tracked) and is more compatible
/// with other arrow flight client implementations that may not support `DictionaryEncoding`
/// see [`hydrate_dictionary`] for more details.
Hydrate,
/// Send dictionary FlightData with every RecordBatch that contains a [`DictionaryArray`].
/// See [`Self::Hydrate`] for more tradeoffs. No attempt is made to skip sending the same (logical)
/// dictionary values twice.
Resend,
}| } | ||
|
|
||
| #[tokio::test] | ||
| async fn test_dictionary_hydration() { |
|
|
||
| let arr_one: Arc<DictionaryArray<UInt16Type>> = | ||
| Arc::new(vec!["a", "a", "b"].into_iter().collect()); | ||
| let arr_two: Arc<DictionaryArray<UInt16Type>> = |
|
BTW it looks like some documentation also needs to be updated, like arrow-rs/arrow-flight/src/encode.rs Line 33 in 23db567 I can handle doing so, but wanted to mention it in case @alexwilcoxson-rel was going to change this one more |
Updated these docs |
|
Updated to use Enum per @alamb feedback |
alamb
left a comment
There was a problem hiding this comment.
Thanks @alexwilcoxson-rel -- this looks great
|
The CI seems to be failing the doc checks. I'll try and fix that |
|
Here is a proposed PR to your branch for improved documentation relativityone#1 🙏 |
Improve docs for sending flight dictionaries
LGTM, merged! please rerun CI when you can. |
FlightDataEncoder to always resend batch dictionaries
|
Thanks again @alexwilcoxson-rel |
Which issue does this PR close?
Closes #4895
Rationale for this change
see #4895
What changes are included in this PR?
Adds
with_send_dictionariestoFlightDataEncoderBuilderto opt in to always sending dictionary messages with each record batch encoded.Are there any user-facing changes?
with_send_dictionariesadded toFlightDataEncoderBuilderandFlightDataEncoder. Doc strings updated.