feat: Make FanoutWriter writer configurable#1872
feat: Make FanoutWriter writer configurable#1872jonathanc-n wants to merge 5 commits intoapache:mainfrom
FanoutWriter writer configurable#1872Conversation
|
cc @CTTY |
CTTY
left a comment
There was a problem hiding this comment.
Thanks for working on this! I've left some minor comments
|
|
||
| /// Whether to use `FanoutWriter` for partitioned tables (handles unsorted data). | ||
| /// If false, uses `ClusteredWriter` (requires sorted data, more memory efficient). | ||
| pub const PROPERTY_WRITE_FANOUT_ENABLED: &str = "write.fanout.enabled"; |
There was a problem hiding this comment.
I think write.datafusion.fanout.enabled will be a better property name since Java has
public static final String SPARK_WRITE_PARTITIONED_FANOUT_ENABLED = "write.spark.fanout.enabled";There was a problem hiding this comment.
| pub const PROPERTY_WRITE_FANOUT_ENABLED: &str = "write.fanout.enabled"; | |
| pub const PROPERTY_DATAFUSION_WRITE_FANOUT_ENABLED: &str = "write.datafusion.fanout.enabled"; |
| /// The default format for files. | ||
| pub write_format_default: String, | ||
| /// The target file size for files. | ||
| pub write_target_file_size_bytes: usize, |
There was a problem hiding this comment.
We need to add the property here as well
There was a problem hiding this comment.
I was thinking about this, I think whether or not to place it in the TableProperty struct is currently a bit inconsistent. write_format_default and write_target_file_size_bytes don't seem to be called by table_property.<property> and are manually parsed in write.rs. I'm not sure why they are parsed differently, but should we standardize one way to do it?
The different ways are currently calling try_from or parsing the way it is done in this pull request.
There was a problem hiding this comment.
write_format_default and write_target_file_size_bytes don't seem to be called by table_property. and are manually parsed in write.rs
Yes, this is because table.metadata().properties() is actually &HashMap<String, String>, not a TableProperties instance. There is an issue tracking this: #1878
Once the issue above is fixed, we should definitely start thinking about standardize the way of fetching property.
All of these can be treated as a separate issue imo, and we should still add the new properties to TableProperties struct
There was a problem hiding this comment.
Should be good for another look!
CTTY
left a comment
There was a problem hiding this comment.
Just one minor change, otherwise looks good to me!
|
|
||
| /// Whether to use `FanoutWriter` for partitioned tables (handles unsorted data). | ||
| /// If false, uses `ClusteredWriter` (requires sorted data, more memory efficient). | ||
| pub const PROPERTY_WRITE_FANOUT_ENABLED: &str = "write.fanout.enabled"; |
There was a problem hiding this comment.
| pub const PROPERTY_WRITE_FANOUT_ENABLED: &str = "write.fanout.enabled"; | |
| pub const PROPERTY_DATAFUSION_WRITE_FANOUT_ENABLED: &str = "write.datafusion.fanout.enabled"; |
|
@CTTY I added it and to the default const |
CTTY
left a comment
There was a problem hiding this comment.
LGTM! Looks like the CI is being flaky tho
|
Yes tests seem to fail non deterministically. cc @liurenjie1024 good to go |
| /// The target file size for files. | ||
| pub write_target_file_size_bytes: usize, | ||
| /// Whether to use `FanoutWriter` for partitioned tables. | ||
| pub write_fanout_enabled: bool, |
There was a problem hiding this comment.
| pub write_fanout_enabled: bool, | |
| pub write_datafusion_fanout_enabled: bool, |
| .map_err(|e| { | ||
| Error::new( | ||
| ErrorKind::DataInvalid, | ||
| "Invalid value for write.fanout.enabled, expected 'true' or 'false'", |
There was a problem hiding this comment.
| "Invalid value for write.fanout.enabled, expected 'true' or 'false'", | |
| "Invalid value for {}, expected 'true' or 'false'", WRITE_DATAFUSION_FANOUT_ENABLED |
|
Hi @jonathanc-n , could you address the comments that Renjie left? Otherwise, I'd be happy to drive this PR to finish |
## Which issue does this PR close? - Closes #1834. ## What changes are included in this PR? - Fellow on #1872. ## Are these changes tested? --------- Signed-off-by: StandingMan <[email protected]>
Which issue does this PR close?
What changes are included in this PR?
Adds
PROPERTY_WRITE_FANOUT_ENABLEDtoTablePropertiesI made it
write.fanout.enabledinstead ofwrite.fanout-enabledin case there is anything we want configurable withFanoutWriterin the future.Are these changes tested?
Yes by existing tests.