Unify most of SessionConfig settings into ConfigOptions#4492
Unify most of SessionConfig settings into ConfigOptions#4492alamb merged 4 commits intoapache:masterfrom
SessionConfig settings into ConfigOptions#4492Conversation
|
I need to deal with the fact that target_partitions is not a constant (it is a function of the number of cpu cores) |
|
Actually, I found a better way: cb0f651 |
| .with_file_extension(extension) | ||
| .with_target_partitions(target_partitions) | ||
| .with_collect_stat(ctx.config.collect_statistics); | ||
| .with_collect_stat(ctx.config.collect_statistics()); |
There was a problem hiding this comment.
this illustrates the major API change for DataFusion users
| /// Default schema name for table resolution (not in ConfigOptions | ||
| /// due to `resolve_table_ref` which passes back references) | ||
| default_schema: String, | ||
| /// Whether the default catalog and schema should be created automatically |
There was a problem hiding this comment.
the change in this PR is to remove these fields and move them to ConfigOptions
| map.insert( | ||
| TARGET_PARTITIONS.to_owned(), | ||
| format!("{}", self.target_partitions), | ||
| format!("{}", self.target_partitions()), |
There was a problem hiding this comment.
this whole function should likely be deprecated and instead we should use ConfigOptions directly. However, for this PR I opted to leave it alone (and thus backwards compatible for Ballista) cc @mingmwang
Looking forward to it! |
478c526 to
cb0f651
Compare
| # to a known value that is unlikely to be | ||
| # the real number of cores on a system | ||
| statement ok | ||
| SET datafusion.execution.target_partitions=7 |
|
FYI @mingmwang |
|
@alamb |
| pub const OPT_TARGET_PARTITIONS: &str = "datafusion.execution.target_partitions"; | ||
|
|
||
| /// Configuration option "datafusion.catalog.create_default_catalog_and_schema" | ||
| pub const OPT_CREATE_DEFAULT_CATALOG_AND_SCHEMA: &str = | ||
| "datafusion.catalog.create_default_catalog_and_schema"; | ||
| /// Configuration option "datafusion.catalog.information_schema" | ||
| pub const OPT_INFORMATION_SCHEMA: &str = "datafusion.catalog.information_schema"; | ||
|
|
||
| /// Configuration option "datafusion.optimizer.repartition_joins" | ||
| pub const OPT_REPARTITION_JOINS: &str = "datafusion.optimizer.repartition_joins"; | ||
|
|
||
| /// Configuration option "datafusion.optimizer.repartition_aggregations" | ||
| pub const OPT_REPARTITION_AGGREGATIONS: &str = | ||
| "datafusion.optimizer.repartition_aggregations"; | ||
|
|
||
| /// Configuration option "datafusion.optimizer.repartition_windows" | ||
| pub const OPT_REPARTITION_WINDOWS: &str = "datafusion.optimizer.repartition_windows"; | ||
|
|
||
| /// Configuration option "datafusion.execuction_collect_statistics" | ||
| pub const OPT_COLLECT_STATISTICS: &str = "datafusion.execuction_collect_statistics"; | ||
|
|
||
| /// Configuration option "datafusion.optimizer.filter_null_join_keys" | ||
| pub const OPT_FILTER_NULL_JOIN_KEYS: &str = "datafusion.optimizer.filter_null_join_keys"; |
There was a problem hiding this comment.
Is there any reason we couldn't just make these an enum? Then we could make the set_x and get_x methods type safe.
There was a problem hiding this comment.
🤔 that is an excellent idea -- I don't think there is any reason we could not do so. I will file a follow on ticket to do so
|
Benchmark runs are scheduled for baseline = 2a754f8 and contender = 740a4fa. 740a4fa is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
Towards #3887
Also re #4349
Rationale for this change
see #3887
TLDR is
SET <variable>syntaxWhat changes are included in this PR?
Move all
SessionConfigsettings other than the schema / catalog names into theConfigOptionsstructure,Are these changes tested?
Are there any user-facing changes?
Some fields that were
pubnot must be set via awith_method or accessed via an accessor. The number of locations in the DataFusion codebase was quite low, which was a nice surprise to meThe mix of HashMap /
rust struct style / builder styles is somewhat of a bummer. I bet some fancy macro magic could unify the two but I don't know the right incantations