[SPARK-17720][SQL] introduce static SQL conf #15295
Conversation
Test build #66094 has finished for PR 15295 at commit

Force-pushed ea2a57f to 59fc4e9
```diff
  var session = activeThreadSession.get()
  if ((session ne null) && !session.sparkContext.isStopped) {
-   options.foreach { case (k, v) => session.conf.set(k, v) }
+   options.foreach { case (k, v) => session.sessionState.conf.setConfString(k, v) }
```
Is this change related?
Yeah, the goal is to forbid users from setting/unsetting a global SQL conf via the public API, but here we do need to set global SQL confs from SparkConf, so we should use the internal API instead.
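For context, a minimal sketch of the distinction discussed here, using the method names from the diff above (the conf key is just an illustrative static conf from this PR; note that `sessionState` is `private[sql]`, so this only compiles inside Spark's own `org.apache.spark.sql` package):

```scala
import org.apache.spark.sql.SparkSession

val session = SparkSession.builder().master("local").getOrCreate()

// Public API: after this PR it rejects static (global) confs with an
// AnalysisException, which is exactly the behavior we want for end users.
// session.conf.set("spark.sql.catalogImplementation", "hive") // would throw

// Internal API: writes the entry directly and bypasses the check, which is
// what the changed line needs when propagating values from SparkConf.
session.sessionState.conf.setConfString("spark.sql.catalogImplementation", "hive")
```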
LGTM. Pretty straightforward.

Test build #66102 has finished for PR 15295 at commit

retest this please

Test build #66109 has finished for PR 15295 at commit

retest this please

Test build #66119 has finished for PR 15295 at commit
| test("cannot set/unset global SQL conf") { | ||
| intercept[AnalysisException](sql(s"SET ${SCHEMA_STRING_LENGTH_THRESHOLD.key}=10")) | ||
| intercept[AnalysisException](sql(s"UNSET ${SCHEMA_STRING_LENGTH_THRESHOLD.key}=10")) |
We do not support the UNSET command in the SQL interface. Maybe you can change it to `spark.conf.unset(SCHEMA_STRING_LENGTH_THRESHOLD.key)`.
Oh damn, but why did the test pass... it should throw a parser exception.
ParseException extends AnalysisException
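In other words (a sketch reusing the test snippet above; `intercept` is ScalaTest's helper):

```scala
// Because org.apache.spark.sql.catalyst.parser.ParseException extends
// org.apache.spark.sql.AnalysisException, this intercept succeeds even though
// the statement fails to parse rather than hitting the new set/unset check.
intercept[AnalysisException] {
  sql(s"UNSET ${SCHEMA_STRING_LENGTH_THRESHOLD.key}=10") // throws ParseException
}
```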
```scala
 * @since 2.0.0
 */
def set(key: String, value: String): Unit = {
  assertNotGlobalSQLConf(key)
```
I am wondering if we need to add this `assertNotGlobalSQLConf` in all the public `set` functions. I am just afraid we might forget to update the other `set` functions when adding more global SQL confs.
All global SQL confs will be automatically registered, so we don't need to change the code here when adding more of them.
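A rough sketch of the registration pattern being described, following the `GlobalSQLConf.globalConfKeys` name from the diff below (the builder plumbing is simplified and the set implementation is illustrative):

```scala
import java.util.concurrent.ConcurrentHashMap

// Simplified sketch: every static/global conf registers its key at definition
// time, so the public set/unset paths can consult a single registry and no
// other code needs to change when a new conf is added.
object GlobalSQLConf {
  val globalConfKeys: java.util.Set[String] =
    java.util.Collections.newSetFromMap(new ConcurrentHashMap[String, java.lang.Boolean]())

  private def register(key: String): String = {
    globalConfKeys.add(key)
    key
  }

  // Adding a new global conf is just another register call.
  val SCHEMA_STRING_LENGTH_THRESHOLD: String =
    register("spark.sql.sources.schemaStringLengthThreshold")
}
```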
Sorry, I did not explain it clearly. What I meant above is to add `assertNotGlobalSQLConf` to the other two `set` functions. Does this make sense?
ok, looks safer
Should be `assertNonStaticConf` here.
```scala
  .checkValues(Set("hive", "in-memory"))
  .createWithDefault("in-memory")

// This is used to control when we will split a schema's JSON string into multiple pieces
```
Nit: add max length
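For reference, the two lines in this hunk are the tail of a conf-builder chain; written out in full it looks roughly like this (a sketch assuming the `SQLConfigBuilder` helper used in SQLConf at the time):

```scala
// Sketch of the full definition the hunk above belongs to.
val CATALOG_IMPLEMENTATION = SQLConfigBuilder("spark.sql.catalogImplementation")
  .internal()
  .stringConf
  .checkValues(Set("hive", "in-memory"))
  .createWithDefault("in-memory")
```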
"Global" isn't the right name. The main property is immutability once the service starts, rather than the fact that they are shared by all the sessions.
```scala
}

test("SPARK-6024 wide schema support") {
  withSQLConf(SQLConf.SCHEMA_STRING_LENGTH_THRESHOLD.key -> "4000") {
```
This line in the old test case is useless... : (
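For readers outside the test suite: `withSQLConf` temporarily sets the given entries for the duration of the body and restores them afterwards; with the threshold now static, the override has to go. A simplified sketch of what the helper (defined in Spark's test trait) does, not the exact implementation:

```scala
// Simplified sketch: save current values, set the overrides, run the body,
// then restore (or unset) each key.
protected def withSQLConf(pairs: (String, String)*)(f: => Unit): Unit = {
  val conf = spark.sessionState.conf
  val saved = pairs.map { case (k, _) =>
    k -> (if (conf.contains(k)) Some(conf.getConfString(k)) else None)
  }
  pairs.foreach { case (k, v) => conf.setConfString(k, v) }
  try f finally saved.foreach {
    case (k, Some(v)) => conf.setConfString(k, v)
    case (k, None) => conf.unsetConf(k)
  }
}
```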
@srinathshankar suggested "static config".
```scala
private def assertNotGlobalSQLConf(key: String): Unit = {
  if (GlobalSQLConf.globalConfKeys.contains(key)) {
    throw new AnalysisException(s"Can not set/unset a global SQL conf: $key")
```
Cannot modify the value of a static config: $key
Test build #66147 has finished for PR 15295 at commit

Test build #66218 has finished for PR 15295 at commit

Test build #66264 has finished for PR 15295 at commit

Test build #66640 has finished for PR 15295 at commit

Force-pushed fc78834 to b4c1e0f

Test build #66660 has finished for PR 15295 at commit

Force-pushed b4c1e0f to 8d93c4a

Test build #66697 has finished for PR 15295 at commit

retest this please

Test build #66706 has finished for PR 15295 at commit

Force-pushed 8d93c4a to 0ad8815

Test build #66722 has finished for PR 15295 at commit
```scala
  sqlConf.contains(key)
}

private def assertNotGlobalSQLConf(key: String): Unit = {
```
Rename to `requireNonStaticConf`?
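Combining this rename with the earlier error-message suggestion, the guard would end up roughly as follows (a sketch; the registry keeps the diff's `GlobalSQLConf` name here):

```scala
import org.apache.spark.sql.AnalysisException

// Renamed per the review; message wording per the earlier suggestion.
private def requireNonStaticConf(key: String): Unit = {
  if (GlobalSQLConf.globalConfKeys.contains(key)) {
    throw new AnalysisException(s"Cannot modify the value of a static config: $key")
  }
}
```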
LGTM except that one comment on naming.

Test build #66777 has finished for PR 15295 at commit

LGTM

Merging to master! Thanks!
## What changes were proposed in this pull request?

SQLConf is session-scoped and mutable. However, we do have the requirement for a static SQL conf, which is global and immutable, e.g. the `schemaStringThreshold` in `HiveExternalCatalog`, the flag to enable/disable hive support, and the global temp view database in apache#14897. Actually, we've already implemented static SQL confs implicitly via `SparkConf`; this PR just makes them explicit and exposes them to users, so that they can see the config values via SQL command or `SparkSession.conf`, and it forbids users from setting/unsetting static SQL confs.

## How was this patch tested?

New tests in SQLConfSuite.

Author: Wenchen Fan <[email protected]>

Closes apache#15295 from cloud-fan/global-conf.
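As a usage sketch of the behavior this introduces (the conf key is one of the static confs touched by this PR; error behavior per the discussion above):

```scala
import org.apache.spark.sql.SparkSession

// Static SQL confs are fixed when the session is built...
val spark = SparkSession.builder()
  .master("local")
  .config("spark.sql.catalogImplementation", "in-memory")
  .getOrCreate()

// ...and are visible through the public API like any other conf,
spark.conf.get("spark.sql.catalogImplementation") // "in-memory"

// but can no longer be mutated through it: this throws AnalysisException.
// spark.conf.set("spark.sql.catalogImplementation", "hive")
```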
…y set in SparkSession.builder.getOrCreate
### What changes were proposed in this pull request?
This PR proposes to show ignored configurations and hide the warnings for configurations that are already set when invoking `SparkSession.builder.getOrCreate`.
### Why are the changes needed?
Currently, `SparkSession.builder.getOrCreate()` is too noisy even when duplicate configurations are set. Users cannot easily tell which configurations they need to fix. See the example below:
```bash
./bin/spark-shell --conf spark.abc=abc
```
```scala
import org.apache.spark.sql.SparkSession
spark.sparkContext.setLogLevel("DEBUG")
SparkSession.builder.config("spark.abc", "abc").getOrCreate
```
```
21:04:01.670 [main] WARN org.apache.spark.sql.SparkSession - Using an existing SparkSession; some spark core configurations may not take effect.
```
It is straightforward when there are few configurations, but it is difficult for users to figure out when there are too many, especially when the configurations are defined in a property file like 'spark-defaults.conf' maintained separately by system admins in production.
See also #34757 (comment).
### Does this PR introduce _any_ user-facing change?
Yes.
1. Show ignored configurations in debug level logs:
```bash
./bin/spark-shell --conf spark.abc=abc
```
```scala
import org.apache.spark.sql.SparkSession
spark.sparkContext.setLogLevel("DEBUG")
SparkSession.builder
.config("spark.sql.warehouse.dir", "2")
.config("spark.abc", "abcb")
.config("spark.abcd", "abcb4")
.getOrCreate
```
**Before:**
```
21:13:28.360 [main] WARN org.apache.spark.sql.SparkSession - Using an existing SparkSession; the static sql configurations will not take effect.
21:13:28.360 [main] WARN org.apache.spark.sql.SparkSession - Using an existing SparkSession; some spark core configurations may not take effect.
```
**After**:
```
20:34:30.619 [main] WARN org.apache.spark.sql.SparkSession - Using an existing Spark session; only runtime SQL configurations will take effect.
20:34:30.622 [main] DEBUG org.apache.spark.sql.SparkSession - Ignored static SQL configurations:
spark.sql.warehouse.dir=2
20:34:30.623 [main] DEBUG org.apache.spark.sql.SparkSession - Configurations that might not take effect:
spark.abcd=abcb4
spark.abc=abcb
```
2. Do not warn about a configuration that was already explicitly set (with the same value) before, and hide it from the debug log:
```bash
./bin/spark-shell --conf spark.abc=abc
```
```scala
import org.apache.spark.sql.SparkSession
spark.sparkContext.setLogLevel("DEBUG")
SparkSession.builder.config("spark.abc", "abc").getOrCreate // **Ignore** warnings because it's already set in --conf
SparkSession.builder.config("spark.abc.new", "abc").getOrCreate // **Show** warnings for only configuration newly set.
SparkSession.builder.config("spark.abc.new", "abc").getOrCreate // **Ignore** warnings because it's set ^.
```
**Before**:
```
21:13:56.183 [main] WARN org.apache.spark.sql.SparkSession - Using an existing SparkSession; some spark core configurations may not take effect.
21:13:56.356 [main] WARN org.apache.spark.sql.SparkSession - Using an existing SparkSession; some spark core configurations may not take effect.
21:13:56.476 [main] WARN org.apache.spark.sql.SparkSession - Using an existing SparkSession; some spark core configurations may not take effect.
```
**After:**
```
20:36:36.251 [main] WARN org.apache.spark.sql.SparkSession - Using an existing Spark session; only runtime SQL configurations will take effect.
20:36:36.253 [main] DEBUG org.apache.spark.sql.SparkSession - Configurations that might not take effect:
spark.abc.new=abc
```
3. Do not warn about runtime SQL configurations, and hide them from the debug log:
```bash
./bin/spark-shell
```
```scala
import org.apache.spark.sql.SparkSession
spark.sparkContext.setLogLevel("DEBUG")
SparkSession.builder.config("spark.sql.ansi.enabled", "true").getOrCreate // **Ignore** warnings for runtime SQL configurations
SparkSession.builder.config("spark.buffer.size", "1234").getOrCreate // **Show** warnings for Spark core configuration
SparkSession.builder.config("spark.sql.source.specific", "abc").getOrCreate // **Show** warnings for custom runtime options
SparkSession.builder.config("spark.sql.warehouse.dir", "xyz").getOrCreate // **Show** warnings for static SQL configurations
```
**Before**:
```
11:11:40.846 [main] WARN org.apache.spark.sql.SparkSession - Using an existing SparkSession; some spark core configurations may not take effect.
11:11:41.037 [main] WARN org.apache.spark.sql.SparkSession - Using an existing SparkSession; some spark core configurations may not take effect.
11:11:41.167 [main] WARN org.apache.spark.sql.SparkSession - Using an existing SparkSession; some spark core configurations may not take effect.
11:11:41.318 [main] WARN org.apache.spark.sql.SparkSession - Using an existing SparkSession; the static sql configurations will not take effect.
```
**After**:
```
10:39:54.870 [main] WARN org.apache.spark.sql.SparkSession - Using an existing Spark session; only runtime SQL configurations will take effect.
10:39:54.872 [main] DEBUG org.apache.spark.sql.SparkSession - Configurations that might not take effect:
spark.buffer.size=1234
10:39:54.988 [main] WARN org.apache.spark.sql.SparkSession - Using an existing Spark session; only runtime SQL configurations will take effect.
10:39:54.988 [main] DEBUG org.apache.spark.sql.SparkSession - Configurations that might not take effect:
spark.sql.source.specific=abc
10:39:55.107 [main] WARN org.apache.spark.sql.SparkSession - Using an existing Spark session; only runtime SQL configurations will take effect.
10:39:55.108 [main] DEBUG org.apache.spark.sql.SparkSession - Ignored static SQL configurations:
spark.sql.warehouse.dir=xyz
```
Note that there is no behaviour change on session state initialization when configurations are not set. For example:
```scala
import org.apache.spark.sql.SparkSession
spark.sparkContext.setLogLevel("DEBUG")
SparkSession.builder.getOrCreate
```
After this PR, however, session state initialization can be triggered for static SQL configurations that are set; previously, it was not triggered. This does not introduce a user-facing change or a bug, but it is worth noting.
For runtime SQL configurations, the session state initialization in this code path was introduced at #15295.
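A quick way to tell the three categories apart from user code is `spark.conf.isModifiable`, available since Spark 2.4; a small sketch (return values assume the default session):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local").getOrCreate()

// Runtime SQL conf: can change on an existing session, so it takes effect here.
spark.conf.isModifiable("spark.sql.ansi.enabled")   // true
// Static SQL conf: fixed once the first session starts.
spark.conf.isModifiable("spark.sql.warehouse.dir")  // false
// Spark core conf: also not modifiable after startup.
spark.conf.isModifiable("spark.buffer.size")        // false
```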
### How was this patch tested?
It was manually tested as shown above.
Closes #35001 from HyukjinKwon/SPARK-37727.
Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>