-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-37727][SQL] Show ignored confs & hide warnings for conf already set in SparkSession.builder.getOrCreate #35001
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
cc @cloud-fan, @dongjoon-hyun @AngersZhuuuu @yaooqinn @maropu FYI when you find some time to review :-). |
f58f6c3 to
f710487
Compare
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
yaooqinn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I feel this could be a bit confused. Users may thought the value |
dongjoon-hyun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM. Thank you, @HyukjinKwon .
|
Test build #146522 has finished for PR 35001 at commit
|
|
Test build #146527 has finished for PR 35001 at commit
|
This actually was already there. The problem here is that users, for example, might pass some source specific options that Spark side does not now. Maybe we can filter out Spark's runtime configuration here though. Let me take a look while I am here. |
…y set in SparkSession.builder.getOrCreate
f710487 to
ae4616c
Compare
|
@Ngone51, the current change should address your comment (see 3. in Pr desscription) |
|
Maybe we can have a UT in |
There is actually a bug I guess from switching to log4j 2 (see SPARK-37729). I would prefer to add it after this gets resolved. |
Got it, thanks. |
AngersZhuuuu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Merged to master. |
|
Thanks guys for your reviews. |
|
late lgtm |
|
Test build #146543 has finished for PR 35001 at commit
|
…Session.builder.getOrCreate ### What changes were proposed in this pull request? This PR adds some test cases for #35001. Test cases were not added because there are related bugs but they are fixed now. ### Why are the changes needed? To avoid regressions. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? This PR adds test cases that were manually tested in #35001. Closes #35048 from HyukjinKwon/SPARK-37727-followup. Authored-by: Hyukjin Kwon <[email protected]> Signed-off-by: Hyukjin Kwon <[email protected]>
…Session.builder.getOrCreate ### What changes were proposed in this pull request? This PR adds some test cases for apache#35001. Test cases were not added because there are related bugs but they are fixed now. ### Why are the changes needed? To avoid regressions. ### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? This PR adds test cases that were manually tested in apache#35001. Closes apache#35048 from HyukjinKwon/SPARK-37727-followup. Authored-by: Hyukjin Kwon <[email protected]> Signed-off-by: Hyukjin Kwon <[email protected]>
What changes were proposed in this pull request?
This PR proposes to show ignored configurations and hide the warnings for configurations that are already set when invoking
SparkSession.builder.getOrCreate.Why are the changes needed?
Currently,
SparkSession.builder.getOrCreate()is too noisy even when duplicate configurations are set. Users cannot easily tell which configurations are to fix. See the example below:It is straitforward when there are few configurations but it is difficult for users to figure out when there are too many configurations especially when these configurations are defined in a property file like 'spark-default.conf' maintained separately by system admins in production.
See also #34757 (comment).
Does this PR introduce any user-facing change?
Yes.
Show ignored configurations in debug level logs:
Before:
After:
Do not issue a warning and hide a configuration already explicitly set (with the same value) before.
Before:
After:
Do not issue a warning and hide runtime SQL configurations in debug log:
Before:
After:
Note that there is no behaviour change on session state initialization when configurations are not set. For example:
But the session state initialization can be triggered now for static SQL configurations set after this PR. Previously, it was not triggered. This would not introduce something user-facing or a bug but worth noting it.
For runtime SQL configurations, the session state initialization in this code path was introduced at #15295.
How was this patch tested?
It was manually tested as shown above.