[SPARK-37638][PYTHON] Use existing active Spark session instead of SparkSession.getOrCreate in pandas API on Spark #34893
Conversation
cc @xinrong-databricks and @ueshin FYI
Test build #146171 has finished for PR 34893 at commit
Kubernetes integration test starting
Kubernetes integration test status failure
    builder = builder.config(key, value)
    # Currently, pandas-on-Spark is dependent on such join due to 'compute.ops_on_diff_frames'
    # configuration. This is needed with Spark 3.0+.
    builder.config("spark.sql.analyzer.failAmbiguousSelfJoin", False)
In fact, we fixed this bug in the master branch, so we don't need to set this anymore.
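For context, a minimal sketch of the kind of cross-frame operation that depends on this internal join (using the public pyspark.pandas API; the frames and values are made up):

```python
import pyspark.pandas as ps

# Operations that combine two different frames are gated by
# 'compute.ops_on_diff_frames' and are implemented with a join under the hood.
ps.set_option("compute.ops_on_diff_frames", True)

psdf1 = ps.DataFrame({"a": [1, 2, 3]})
psdf2 = ps.DataFrame({"a": [4, 5, 6]})
print((psdf1 + psdf2).to_pandas())  # triggers the internal (self-)join
```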
| builder.config("spark.sql.analyzer.failAmbiguousSelfJoin", False) | ||
|
|
||
    if is_testing():
        builder.config("spark.executor.allowSparkContext", False)
And this will be set separately when the SparkContext is created, not here.
And in fact this was set when we ran tests with pytest, back when it was in the Koalas repo.
Actually, this was added for our tests to check that our code doesn't create a SparkContext in executors.
But we can remove it anyway because the default value is False now.
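A rough sketch of what that test guarded against, assuming the default spark.executor.allowSparkContext=false (the function name here is made up):

```python
from pyspark import SparkContext
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

def creates_context_on_executor(x):
    # With spark.executor.allowSparkContext=false (the default now, per the
    # comment above), this call raises an error on executors.
    SparkContext.getOrCreate()
    return x

# Expected to fail, which is exactly what the test checked for.
spark.range(1).rdd.map(creates_context_on_executor).collect()
```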
Kubernetes integration test starting
Test build #146180 has finished for PR 34893 at commit
Kubernetes integration test status failure
    - def default_session(conf: Optional[Dict[str, Any]] = None) -> SparkSession:
    -     if conf is None:
    -         conf = dict()
    + def default_session() -> SparkSession:
I'm not sure we can remove the conf argument here?
I guess we should show a deprecation warning if it's not None for now and remove it in the future?
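A minimal sketch of the deprecation path suggested here (hypothetical; not the code that was merged):

```python
import warnings
from typing import Any, Dict, Optional

from pyspark.sql import SparkSession

def default_session(conf: Optional[Dict[str, Any]] = None) -> SparkSession:
    # Keep accepting 'conf' for now, but warn that it will go away.
    if conf is not None:
        warnings.warn(
            "The 'conf' parameter of default_session is deprecated "
            "and will be removed in the future.",
            FutureWarning,
        )
        builder = SparkSession.builder
        for key, value in conf.items():
            builder = builder.config(key, value)
        return builder.getOrCreate()
    return SparkSession.builder.getOrCreate()
```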
I think this isn't a documented API, so it should be fine.
I'd just leave it to you.
ueshin left a comment
LGTM.
Merged to master.
What changes were proposed in this pull request?
This PR proposes to use an existing active Spark session instead of SparkSession.getOrCreate in pandas API on Spark.
Why are the changes needed?
Because it shows warnings for configurations that do not take effect, as below:
Otherwise, it attempts to create a new session and shows warnings, as below:
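The warning logs themselves are not reproduced here, but the scenario is easy to trigger; a minimal sketch (the config key and value are only an example):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# A second getOrCreate with extra configs returns the existing session, and
# Spark warns that the new configurations may not take effect.
spark2 = SparkSession.builder.config("spark.executor.memory", "2g").getOrCreate()
assert spark is spark2
```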
Does this PR introduce any user-facing change?
Yes. After this PR, it explicitly uses the active Spark session and does not show such warnings:
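A minimal sketch of the described behavior (an assumption about the approach, not the exact merged code):

```python
from pyspark.sql import SparkSession

# Prefer the active session; only fall back to creating one when none exists.
spark = SparkSession.getActiveSession()
if spark is None:
    spark = SparkSession.builder.getOrCreate()
```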
How was this patch tested?
Manually tested as below:
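The original snippet is not shown here; one plausible reproduction of such a manual test (names and values are made up):

```python
import pyspark.pandas as ps
from pyspark.sql import SparkSession

# Create a session up front, then use pandas API on Spark and confirm that
# no "configurations may not take effect" warnings are emitted.
spark = SparkSession.builder.getOrCreate()
print(ps.range(3).to_pandas())
```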