SPARK-1254. Supplemental fix for HTTPS on Maven Central #209
Conversation

Merged build triggered.

Merged build started.

Merged build finished.

All automated tests passed.

Hey @srowen - I'm guessing that the Maven hostname delegates to a mirror/CDN network, and maybe some of the mirrors support HTTPS while others don't. It seems fine to just fall back to HTTP in that case.

Merged
## What changes were proposed in this pull request?

This patch introduces advanced query pushdown to Redshift and is largely based on the work done by the Snowflake people: https://github.com/snowflakedb/spark-snowflake

Supported operators:
- Filter, Project, Sort, Limit

Supported expressions (see PR apache#221 for more info):
- most boolean logic operators
- comparisons
- basic arithmetic operations
- numeric and string casts
- most string functions
- (uncorrelated) scalar subqueries

Note: No support for `date` and `timestamp` yet.

New feature flag:
- `spark.databricks.redshift.pushdown` - enabled by default

Future TODOs:
- enable support for more complex expressions and operators (e.g. dates and timestamps, Aggr, Joins) (SC-5768)
- integrate TPC-H testing suite (SC-5717)

## How was this patch tested?

* pre-existing Redshift unit tests and redshift-integration-tests
* adapted a large part of pre-existing integration tests to check both the old and the new code paths
* three new test suites: {`Filter`,`Advanced`,`Randomized`}`PushdownIntegrationSuite`

Author: Adrian Ionescu <[email protected]>
Author: Juliusz Sompolski <[email protected]>

Closes apache#209 from adrian-ionescu/redshift-pushdown.
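The core idea of query pushdown - compiling supported operators into a single SQL query that the database executes, instead of pulling whole tables into Spark - can be sketched outside the connector. The `Plan` ADT and `toSql` function below are illustrative stand-ins, not the connector's actual classes:

```scala
// Illustrative mini logical plan covering the supported operators.
// These are NOT the connector's real classes, just a sketch of the idea.
sealed trait Plan
case class Relation(table: String, columns: Seq[String]) extends Plan
case class Project(columns: Seq[String], child: Plan) extends Plan
case class Filter(condition: String, child: Plan) extends Plan
case class Sort(column: String, child: Plan) extends Plan
case class Limit(n: Int, child: Plan) extends Plan

// Compile the plan into one SQL string that the database can run.
// Each operator wraps its child as an aliased subquery.
def toSql(plan: Plan): String = plan match {
  case Relation(t, cols)    => s"SELECT ${cols.mkString(", ")} FROM $t"
  case Project(cols, child) => s"SELECT ${cols.mkString(", ")} FROM (${toSql(child)}) sq"
  case Filter(cond, child)  => s"SELECT * FROM (${toSql(child)}) sq WHERE $cond"
  case Sort(col, child)     => s"SELECT * FROM (${toSql(child)}) sq ORDER BY $col"
  case Limit(n, child)      => s"SELECT * FROM (${toSql(child)}) sq LIMIT $n"
}

val plan = Limit(10,
  Sort("id",
    Filter("price > 100",
      Relation("sales", Seq("id", "price")))))
// toSql(plan) nests three subqueries and ends with "LIMIT 10".
```

A real implementation additionally has to verify that every expression in the plan is translatable (hence the whitelist of supported expressions above) and bail out to the non-pushdown path otherwise.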
…constraint
### What changes were proposed in this pull request?
This PR adds support for inferring constraints from cast equality constraints. For example:
```scala
scala> spark.sql("create table spark_29231_1(c1 bigint, c2 bigint)")
res0: org.apache.spark.sql.DataFrame = []
scala> spark.sql("create table spark_29231_2(c1 int, c2 bigint)")
res1: org.apache.spark.sql.DataFrame = []
scala> spark.sql("select t1.* from spark_29231_1 t1 join spark_29231_2 t2 on (t1.c1 = t2.c1 and t1.c1 = 1)").explain
== Physical Plan ==
*(2) Project [c1#5L, c2#6L]
+- *(2) BroadcastHashJoin [c1#5L], [cast(c1#7 as bigint)], Inner, BuildRight
:- *(2) Project [c1#5L, c2#6L]
: +- *(2) Filter (isnotnull(c1#5L) AND (c1#5L = 1))
: +- *(2) ColumnarToRow
: +- FileScan parquet default.spark_29231_1[c1#5L,c2#6L] Batched: true, DataFilters: [isnotnull(c1#5L), (c1#5L = 1)], Format: Parquet, Location: InMemoryFileIndex[file:/root/spark-3.0.0-preview2-bin-hadoop2.7/spark-warehouse/spark_29231_1], PartitionFilters: [], PushedFilters: [IsNotNull(c1), EqualTo(c1,1)], ReadSchema: struct<c1:bigint,c2:bigint>
+- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint))), [id=#209]
+- *(1) Project [c1#7]
+- *(1) Filter isnotnull(c1#7)
+- *(1) ColumnarToRow
+- FileScan parquet default.spark_29231_2[c1#7] Batched: true, DataFilters: [isnotnull(c1#7)], Format: Parquet, Location: InMemoryFileIndex[file:/root/spark-3.0.0-preview2-bin-hadoop2.7/spark-warehouse/spark_29231_2], PartitionFilters: [], PushedFilters: [IsNotNull(c1)], ReadSchema: struct<c1:int>
```
After this PR:
```scala
scala> spark.sql("select t1.* from spark_29231_1 t1 join spark_29231_2 t2 on (t1.c1 = t2.c1 and t1.c1 = 1)").explain
== Physical Plan ==
*(2) Project [c1#0L, c2#1L]
+- *(2) BroadcastHashJoin [c1#0L], [cast(c1#2 as bigint)], Inner, BuildRight
:- *(2) Project [c1#0L, c2#1L]
: +- *(2) Filter (isnotnull(c1#0L) AND (c1#0L = 1))
: +- *(2) ColumnarToRow
: +- FileScan parquet default.spark_29231_1[c1#0L,c2#1L] Batched: true, DataFilters: [isnotnull(c1#0L), (c1#0L = 1)], Format: Parquet, Location: InMemoryFileIndex[file:/root/opensource/spark/spark-warehouse/spark_29231_1], PartitionFilters: [], PushedFilters: [IsNotNull(c1), EqualTo(c1,1)], ReadSchema: struct<c1:bigint,c2:bigint>
+- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint))), [id=#99]
+- *(1) Project [c1#2]
+- *(1) Filter ((cast(c1#2 as bigint) = 1) AND isnotnull(c1#2))
+- *(1) ColumnarToRow
+- FileScan parquet default.spark_29231_2[c1#2] Batched: true, DataFilters: [(cast(c1#2 as bigint) = 1), isnotnull(c1#2)], Format: Parquet, Location: InMemoryFileIndex[file:/root/opensource/spark/spark-warehouse/spark_29231_2], PartitionFilters: [], PushedFilters: [IsNotNull(c1)], ReadSchema: struct<c1:int>
```
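The inference step behind the new filter on the build side can be sketched in isolation: from the join condition `t1.c1 = cast(t2.c1 as bigint)` and the filter `t1.c1 = 1`, we can derive `cast(t2.c1 as bigint) = 1` and push it to the other side of the join. The `Expr` ADT and `inferCastConstraints` helper below are illustrative only, not Spark's Catalyst classes:

```scala
// Minimal expression ADT, illustrative only (not Catalyst's Expression tree).
sealed trait Expr
case class Col(name: String) extends Expr
case class Lit(value: Long) extends Expr
case class Cast(child: Expr) extends Expr              // e.g. cast(c1 as bigint)
case class EqualTo(left: Expr, right: Expr) extends Expr

// From (a = cast(b)) and (a = literal), derive (cast(b) = literal).
def inferCastConstraints(constraints: Set[Expr]): Set[Expr] = {
  val castEqs = constraints.collect { case EqualTo(a: Col, c: Cast) => (a, c) }
  val litEqs  = constraints.collect { case EqualTo(a: Col, l: Lit)  => (a, l) }
  val derived = for {
    (a, c) <- castEqs
    (b, l) <- litEqs
    if a == b
  } yield (EqualTo(c, l): Expr)
  constraints ++ derived
}

val constraints: Set[Expr] = Set(
  EqualTo(Col("t1.c1"), Cast(Col("t2.c1"))),  // join key: t1.c1 = cast(t2.c1 as bigint)
  EqualTo(Col("t1.c1"), Lit(1))               // filter:   t1.c1 = 1
)
val inferred = inferCastConstraints(constraints)
// inferred additionally contains EqualTo(Cast(Col("t2.c1")), Lit(1)),
// which corresponds to the new (cast(c1#2 as bigint) = 1) filter in the plan.
```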
### Why are the changes needed?
Improve query performance.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Unit test.
Closes #27252 from wangyum/SPARK-29231.
Authored-by: Yuming Wang <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
It seems that HTTPS does not necessarily work on Maven Central - at least it does not today. Back to HTTP. Both builds work from a clean repo.
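The fallback idea discussed above can be sketched as follows. This is a hedged illustration, not Spark's actual build code: the `reachable` probe and the candidate URLs are assumptions for the sake of the example.

```scala
import java.net.{HttpURLConnection, URL}
import scala.util.Try

// Probe a URL with a HEAD request; any exception (no HTTPS support,
// handshake failure, timeout) counts as unreachable.
def reachable(url: String): Boolean = Try {
  val conn = new URL(url).openConnection().asInstanceOf[HttpURLConnection]
  conn.setRequestMethod("HEAD")
  conn.setConnectTimeout(5000)
  try conn.getResponseCode < 400 finally conn.disconnect()
}.getOrElse(false)

// Prefer the first candidate that responds; fall back to the last one.
def firstWorking(candidates: Seq[String], ok: String => Boolean): String =
  candidates.find(ok).getOrElse(candidates.last)

// Try HTTPS first, fall back to plain HTTP when the mirror behind
// Maven Central does not serve HTTPS.
val repo = firstWorking(
  Seq("https://repo.maven.apache.org/maven2/",
      "http://repo.maven.apache.org/maven2/"),
  reachable)
```

Injecting the predicate keeps the selection logic testable without touching the network.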