Commit 19d9d4c

bllchmbrs authored and Felix Cheung committed
[SPARK-19126][DOCS] Update Join Documentation Across Languages
## What changes were proposed in this pull request?

- [X] Make sure all join types are clearly mentioned
- [X] Make join labeling/style consistent
- [X] Make join label ordering docs the same
- [X] Improve join documentation according to above for Scala
- [X] Improve join documentation according to above for Python
- [X] Improve join documentation according to above for R

## How was this patch tested?

No tests b/c docs.

Please review http://spark.apache.org/contributing.html before opening a pull request.

Author: anabranch <[email protected]>

Closes #16504 from anabranch/SPARK-19126.
1 parent 1f6ded6 commit 19d9d4c

3 files changed: 26 additions and 14 deletions

R/pkg/R/DataFrame.R

Lines changed: 11 additions & 8 deletions
@@ -2313,9 +2313,9 @@ setMethod("dropDuplicates",
 #' @param joinExpr (Optional) The expression used to perform the join. joinExpr must be a
 #' Column expression. If joinExpr is omitted, the default, inner join is attempted and an error is
 #' thrown if it would be a Cartesian Product. For Cartesian join, use crossJoin instead.
-#' @param joinType The type of join to perform. The following join types are available:
-#' 'inner', 'outer', 'full', 'fullouter', leftouter', 'left_outer', 'left',
-#' 'right_outer', 'rightouter', 'right', and 'leftsemi'. The default joinType is "inner".
+#' @param joinType The type of join to perform, default 'inner'.
+#' Must be one of: 'inner', 'cross', 'outer', 'full', 'full_outer',
+#' 'left', 'left_outer', 'right', 'right_outer', 'left_semi', or 'left_anti'.
 #' @return A SparkDataFrame containing the result of the join operation.
 #' @family SparkDataFrame functions
 #' @aliases join,SparkDataFrame,SparkDataFrame-method
@@ -2344,15 +2344,18 @@ setMethod("join",
   if (is.null(joinType)) {
     sdf <- callJMethod(x@sdf, "join", y@sdf, joinExpr@jc)
   } else {
-    if (joinType %in% c("inner", "outer", "full", "fullouter",
-        "leftouter", "left_outer", "left",
-        "rightouter", "right_outer", "right", "leftsemi")) {
+    if (joinType %in% c("inner", "cross",
+        "outer", "full", "fullouter", "full_outer",
+        "left", "leftouter", "left_outer",
+        "right", "rightouter", "right_outer",
+        "left_semi", "leftsemi", "left_anti", "leftanti")) {
       joinType <- gsub("_", "", joinType)
       sdf <- callJMethod(x@sdf, "join", y@sdf, joinExpr@jc, joinType)
     } else {
       stop("joinType must be one of the following types: ",
-           "'inner', 'outer', 'full', 'fullouter', 'leftouter', 'left_outer', 'left',
-           'rightouter', 'right_outer', 'right', 'leftsemi'")
+           "'inner', 'cross', 'outer', 'full', 'full_outer',",
+           "'left', 'left_outer', 'right', 'right_outer',",
+           "'left_semi', or 'left_anti'.")
     }
   }
 }
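
The validation branch in the hunk above checks the supplied name against a fixed list and strips underscores before passing it to the JVM. A minimal Python sketch of that logic follows. It only models the R code shown in the diff and is not SparkR itself; `ACCEPTED_JOIN_TYPES` and `normalize_join_type` are hypothetical names introduced for illustration.

```python
# Hypothetical Python model of the R validation branch; not part of SparkR.
ACCEPTED_JOIN_TYPES = (
    "inner", "cross",
    "outer", "full", "fullouter", "full_outer",
    "left", "leftouter", "left_outer",
    "right", "rightouter", "right_outer",
    "left_semi", "leftsemi", "left_anti", "leftanti",
)


def normalize_join_type(join_type):
    """Return the underscore-free join name, or raise ValueError if unknown."""
    if join_type not in ACCEPTED_JOIN_TYPES:
        raise ValueError(
            "joinType must be one of the following types: "
            "'inner', 'cross', 'outer', 'full', 'full_outer', "
            "'left', 'left_outer', 'right', 'right_outer', "
            "'left_semi', or 'left_anti'."
        )
    # Same effect as gsub("_", "", joinType) in the R code.
    return join_type.replace("_", "")


print(normalize_join_type("left_anti"))   # leftanti
print(normalize_join_type("full_outer"))  # fullouter
```

Because underscores are removed after the membership check, both spellings of a name (for example `left_semi` and `leftsemi`) end up as the same string.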

python/pyspark/sql/dataframe.py

Lines changed: 3 additions & 2 deletions
@@ -730,8 +730,9 @@ def join(self, other, on=None, how=None):
 a join expression (Column), or a list of Columns.
 If `on` is a string or a list of strings indicating the name of the join column(s),
 the column(s) must exist on both sides, and this performs an equi-join.
-:param how: str, default 'inner'.
-One of `inner`, `outer`, `left_outer`, `right_outer`, `leftsemi`.
+:param how: str, default ``inner``. Must be one of: ``inner``, ``cross``, ``outer``,
+``full``, ``full_outer``, ``left``, ``left_outer``, ``right``, ``right_outer``,
+``left_semi``, and ``left_anti``.
 
 The following performs a full outer join between ``df1`` and ``df2``.
 
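
To make the listed `how` values concrete, here is a small sketch of what each join type returns for an equi-join on a single key. It is a plain-Python model of standard join semantics on lists of `(key, value)` tuples, not PySpark code; the sample data and variable names are illustrative assumptions, and a real DataFrame join does not guarantee row order.

```python
# Plain-Python model of join semantics on (key, value) pairs; no Spark involved.
left = [(1, "a"), (2, "b"), (3, "c")]
right = [(2, "x"), (3, "y"), (4, "z")]

left_keys = {k for k, _ in left}
right_keys = {k for k, _ in right}

# inner: only keys present on both sides.
inner = [(k, lv, rv) for k, lv in left for rk, rv in right if k == rk]

# left_outer / right_outer: keep unmatched rows from one side, padded with None.
left_only = [(k, lv, None) for k, lv in left if k not in right_keys]
right_only = [(k, None, rv) for k, rv in right if k not in left_keys]
left_outer = inner + left_only
right_outer = inner + right_only

# outer / full / full_outer: keep unmatched rows from both sides.
full_outer = inner + left_only + right_only

# left_semi: left rows that have a match; no columns from the right side.
left_semi = [(k, lv) for k, lv in left if k in right_keys]

# left_anti: left rows that have no match.
left_anti = [(k, lv) for k, lv in left if k not in right_keys]

# cross: every pairing of a left row with a right row.
cross = [(lk, lv, rk, rv) for lk, lv in left for rk, rv in right]

print(left_semi)  # [(2, 'b'), (3, 'c')]
print(left_anti)  # [(1, 'a')]
```

`left_semi` and `left_anti` partition the left side: every left row appears in exactly one of the two results.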

sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala

Lines changed: 12 additions & 4 deletions
@@ -750,14 +750,18 @@ class Dataset[T] private[sql](
   }
 
   /**
-   * Equi-join with another `DataFrame` using the given columns.
+   * Equi-join with another `DataFrame` using the given columns. A cross join with a predicate
+   * is specified as an inner join. If you would explicitly like to perform a cross join use the
+   * `crossJoin` method.
    *
    * Different from other join functions, the join columns will only appear once in the output,
    * i.e. similar to SQL's `JOIN USING` syntax.
    *
    * @param right Right side of the join operation.
    * @param usingColumns Names of the columns to join on. This columns must exist on both sides.
-   * @param joinType One of: `inner`, `outer`, `left_outer`, `right_outer`, `leftsemi`.
+   * @param joinType Type of join to perform. Default `inner`. Must be one of:
+   * `inner`, `cross`, `outer`, `full`, `full_outer`, `left`, `left_outer`,
+   * `right`, `right_outer`, `left_semi`, `left_anti`.
    *
    * @note If you perform a self-join using this function without aliasing the input
    * `DataFrame`s, you will NOT be able to reference any columns after the join, since
@@ -812,7 +816,9 @@ class Dataset[T] private[sql](
    *
    * @param right Right side of the join.
    * @param joinExprs Join expression.
-   * @param joinType One of: `inner`, `outer`, `left_outer`, `right_outer`, `leftsemi`.
+   * @param joinType Type of join to perform. Default `inner`. Must be one of:
+   * `inner`, `cross`, `outer`, `full`, `full_outer`, `left`, `left_outer`,
+   * `right`, `right_outer`, `left_semi`, `left_anti`.
    *
    * @group untypedrel
    * @since 2.0.0
@@ -889,7 +895,9 @@ class Dataset[T] private[sql](
    *
    * @param other Right side of the join.
    * @param condition Join expression.
-   * @param joinType One of: `inner`, `outer`, `left_outer`, `right_outer`, `leftsemi`.
+   * @param joinType Type of join to perform. Default `inner`. Must be one of:
+   * `inner`, `cross`, `outer`, `full`, `full_outer`, `left`, `left_outer`,
+   * `right`, `right_outer`, `left_semi`, `left_anti`.
    *
    * @group typedrel
    * @since 1.6.0
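
The updated `joinType` docs list eleven names, several of which select the same join. The sketch below groups them by the join they appear to denote. The grouping is an assumption based on the usual meaning of these names (for example `outer`, `full`, and `full_outer` all denoting a full outer join) and is not taken from this diff; `CANONICAL` and `canonical_join_type` are hypothetical names.

```python
# Assumed grouping of the documented joinType names; not taken from the diff.
CANONICAL = {
    "inner": "inner",
    "cross": "cross",
    "outer": "full_outer",
    "full": "full_outer",
    "full_outer": "full_outer",
    "left": "left_outer",
    "left_outer": "left_outer",
    "right": "right_outer",
    "right_outer": "right_outer",
    "left_semi": "left_semi",
    "left_anti": "left_anti",
}


def canonical_join_type(name):
    """Map a documented joinType name to the join it is assumed to select."""
    return CANONICAL[name]


distinct_joins = sorted(set(CANONICAL.values()))
print(len(CANONICAL), "names,", len(distinct_joins), "distinct joins")
```

Under this assumption the eleven documented names cover seven distinct joins.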
