Commit c82501a

Clarify Scaladoc for OutBlock
1 parent e5cdba1 commit c82501a

1 file changed: 21 additions and 2 deletions
  • mllib/src/main/scala/org/apache/spark/ml/recommendation

mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala

Lines changed: 21 additions & 2 deletions
@@ -944,14 +944,33 @@ object ALS extends DefaultParamsReadable[ALS] with Logging {
   private type FactorBlock = Array[Array[Float]]

   /**
-   * Out-link blocks that store information about which columns of the items factor matrix are
-   * required to calculate which rows of the users factor matrix, and vice versa.
+   * A mapping of the columns of the items factor matrix that are needed when calculating each row
+   * of the users factor matrix, and vice versa.
    *
    * Specifically, when calculating a user factor vector, since only those columns of the items
    * factor matrix that correspond to the items that that user has rated are needed, we can avoid
    * having to repeatedly copy the entire items factor matrix to each worker later in the algorithm
    * by precomputing these dependencies for all users, storing them in an RDD of `OutBlock`s. The
    * items' dependencies on the columns of the users factor matrix is computed similarly.
+   *
+   * =Example=
+   *
+   * Using the example provided in the `InBlock` Scaladoc, `userOutBlocks` would look like the
+   * following:
+   *
+   * {{{ userOutBlocks.collect() == Seq(
+   * 0 -> Array(Array(0, 1), Array(0, 1)),
+   * 1 -> Array(Array(0), Array(0))) }}}
+   *
+   * The data structure encodes the following information:
+   *
+   * * There are ratings with user IDs 0 and 6 (encoded in `Array(0, 1)`, where 0 and 1 are the
+   * indices of the user IDs 0 and 6 on partition 0) whose item IDs map to partitions 0 and 1
+   * (represented by the fact that `Array(0, 1)` appears in both the 0th and 1st positions).
+   *
+   * * There are ratings with user ID 3 (encoded in `Array(0)`, where 0 is the index of the user
+   * ID 3 on partition 1) whose item IDs map to partitions 0 and 1 (represented by the fact that
+   * `Array(0)` appears in both the 0th and 1st positions).
    */
   private type OutBlock = Array[Array[Int]]
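
To make the encoding described in the new Scaladoc concrete, here is a minimal sketch in plain Scala collections (no Spark) that builds out-blocks of the same shape. It is not the code in `ALS.scala`. The object name `OutBlockSketch`, the helpers `blockOf`, `buildOutBlocks`, and `route`, the `id % numBlocks` partitioning rule, and the sample ratings are all assumptions made for illustration. The ratings are hypothetical and are not the data from the `InBlock` Scaladoc; they were chosen only so that users 0 and 6 land on partition 0, user 3 lands on partition 1, and every user has rated items on both item partitions.

```scala
object OutBlockSketch {
  // Same shapes as the private type aliases in the diff above.
  type OutBlock = Array[Array[Int]]
  type FactorBlock = Array[Array[Float]]

  // Assumption: two partitions, IDs assigned to partitions by id % numBlocks.
  val numBlocks = 2
  def blockOf(id: Int): Int = id % numBlocks

  // Hypothetical (userId, itemId) pairs; rating values are irrelevant here.
  val ratings: Seq[(Int, Int)] =
    Seq((0, 1), (0, 4), (6, 2), (6, 5), (3, 0), (3, 7))

  // For each user partition, record for every item partition the local
  // indices of the users that have at least one rating in that partition.
  def buildOutBlocks(ratings: Seq[(Int, Int)]): Map[Int, OutBlock] =
    ratings.groupBy { case (user, _) => blockOf(user) }.map {
      case (userBlock, rs) =>
        // Local index = position of the user ID among the sorted IDs
        // of this partition (an assumption of this sketch).
        val localIndex = rs.map(_._1).distinct.sorted.zipWithIndex.toMap
        val out: OutBlock = Array.tabulate(numBlocks) { itemBlock =>
          rs.collect { case (u, i) if blockOf(i) == itemBlock => localIndex(u) }
            .distinct.sorted.toArray
        }
        userBlock -> out
    }

  // Given one partition's factor vectors, select only the vectors that each
  // destination partition needs, instead of shipping the whole block.
  def route(out: OutBlock, factors: FactorBlock): Array[Array[Array[Float]]] =
    out.map(activeIndices => activeIndices.map(idx => factors(idx)))

  def main(args: Array[String]): Unit =
    buildOutBlocks(ratings).toSeq.sortBy(_._1).foreach { case (block, out) =>
      val rendered = out.map(_.mkString("Array(", ", ", ")")).mkString("Array(", ", ", ")")
      println(s"$block -> $rendered")
    }
}
```

Under these assumptions, running `OutBlockSketch.main` prints `0 -> Array(Array(0, 1), Array(0, 1))` and `1 -> Array(Array(0), Array(0))`, the same shape as the `userOutBlocks` value in the Scaladoc. The `route` helper illustrates why the mapping is useful: each destination partition receives only the factor vectors listed for it, rather than a full copy of the factor matrix.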
