Commit ca45861

hhbyyh authored and jkbradley committed
[SPARK-11507][MLLIB] add compact in Matrices fromBreeze
jira: https://issues.apache.org/jira/browse/SPARK-11507

"In certain situations when adding two block matrices, I get an error regarding colPtr and the operation fails. External issue URL includes full error and code for reproducing the problem."

Root cause: colPtr.last does NOT always equal values.length in a breeze CSCMatrix, which fails the require in SparseMatrix.

Easy steps to reproduce:
```
val m1: BM[Double] = new CSCMatrix[Double](Array(1.0, 1, 1), 3, 3, Array(0, 1, 2, 3), Array(0, 1, 2))
val m2: BM[Double] = new CSCMatrix[Double](Array(1.0, 2, 2, 4), 3, 3, Array(0, 0, 2, 4), Array(1, 2, 1, 2))
val sum = m1 + m2
Matrices.fromBreeze(sum)
```

Solution: Judging from the code in [CSCMatrix](https://github.com/scalanlp/breeze/blob/28000a7b901bc3cfbbbf5c0bce1d0a5dda8281b0/math/src/main/scala/breeze/linalg/CSCMatrix.scala), a CSCMatrix in breeze can have extra zeros at the end of its data array. Invoking compact makes sure it satisfies the require in SparseMatrix. This should add limited overhead, as the actual compact operation is only performed when necessary.

Author: Yuhao Yang <[email protected]>

Closes #9520 from hhbyyh/matricesFromBreeze.
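The root cause in the message can be illustrated with a small stand-alone model. The sketch below is not Breeze's `CSCMatrix` or Spark's `SparseMatrix`; `Csc`, `isCompact`, `compacted`, and `checkedValues` are hypothetical names. It assumes that compacting amounts to truncating the backing arrays to `colPtrs.last` entries, and that the failing precondition has the form `colPtrs.last == data.length`.

```scala
// Hypothetical, simplified model of a CSC matrix; not Breeze's CSCMatrix.
final case class Csc(
    rows: Int,
    cols: Int,
    colPtrs: Array[Int],
    rowIndices: Array[Int],
    data: Array[Double]) {

  // Number of entries actually in use, as recorded by the last column pointer.
  def used: Int = colPtrs.last

  // True when the backing arrays carry no unused trailing capacity.
  def isCompact: Boolean = used == data.length && used == rowIndices.length

  // Return a matrix whose arrays are truncated to the used entries.
  // Nothing is copied when the matrix is already compact.
  def compacted: Csc =
    if (isCompact) this
    else copy(rowIndices = rowIndices.take(used), data = data.take(used))
}

object CompactSketch {
  // Assumed form of the precondition that the padded matrix violates.
  def checkedValues(m: Csc): Array[Double] = {
    require(m.colPtrs.last == m.data.length,
      s"colPtrs.last (${m.colPtrs.last}) must equal data.length (${m.data.length})")
    m.data
  }

  def main(args: Array[String]): Unit = {
    // Five stored entries, but arrays with capacity eight (trailing zeros).
    val padded = Csc(3, 3,
      colPtrs = Array(0, 1, 3, 5),
      rowIndices = Array(0, 1, 2, 1, 2, 0, 0, 0),
      data = Array(1.0, 2.0, 2.0, 2.0, 5.0, 0.0, 0.0, 0.0))

    // checkedValues(padded) would throw IllegalArgumentException here.
    val fixed = padded.compacted
    println(checkedValues(fixed).mkString(", "))
  }
}
```

Because `compacted` returns the receiver unchanged when nothing needs trimming, the extra cost is paid only for padded inputs, which matches the "limited overhead" argument in the message.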
1 parent f301df3 commit ca45861

2 files changed: +21 -1 lines changed

mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala

Lines changed: 9 additions & 1 deletion
@@ -940,8 +940,16 @@ object Matrices {
       case dm: BDM[Double] =>
         new DenseMatrix(dm.rows, dm.cols, dm.data, dm.isTranspose)
       case sm: BSM[Double] =>
+        // Spark-11507. work around breeze issue 479.
+        val mat = if (sm.colPtrs.last != sm.data.length) {
+          val matCopy = sm.copy
+          matCopy.compact()
+          matCopy
+        } else {
+          sm
+        }
         // There is no isTranspose flag for sparse matrices in Breeze
-        new SparseMatrix(sm.rows, sm.cols, sm.colPtrs, sm.rowIndices, sm.data)
+        new SparseMatrix(mat.rows, mat.cols, mat.colPtrs, mat.rowIndices, mat.data)
       case _ =>
         throw new UnsupportedOperationException(
           s"Do not support conversion from type ${breeze.getClass.getName}.")

mllib/src/test/scala/org/apache/spark/mllib/linalg/MatricesSuite.scala

Lines changed: 12 additions & 0 deletions
@@ -19,6 +19,7 @@ package org.apache.spark.mllib.linalg
 
 import java.util.Random
 
+import breeze.linalg.{CSCMatrix, Matrix => BM}
 import org.mockito.Mockito.when
 import org.scalatest.mock.MockitoSugar._
 import scala.collection.mutable.{Map => MutableMap}
@@ -499,6 +500,17 @@ class MatricesSuite extends SparkFunSuite {
     assert(sm1.numActives === 3)
   }
 
+  test("fromBreeze with sparse matrix") {
+    // colPtr.last does NOT always equal to values.length in breeze SCSMatrix and
+    // invocation of compact() may be necessary. Refer to SPARK-11507
+    val bm1: BM[Double] = new CSCMatrix[Double](
+      Array(1.0, 1, 1), 3, 3, Array(0, 1, 2, 3), Array(0, 1, 2))
+    val bm2: BM[Double] = new CSCMatrix[Double](
+      Array(1.0, 2, 2, 4), 3, 3, Array(0, 0, 2, 4), Array(1, 2, 1, 2))
+    val sum = bm1 + bm2
+    Matrices.fromBreeze(sum)
+  }
+
   test("row/col iterator") {
     val dm = new DenseMatrix(3, 2, Array(0, 1, 2, 3, 4, 0))
     val sm = dm.toSparse
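
The new test only checks that the conversion does not throw. For reference, the sum of its two matrices can be worked out by hand. The sketch below does this in plain Scala without Breeze or Spark; `SumSketch`, `toDense`, and `sumDense` are hypothetical names, and the argument order (colPtrs, then rowIndices, then data) is chosen for this sketch rather than taken from the `CSCMatrix` constructor.

```scala
object SumSketch {
  // Expand CSC arrays into a column-major dense array (hypothetical helper).
  def toDense(
      rows: Int,
      cols: Int,
      colPtrs: Array[Int],
      rowIndices: Array[Int],
      data: Array[Double]): Array[Double] = {
    val out = new Array[Double](rows * cols)
    // Column j owns the entries at positions colPtrs(j) until colPtrs(j + 1).
    for (j <- 0 until cols; k <- colPtrs(j) until colPtrs(j + 1)) {
      out(j * rows + rowIndices(k)) += data(k)
    }
    out
  }

  // Element-wise sum of the two matrices used in the test, column-major.
  def sumDense: Array[Double] = {
    val a = toDense(3, 3, Array(0, 1, 2, 3), Array(0, 1, 2), Array(1.0, 1.0, 1.0))
    val b = toDense(3, 3, Array(0, 0, 2, 4), Array(1, 2, 1, 2), Array(1.0, 2.0, 2.0, 4.0))
    a.zip(b).map { case (x, y) => x + y }
  }

  def main(args: Array[String]): Unit = {
    // Columns of the sum are (1, 0, 0), (0, 2, 2) and (0, 2, 5).
    assert(sumDense.sameElements(Array(1.0, 0.0, 0.0, 0.0, 2.0, 2.0, 0.0, 2.0, 5.0)))
  }
}
```

The sum therefore has five non-zero entries, so a stricter version of the test could, as one option, compare the converted matrix against these values instead of only exercising the code path.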
