
I'm implementing a machine learning algorithm in Apache Spark MLlib and I would like to multiply a vector by a scalar:

[Formula image: the vector x_i multiplied by the scalar u_i_j_m]

where u_i_j_m is a Double and x_i is a vector.

I've tried the following:

import breeze.linalg.{DenseVector => BDV, Vector => BV}
import org.apache.spark.mllib.linalg.{DenseVector, Vectors, Vector}
import org.apache.spark.rdd.RDD
...

private def runAlgorithm(data: RDD[VectorWithNorm]) = {
    ...
    data.mapPartitions { data_points =>
        val c = Array.fill(clustersNum)(BDV.zeros[Double](dim).asInstanceOf[BV[Double]])
        ...
        data_points.foreach { data_point =>
            ...
            u_i_j_m : Double = ....
            val temp = (data_point.vector * u_i_j_m)
            // c(j) = temp
        }
    }
}

VectorWithNorm is defined as follows:

class VectorWithNorm(val vector: Vector, val norm: Double) extends Serializable {

    def this(vector: Vector) = this(vector, Vectors.norm(vector, 2.0))
    def this(array: Array[Double]) = this(Vectors.dense(array))
    def toDense: VectorWithNorm = new VectorWithNorm(Vectors.dense(vector.toArray), norm)
}

But when I build the project I get the following error:

Error: value * is not a member of org.apache.spark.mllib.linalg.Vector
val temp = (data_point.vector * u_i_j_m)

How can I do this multiplication correctly?

u_i_j_m : Double isn't there a val missing? Or did you just edit it out by accident? - kosii
There is a complicated computation for u_i_j_m, so I left it out. - Alex L

2 Answers


Unfortunately, the Spark contributors decided not to commit to a particular Scala library for the underlying computations, i.e. linear algebra. Under the hood they use Breeze, but scalar * and vector + on Spark Vectors are private, as are other useful methods. This is quite different from Python, where you can use the excellent NumPy linear algebra library. The argument was that developers were stretched thin, that Breeze looked risky because its development had stalled (if I remember correctly), and that there was an alternative (Apache Commons Math), so they decided to let users pick whichever Scala linear algebra library they want. However, prompted by some members of the community, there is now a Spark package that provides linear algebra on org.apache.spark.mllib.linalg.Vector - see here.
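For a plain scalar multiply you do not strictly need a separate linear algebra library; it can be written against the public MLlib API alone. The sketch below is an illustration, not part of Spark: the names VectorOps and scale are hypothetical, and it assumes spark-mllib is on the classpath. It relies on MLlib's Vector being a sealed trait with the two implementations DenseVector and SparseVector, and it keeps sparse input sparse by scaling only the stored values.

```scala
import org.apache.spark.mllib.linalg.{DenseVector, SparseVector, Vector, Vectors}

// Hypothetical helper object; not an MLlib API.
object VectorOps {
  // Returns a new vector equal to s * v. The input vector is not modified.
  def scale(v: Vector, s: Double): Vector = v match {
    // Dense: multiply every entry.
    case dv: DenseVector => Vectors.dense(dv.values.map(_ * s))
    // Sparse: multiply only the stored (non-zero) entries, keep the same indices.
    case sv: SparseVector => Vectors.sparse(sv.size, sv.indices, sv.values.map(_ * s))
  }
}
```

In the question's loop this would read val temp = VectorOps.scale(data_point.vector, u_i_j_m). For anything beyond a few element-wise operations, converting to a real linear algebra library is usually less code.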


In your code you are using Spark's Vector trait instead of Breeze's DenseVector; that's why there is no * operator defined on your data_point.vector member.
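Following this answer, one option is to copy the Spark vector into a Breeze DenseVector, do the arithmetic there, and convert back only when an MLlib Vector is needed again. A minimal sketch, assuming both Breeze and spark-mllib are on the classpath; BreezeScaling, toBreeze, fromBreeze and accumulate are hypothetical names, and the accumulators c are typed as Array[BDV[Double]] rather than the Array[BV[Double]] used in the question, so that in-place += on DenseVector applies directly.

```scala
import breeze.linalg.{DenseVector => BDV}
import org.apache.spark.mllib.linalg.{Vector, Vectors}

// Hypothetical helpers for moving between MLlib and Breeze vectors.
object BreezeScaling {
  // Copy the Spark vector's entries into a Breeze DenseVector,
  // where multiplication by a scalar is defined.
  def toBreeze(v: Vector): BDV[Double] = new BDV[Double](v.toArray)

  // Copy a Breeze DenseVector back into an MLlib dense vector.
  def fromBreeze(bv: BDV[Double]): Vector = Vectors.dense(bv.toArray)

  // One accumulation step of the question's loop: c(j) += u * x.
  def accumulate(c: Array[BDV[Double]], j: Int, x: Vector, u: Double): Unit = {
    val temp = toBreeze(x) * u // scalar * is defined on BDV[Double]
    c(j) += temp               // element-wise in-place addition into c(j)
  }
}
```

Note that toArray copies the data on every call (and densifies a SparseVector), so for large vectors in a hot loop it may be worth converting each point to Breeze once per partition instead of once per operation.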