
I am a bit new to spark and I was wondering how to use objects like SVMDataGenerator, described in the API Doc here: http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.util.SVMDataGenerator$

Specifically, I was having trouble getting them to work in the Spark shell or in code that I wrote in .scala files and then compiled with sbt. In the Spark shell, I tried something like:

import org.apache.spark.mllib.util._
SVMDataGenerator("local", <filepath>)

This, however, throws an error, as it claims that SVMDataGenerator does not accept parameters. I did something similar in a scala file, and again an error was thrown. Looking at the source code for the object, however, I can see that it accepts arguments. I'm just kind of lost on how I would actually use this object (and other similar objects), and any help on this would be appreciated.

Thanks


1 Answer


Looking at the source code, you can see that SVMDataGenerator is an executable object. That is, it contains a main(args: Array[String]) method meant to be executed, e.g. from a command line like:

$>scala -cp sparkmllib.jar org.apache.spark.mllib.util.SVMDataGenerator <master> <output_dir> [num_examples] [num_features] [num_partitions]
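The error the question mentions ("does not accept parameters") comes from Scala's calling conventions: writing SVMDataGenerator("local", ...) is sugar for SVMDataGenerator.apply("local", ...), and this object defines no apply method, only main. The shape can be sketched with a minimal standalone object (the names here are illustrative, not from Spark):

```scala
// A minimal executable object mirroring the shape of SVMDataGenerator.
// It has a main method but no apply method, so MyGenerator("local", "out")
// will not compile -- only MyGenerator.main(Array(...)) works.
object MyGenerator {
  def main(args: Array[String]): Unit = {
    val master     = if (args.length > 0) args(0) else "local"
    val outputPath = if (args.length > 1) args(1) else "out"
    println(s"would generate data on $master into $outputPath")
  }
}
```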

If you would like to execute it programmatically, you could do something like:

import org.apache.spark.mllib.util._
SVMDataGenerator.main(Array("<master>", "<output_dir>", "[num_examples]", "[num_features]", "[num_partitions]"))

(replace parameters as necessary)
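For instance, a concrete call might look like the following (the output path and the numbers are illustrative; per the source, the trailing arguments are optional and fall back to defaults when omitted):

```scala
import org.apache.spark.mllib.util._

// master = "local" runs Spark in-process; then the output directory,
// followed by the optional num_examples, num_features and num_partitions,
// all passed as strings since main takes Array[String].
SVMDataGenerator.main(Array("local", "/tmp/svm-data", "1000", "10", "2"))
```

Note that main creates its own SparkContext, so this should be run from a standalone program rather than inside the Spark shell, where a context already exists.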