I have two JavaRDDs. The first one is JavaRDD<CustomClass> data and the second one is JavaRDD<Vector> features.
My CustomClass has two fields: text (a String) and label (an int). The JavaRDD data holds 1000 CustomClass instances and the JavaRDD features holds 1000 Vectors.
I computed these 1000 vectors by applying a map function to the JavaRDD data, so the i-th vector belongs to the i-th CustomClass.
Now I want a new JavaRDD<LabeledPoint>. The LabeledPoint constructor needs both a label and a vector, but the call function of a map accepts only one argument, so I can't write a map that receives both the CustomClass and its Vector.
How can I combine these two JavaRDDs into a single JavaRDD<LabeledPoint>?
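One way to give a single map both values is JavaRDD.zip, which pairs the i-th element of one RDD with the i-th element of another and returns a JavaPairRDD of Tuple2 objects; one map over those tuples can then build each LabeledPoint. zip requires both RDDs to have the same number of partitions and the same number of elements in each partition. That should hold when features was derived from data only through element-wise steps such as map and the IDF transform, but it is an assumption: a filter, repartition or shuffle between the two would break it. The sketch below assumes the spark.mllib (Spark 1.x style) API; ZipExample, toLabeledPoints and the stand-in CustomClass are hypothetical names for illustration.

```java
import java.io.Serializable;
import java.util.Arrays;
import java.util.List;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.Vectors;
import org.apache.spark.mllib.regression.LabeledPoint;

import scala.Tuple2;

public class ZipExample {

    // Minimal stand-in for the question's CustomClass (Serializable so Spark can ship it).
    public static class CustomClass implements Serializable {
        private final String text;
        private final int label;

        public CustomClass(String text, int label) {
            this.text = text;
            this.label = label;
        }

        public String getText() { return text; }
        public int getLabel() { return label; }
    }

    // Pairs the i-th CustomClass with the i-th Vector and builds one LabeledPoint per pair.
    // Assumes both RDDs have the same partition count and the same size per partition.
    public static JavaRDD<LabeledPoint> toLabeledPoints(JavaRDD<CustomClass> data,
                                                        JavaRDD<Vector> features) {
        return data.zip(features).map(
            new Function<Tuple2<CustomClass, Vector>, LabeledPoint>() {
                @Override
                public LabeledPoint call(Tuple2<CustomClass, Vector> pair) {
                    // _1() is the CustomClass, _2() the vector computed from it
                    return new LabeledPoint(pair._1().getLabel(), pair._2());
                }
            });
    }

    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext("local", "zip-example");
        JavaRDD<CustomClass> data = sc.parallelize(Arrays.asList(
            new CustomClass("spark is fast", 1),
            new CustomClass("hadoop is old", 0)));
        // Stand-in for the IDF features: an element-wise map keeps data's partitioning.
        JavaRDD<Vector> features = data.map(new Function<CustomClass, Vector>() {
            @Override
            public Vector call(CustomClass c) {
                return Vectors.dense(new double[] { c.getText().length() });
            }
        });
        List<LabeledPoint> points = toLabeledPoints(data, features).collect();
        for (LabeledPoint p : points) {
            System.out.println(p);
        }
        sc.stop();
    }
}
```

In the question's code this would amount to calling toLabeledPoints(data, features) after the IDF step. If data is expensive to recompute, caching it before both RDDs are built avoids reading the file twice.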
Here are the relevant snippets from my code:
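import java.io.Serializable;
import java.util.Arrays;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.mllib.feature.HashingTF;
import org.apache.spark.mllib.feature.IDF;
import org.apache.spark.mllib.linalg.Vector;

// Serializable because Spark ships instances to the executors.
class CustomClass implements Serializable {
    String text;
    int label;
    String getText() { return text; }
    int getLabel() { return label; }
}

JavaRDD<CustomClass> data = getDataFromFile(filename);
final HashingTF hashingTF = new HashingTF();
final IDF idf = new IDF();
// One term-frequency vector per document. HashingTF already returns a sparse Vector,
// so copying it into a DenseVector (2^20 entries by default) only wastes memory.
final JavaRDD<Vector> td2 = data.map(
    new Function<CustomClass, Vector>() {
        @Override
        public Vector call(CustomClass cd) throws Exception {
            return hashingTF.transform(Arrays.asList(cd.getText().split(" ")));
        }
    }
);
td2.cache(); // td2 is traversed twice: by fit and by transform
final JavaRDD<Vector> features = idf.fit(td2).transform(td2);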
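If the second RDD exists only because of the IDF step, another option is to never separate the label from its vector: keep the label in each record through the TF step, fit the IDF model on just the vectors, and then rescale each record with IDFModel.transform(Vector). This avoids zip and its partition-alignment requirement. The sketch assumes the spark.mllib API in a version whose IDFModel has the single-vector transform(Vector) overload (Spark 1.3 or later, as far as I know); NoZipExample, toLabeledPoints and the stand-in CustomClass are hypothetical names.

```java
import java.io.Serializable;
import java.util.Arrays;
import java.util.List;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.mllib.feature.HashingTF;
import org.apache.spark.mllib.feature.IDF;
import org.apache.spark.mllib.feature.IDFModel;
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.regression.LabeledPoint;

public class NoZipExample {

    // Minimal stand-in for the question's CustomClass.
    public static class CustomClass implements Serializable {
        private final String text;
        private final int label;

        public CustomClass(String text, int label) {
            this.text = text;
            this.label = label;
        }

        public String getText() { return text; }
        public int getLabel() { return label; }
    }

    public static JavaRDD<LabeledPoint> toLabeledPoints(JavaRDD<CustomClass> data) {
        final HashingTF hashingTF = new HashingTF();

        // 1. Build (label, term-frequency vector) records so the label stays attached.
        JavaRDD<LabeledPoint> tf = data.map(new Function<CustomClass, LabeledPoint>() {
            @Override
            public LabeledPoint call(CustomClass cd) {
                return new LabeledPoint(cd.getLabel(),
                        hashingTF.transform(Arrays.asList(cd.getText().split(" "))));
            }
        });
        tf.cache(); // read twice: once to fit the IDF model, once to rescale

        // 2. Fit IDF on the vectors alone.
        JavaRDD<Vector> tfVectors = tf.map(new Function<LabeledPoint, Vector>() {
            @Override
            public Vector call(LabeledPoint p) {
                return p.features();
            }
        });
        final IDFModel idfModel = new IDF().fit(tfVectors);

        // 3. Rescale each record's vector with the fitted model, keeping its label.
        return tf.map(new Function<LabeledPoint, LabeledPoint>() {
            @Override
            public LabeledPoint call(LabeledPoint p) {
                return new LabeledPoint(p.label(), idfModel.transform(p.features()));
            }
        });
    }

    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext("local", "nozip-example");
        JavaRDD<CustomClass> data = sc.parallelize(Arrays.asList(
            new CustomClass("spark is fast", 1),
            new CustomClass("hadoop is old", 0)));
        List<LabeledPoint> points = toLabeledPoints(data).collect();
        for (LabeledPoint p : points) {
            System.out.println(p);
        }
        sc.stop();
    }
}
```

The trade-off is one extra map to extract the vectors for fitting; in exchange, each label travels inside the same record as its vector, so no ordering or partitioning assumption is needed.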