0
votes

I have an RDD[LabeledPoint] and I want to find the min and the max of the labels, and also apply some transformations to them, such as subtracting 5 from every label. I have tried various ways of getting at the labels, but none of them works correctly.

How can I access only the labels and only the features of the RDD? Is there a way to get them as a List[Double] and List[Vector] for example?

I cannot switch to DataFrames.


2 Answers

0
votes

You can create a DataFrame from an existing RDD with a SparkSession. Once you have a DataFrame, you can query and transform it however you like.
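A minimal sketch of what that could look like, in case the DataFrame route is an option after all. It assumes the RDD holds `org.apache.spark.mllib.regression.LabeledPoint` values; the object and method names (`LabeledPointDataFrame`, `toDataFrame`, `labelRange`, `shiftLabels`) are made up for illustration.

```scala
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, max, min}

// Hypothetical helper object, not part of Spark.
object LabeledPointDataFrame {
  // LabeledPoint is a case class, so createDataFrame can infer
  // a schema with the columns "label" and "features".
  def toDataFrame(spark: SparkSession, rdd: RDD[LabeledPoint]): DataFrame =
    spark.createDataFrame(rdd)

  // Min and max of the label column (assumes a non-empty DataFrame).
  def labelRange(df: DataFrame): (Double, Double) = {
    val row = df.agg(min(col("label")), max(col("label"))).first()
    (row.getDouble(0), row.getDouble(1))
  }

  // Subtract `offset` from every label; the features column is untouched.
  def shiftLabels(df: DataFrame, offset: Double): DataFrame =
    df.withColumn("label", col("label") - offset)
}
```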

0
votes

OK, so after playing around with the map function, I came up with this solution:

val labels = rdd.map(x=> x.label)
val min = labels.min
val max = labels.max

If you want to make changes to the labels, you can use the map function once again. Note that this returns an RDD[Double] containing only the new labels:

rdd.map(x=> x.label - 5)

This way you can play around with the label part of an RDD[LabeledPoint].
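The same map pattern should also work for the features, and collect() brings the results back to the driver as local collections, which covers the List[Double] and List[Vector] part of the question. Here is a rough sketch, assuming the mllib `LabeledPoint` and `Vector` types and a dataset small enough to fit in driver memory; the object and method names are hypothetical.

```scala
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// Hypothetical helper object, not part of Spark.
object LabeledPointAccess {
  // Pull every label back to the driver as a local List[Double].
  def labelsAsList(rdd: RDD[LabeledPoint]): List[Double] =
    rdd.map(_.label).collect().toList

  // Pull every feature vector back to the driver as a local List[Vector].
  def featuresAsList(rdd: RDD[LabeledPoint]): List[Vector] =
    rdd.map(_.features).collect().toList

  // stats() gathers count, mean, min and max of an RDD[Double] in one pass,
  // instead of running separate min and max jobs.
  def labelRange(rdd: RDD[LabeledPoint]): (Double, Double) = {
    val stats = rdd.map(_.label).stats()
    (stats.min, stats.max)
  }
}
```

If the data is large, it is probably better to keep working with the mapped RDDs and only collect small results such as the min and max.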

Following Cyril's comments below, I am also adding the command that keeps each LabeledPoint intact (features included) and changes only the label, however you want:

// Name the argument so it is clear that only the label is replaced
val newRdd = rdd.map(x => x.copy(label = x.label - 5))
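For anyone who wants to try the copy approach end to end, here is a small self-contained sketch that runs in local mode. It assumes the mllib `LabeledPoint`; the sample data and the names `ShiftLabels` and `shiftLabels` are invented for the example.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// Hypothetical example object, not part of Spark.
object ShiftLabels {
  // Subtract `offset` from every label; copy() carries the features over unchanged.
  def shiftLabels(rdd: RDD[LabeledPoint], offset: Double): RDD[LabeledPoint] =
    rdd.map(p => p.copy(label = p.label - offset))

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("shift-labels")
    val sc = new SparkContext(conf)
    try {
      // Made-up sample data.
      val rdd = sc.parallelize(Seq(
        LabeledPoint(10.0, Vectors.dense(1.0, 2.0)),
        LabeledPoint(7.0, Vectors.dense(3.0, 4.0))))
      // Print each shifted point; the feature vectors stay as they were.
      shiftLabels(rdd, 5.0).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```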