2
votes

How can I store the result of an action such as count in an output directory in Apache Spark (Scala)?

    val countval = data.map((_, "")).reduceByKey(_ + _).count()

The command below does not work, because the result of count is not an RDD:

    countval.saveAsTextFile("OUTPUT LOCATION")

Is there any way to store countval in a local or HDFS location?

2
Maybe he wants to use a Scala library to achieve this? Alberto Bonsanto

2 Answers

1
votes

After you call count, the result is no longer an RDD.

count returns a plain Long, which has no saveAsTextFile method.

If you want to store countval, you have to write it out the way you would any other Long, String, or Int.
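
As a rough illustration of "like any other Long", here is a minimal sketch that writes the value to a local file on the driver using plain java.nio. The helper name saveCount, the stand-in value 42L, and the path count.txt are assumptions made up for this example, not part of the question.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, Paths}

// Hypothetical helper: writes a single Long as text to a local file.
// This runs on the driver only; no Spark job is involved.
def saveCount(count: Long, location: String): Path =
  Files.write(Paths.get(location), count.toString.getBytes(StandardCharsets.UTF_8))

// Stand-in value; in the question this would be the Long returned by count.
val countval: Long = 42L
saveCount(countval, "count.txt") // "count.txt" is a placeholder path
```

This only covers a local filesystem path; writing to HDFS this way would need the Hadoop filesystem API instead.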

1
votes

What @szefuf said is correct: after count you have a Long, which you can save any way you want. If you want to save it with .saveAsTextFile(), you first have to convert it to an RDD:

    sc.parallelize(Seq(countval)).saveAsTextFile("/file/location")

The parallelize method in SparkContext turns a collection of values into an RDD, so you need to turn the single value into a single-element sequence first. Then you can save it.
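
One practical detail: saveAsTextFile writes a directory of part files, one per partition, so a one-element RDD spread over the default number of partitions can leave several mostly empty files. The sketch below passes numSlices = 1 to keep everything in one partition. It assumes Spark is on the classpath; the local master, the stand-in value 42L, and the path count-output are placeholders for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Local context for illustration only; in spark-shell, sc already exists.
val conf = new SparkConf().setMaster("local[2]").setAppName("save-count")
val sc = new SparkContext(conf)

// Stand-in value; in the question this would be the Long returned by count.
val countval: Long = 42L

// numSlices = 1 keeps the single value in one partition, so the output
// directory holds one part file rather than several mostly empty ones.
// "count-output" is a placeholder path and must not already exist.
sc.parallelize(Seq(countval), numSlices = 1).saveAsTextFile("count-output")

sc.stop()
```

The same call works with an hdfs:// path, provided the cluster configuration can reach that filesystem.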