0 votes

I am learning Apache Spark and trying to execute a small program in the Scala shell.

I have started the dfs, yarn and history server using the following commands:

start-dfs.sh
start-yarn.sh
mr-jobhistory-daemon.sh start historyserver

and then, in the Scala shell, I have entered the following commands:

 val lines = sc.textFile("/Users/****/Documents/backups/h/*****/input/ncdc/micro-tab/sample.txt");
 val records = lines.map(_.split("\t"));
 val filters = records.filter(rec => (rec(1) != "9999" && rec(2).matches("[01459]")));
 val tuples = filters.map(rec => (rec(0).toInt, rec(1).toInt)); 
 val maxTemps = tuples.reduceByKey((a,b) => Math.max(a,b));

All commands execute successfully except the last one, which throws the following error:

error: value reduceByKey is not a member of org.apache.spark.rdd.RDD[(Int, Int)]

I found some explanations like the following:

This comes from using a pair RDD function generically. The reduceByKey method is actually a method of the PairRDDFunctions class, which has an implicit conversion from RDD, so it requires several implicit type classes. Normally, when working with simple concrete types, those are already in scope. But you should be able to amend your method to also require those same implicits.
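As far as I can tell, that advice amounts to something like the sketch below (reduceGeneric is just a hypothetical name; the ClassTag context bounds supply the implicits that back the RDD-to-PairRDDFunctions conversion):

 import scala.reflect.ClassTag
 import org.apache.spark.SparkContext._
 import org.apache.spark.rdd.RDD

 // reduceByKey only resolves when the implicits backing the conversion from
 // RDD[(K, V)] to PairRDDFunctions[K, V] are in scope, so a generic method
 // has to demand them via context bounds.
 def reduceGeneric[K : ClassTag, V : ClassTag](rdd: RDD[(K, V)])(f: (V, V) => V): RDD[(K, V)] =
   rdd.reduceByKey(f)

 // e.g. reduceGeneric(tuples)((a, b) => Math.max(a, b))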

But I am not sure how to achieve this in the Spark shell.

Any help on how to resolve this issue?

1
I'm not able to reproduce your error. Would you care to add an MVCE along with what version of Spark you are using? - eliasah

1 Answer

2 votes

It seems you are missing an import. Try writing this in the console:

import org.apache.spark.SparkContext._

Then run the above commands again. This import brings an implicit conversion into scope that lets you call the reduceByKey method on an RDD of pairs.
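For example, the full sequence in the shell would then look roughly like this (a sketch only; the path is kept exactly as elided in the question, and on Spark 1.3 and later the relevant implicits live on the RDD companion object, so the import is only needed on older versions):

 import org.apache.spark.SparkContext._   // brings the RDD-to-PairRDDFunctions conversion into scope

 val lines   = sc.textFile("/Users/****/Documents/backups/h/*****/input/ncdc/micro-tab/sample.txt")
 val records = lines.map(_.split("\t"))
 val filters = records.filter(rec => rec(1) != "9999" && rec(2).matches("[01459]"))
 val tuples  = filters.map(rec => (rec(0).toInt, rec(1).toInt))
 // reduceByKey now resolves through the implicit conversion to PairRDDFunctions
 val maxTemps = tuples.reduceByKey((a, b) => Math.max(a, b))
 maxTemps.collect().foreach(println)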