Hi, I've been searching the web and the Amazon documentation for general know-how on running a Spark job on an existing EMR YARN cluster on AWS.
I'm stuck at the following point. I have already set up a local[*] Spark cluster for testing; now I want to run the same job on AWS EMR.
So far I have created an EMR cluster on AWS, but I cannot find documentation on how to run the code below against it. The code works locally if
"spark.master.url" is set to local[*].
Class code:
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

// Serializable so the mapping lambda (which captures this) can be shipped to executors
public class SparkLocalImpl implements DataMapReduce, Serializable {
    private SparkConf conf;
    private JavaSparkContext sparkContext;

    private void createContext(){
        // env is my property source, defined elsewhere in the class
        conf = new SparkConf()
                .setAppName("spark-map-reduce") // Spark refuses to start without an app name
                .setMaster(env.getProperty("spark.master.url")); // rest is default
        sparkContext = new JavaSparkContext(conf);
    }

    public List<String> getMapReducedData(List<String> str){
        createContext();
        JavaRDD<String> rdd = sparkContext.parallelize(str);
        // customMapFunction returns a List per record, so flatten after collecting
        return rdd.map(this::customMapFunction)
                .collect()
                .stream()
                .flatMap(List::stream)
                .collect(Collectors.toList());
    }

    public List<String> customMapFunction(String str){
        List<String> strMappedList = new ArrayList<>();
        //do something
        return strMappedList;
    }
}
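For completeness, this is roughly how I invoke it; Main here is a simplified stand-in for my real caller, and it assumes env resolves spark.master.url to a valid master:

import java.util.Arrays;
import java.util.List;

public class Main {
    public static void main(String[] args) {
        // assumes SparkLocalImpl's env resolves spark.master.url (local[*] in my tests)
        SparkLocalImpl impl = new SparkLocalImpl();
        List<String> result = impl.getMapReducedData(Arrays.asList("a", "b", "c"));
        System.out.println(result);
    }
}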
Can someone tell me what I am doing wrong?