I have a Dataproc cluster with one master and four workers, and this Spark job:
JavaRDD<Signal> rdd_data = javaSparkContext.parallelize(my_data, 8);
rdd_data.foreachPartition(partitionOfRecords -> {
    // partitionOfRecords is a java.util.Iterator<Signal>, so count by consuming it
    int count = 0;
    while (partitionOfRecords.hasNext()) { partitionOfRecords.next(); count++; }
    System.out.println("Items in partition: " + count);
});
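(For context: in the Java API, `foreachPartition` passes the lambda a plain `java.util.Iterator`, which has no `count` method, so the count has to be taken by walking the iterator. A minimal standalone sketch of that counting logic, without Spark and with a hypothetical helper name `countItems`:)

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class PartitionCount {
    // Count elements by consuming the iterator, as the foreachPartition lambda must.
    static int countItems(Iterator<?> it) {
        int count = 0;
        while (it.hasNext()) {
            it.next();
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        List<Integer> partition = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println("Items in partition: " + countItems(partition.iterator()));
    }
}
```

Note that an iterator can only be consumed once, so counting it this way exhausts it; any further processing of the partition would need to happen in the same pass.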
where my_data is an array of about 1000 elements. The job starts correctly on the cluster and returns the correct data, but it runs only on the master, not on the workers. I use Dataproc image 1.4 on every machine in the cluster.
Can anybody help me understand why this job runs only on the master?