I am trying to use pipe in Spark RDD operations. Here is the snippet I tried:
//PIPE - run a external shell script in spark
val x = sc.parallelize(Array("A", "Ba", "C", "AD"))
val y = x.pipe("grep -i A")
x.collect().foreach(println)
y.collect().foreach(println)
But when I run the snippet above, I get:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 61.0 failed 1 times, most recent failure: Lost task 0.0 in stage 61.0 (TID 592, localhost, executor driver): java.lang.IllegalStateException: Subprocess exited with status 1. Command ran: grep -i A
Is there a way to run the grep -i command through pipe?
Calling a .sh script instead works, but I want to run it directly as a shell command.
Reference
"Subprocess exited with status 1" indicates that the grep command exited with a non-zero status. Exit code 1 for grep simply means no lines were selected/matched, which is true for the elements of the RDD that contain no A. - tomgalpin