For a performance measurement, I want to execute my Scala program written for Flink stepwise, i.e.
execute first operator; materialize result;
execute second operator; materialize result;
...
and so on. The original code:
val filename = "<filename>"
val text = env.readTextFile(filename)
val counts = text.flatMap { _.toLowerCase.split("\\W+") }.map { (_, 1) }.groupBy(0).sum(1)
counts.writeAsText("file:///result.txt", WriteMode.OVERWRITE)
env.execute()
So I want the execution of the chained transformation
counts = text.flatMap { _.toLowerCase.split("\\W+") }.map { (_, 1) }.groupBy(0).sum(1)
to happen stepwise, one operator at a time.
Is calling env.execute()
after every operator the right way to do it?
Or is writing the intermediate results to a throwaway file after every operation, i.e. calling counts.writeAsText("file:///home/username/dev/null", WriteMode.OVERWRITE)
and then calling env.execute(),
a better alternative? And does Flink actually have something like a NullSink
for that purpose?
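To make the question concrete, here is a sketch of what I have in mind. It assumes the DataSet API and uses DiscardingOutputFormat (from org.apache.flink.api.java.io) as the null-like sink; the job names are just illustrative:

```scala
import org.apache.flink.api.scala._
import org.apache.flink.api.java.io.DiscardingOutputFormat

val env = ExecutionEnvironment.getExecutionEnvironment

val text = env.readTextFile("<filename>")

// Step 1: flatMap only, materialized into a discarding sink
val words = text.flatMap { _.toLowerCase.split("\\W+") }
words.output(new DiscardingOutputFormat[String])
env.execute("step 1: flatMap")

// Step 2: flatMap + map
val pairs = words.map { (_, 1) }
pairs.output(new DiscardingOutputFormat[(String, Int)])
env.execute("step 2: map")

// Step 3: full pipeline with groupBy + sum
val counts = pairs.groupBy(0).sum(1)
counts.output(new DiscardingOutputFormat[(String, Int)])
env.execute("step 3: groupBy + sum")
```

Note that, as far as I understand, each env.execute() submits a separate job that re-runs the whole plan from the source up to its sink, so the per-step timings would be cumulative rather than isolated, which is part of why I am unsure this is the right approach.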
edit: I'm using the Flink Scala Shell on a cluster and running the code above with parallelism=1.