I'm using Flink to process some JSON-format streaming data:
{"uuid":"903493290432934", "bin": "68.3"}
{"uuid":"324938722984237", "bin": "56.8"}
...
My job is quite simple:
get stream from the Data Source ---> deserialize data into String ---> transform String to JSON object myJsonObj
---> double res = myJsonObj.get("bin")
---> do some heavy calculation with res
.
Here is my code:
FlinkPravegaReader<String> source = ... // init source
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// transform String to MyJson
DataStream<MyJson> jsonStream = env.addSource(source).name("Pravega Stream")
.map(new MapFunction<String, MyJson>() {
@Override
public MyJson map(String s) throws Exception {
MyJson myJson = JSON.parseObject(s, MyJson.class);
return myJson;
}
});
// do the heavy process
DataStream<String> heavyResult = jsonStream
.map(new MapFunction<MyJson, String>() {
@Override
public String map(MyJson myJson) throws Exception {
double res = myJson.get("bin");
// do some very heavy calculation
return myJson.get("uuid").asText() + " done.";
}
});
heavyResult.print();
As my understanding, I haven't used any keyBy/window
, so I think I used windowAll
by default. Am I right?
If I'm right, the doc of Flink told me that windowAll
couldn't be run in the parallel way. So does it mean that I have to do the heavy calculation one by one? I'm thinking if it is possible to do the heavy calculation parallelly.
As you see, in my case, it doesn't seem that using keyBy/window
makes any sense. So how to make this case execute parallelly? Is it possible to make two jobs running together with the same Data Source as below?
/----windowAll ---- do the heavy calculation
/
Data Source-
\
\----windowAll ---- do the heavy calculation
Is this design possible? Saying that the Data Source generates three elements: A and B. With this design, I'm expecting that one windowAll processes A while the other windowAll processes B.