
If I want to run a Flink job on YARN, the command is ./bin/flink run -m yarn-cluster ./examples/batch/WordCount.jar

But this command starts a default cluster with 2 TaskManagers. If I am only submitting a single job, why is the default number of TaskManagers set to 2?

And when do I need multiple TaskManagers for a single job?


1 Answer


The basic idea of any distributed data processing framework is to run the same job across multiple compute nodes. That way, an application that processes too much data for a single node can simply scale out to more nodes and, in theory, handle arbitrarily large amounts of data. I suggest you read up on Flink's basic concepts.

Btw, there is no particular reason for a default of 2. It could be any number; it just happens to be 2.
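If you don't want the default, you can size the cluster yourself when submitting. As a sketch, using the YARN options from older Flink releases in the same line as the command in your question (flag names like -yn, -ytm, -ys, and -p have changed or been removed in newer versions, so check the CLI docs for your release):

```shell
# Single-job cluster with just 1 TaskManager (-yn), 1024 MB per
# TaskManager (-ytm), and an overall job parallelism of 1 (-p):
./bin/flink run -m yarn-cluster -yn 1 -ytm 1024 -p 1 \
    ./examples/batch/WordCount.jar

# Scaling out: 4 TaskManagers with 2 task slots each (-ys),
# so parallelism 8 fits across the cluster:
./bin/flink run -m yarn-cluster -yn 4 -ys 2 -ytm 2048 -p 8 \
    ./examples/batch/WordCount.jar
```

You need multiple TaskManagers once the job's parallelism exceeds the slots (and memory/CPU) a single TaskManager provides; for a small WordCount, one is plenty.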