I am hoping for guidance on how to set --environment_config
when running the Beam wordcount.py demo.
It runs fine with DirectRunner. Flink's wordcount also runs fine (ie running Flink via flink run
).
I would like to run Beam using the Flink runner using a "seperate Flink cluster" as described in the beam documentation. I can't use Docker, so I plan to use --environment_type=PROCESS
.
I am using the following inside the python code to set environment_config:
environment_config = dict()
environment_config['os'] = platform.system().lower()
environment_config['arch'] = platform.machine()
environment_config['command'] = 'ls'
ec = "--environment_config={}".format(json.dumps(environment_config))
Obviously the command is incorrect. When I run this, Flink does receive and successfully process the DataSource
sub-tasks. It eventually time-outs on the CHAIN MapPartition
s.
Could someone provide guidance (or links) as to how to set environment_config? I am running Beam within a Singularity container.