0
votes

I wrote my Beam pipelines with the Python SDK, and I currently use Celery as a wrapper around the DirectRunner. I want to switch to the Flink runner to parallelise my load.

According to the documentation, the Flink runner expects the job to be submitted as a JAR file.

Can you point me to any resources or samples that show how to use the Apache Beam Python SDK together with Apache Flink?


1 Answer

1
votes

As of now (Apache Beam 2.2.0), the Apache Beam Python SDK does not support the Apache Flink runner. If you try to use FlinkRunner in a Python pipeline, you will get a ValueError:

ValueError: Unexpected pipeline runner: FlinkRunner. Valid values are DirectRunner, EagerRunner, DataflowRunner, TestDataflowRunner or the fully qualified name of a PipelineRunner subclass.

You can see this in the source code here: https://github.com/apache/beam/blob/d11b9e9560131f55b418a13a7d10401c2135fb33/sdks/python/apache_beam/runners/runner.py#L62
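To make the failure mode concrete, here is a minimal standalone sketch of a name-based runner lookup that produces the same kind of message. It is not the Beam source and does not import Beam: `KNOWN_RUNNERS`, `resolve_runner_name`, and `try_runner` are hypothetical names chosen for illustration, and the list of runners is simply copied from the error message above.

```python
# Simplified, hypothetical sketch of a name-based runner lookup.
# This is NOT the Beam implementation; all names here are illustrative.

# Runner names taken from the error message quoted in the answer.
KNOWN_RUNNERS = (
    "DirectRunner",
    "EagerRunner",
    "DataflowRunner",
    "TestDataflowRunner",
)


def resolve_runner_name(runner_name):
    """Return runner_name if it looks usable, otherwise raise ValueError."""
    if runner_name in KNOWN_RUNNERS:
        return runner_name
    if "." in runner_name:
        # Assumption: a dotted name is treated as a fully qualified class
        # path. A real implementation would try to import it at this point.
        return runner_name
    raise ValueError(
        "Unexpected pipeline runner: %s. Valid values are %s "
        "or the fully qualified name of a PipelineRunner subclass."
        % (runner_name, ", ".join(KNOWN_RUNNERS))
    )


def try_runner(runner_name):
    """Report whether a runner name is accepted, without raising."""
    try:
        return "ok: " + resolve_runner_name(runner_name)
    except ValueError as err:
        return "error: " + str(err)


if __name__ == "__main__":
    print(try_runner("DirectRunner"))
    print(try_runner("FlinkRunner"))
```

In this sketch, a short name like FlinkRunner is rejected only because it is missing from the known list, which matches the behaviour described for 2.2.0: the Python SDK has no Flink entry to resolve.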