I have been building Python pipelines with Google Cloud Dataflow and Apache Beam for about a year. I am now leaving the Google Cloud environment for a university cluster that has Spark installed. It looks like the Spark runner only supports Java (https://beam.apache.org/documentation/runners/spark/). Is that right? Are there any suggestions for running Python Apache Beam pipelines outside of Cloud Dataflow?

Answer


As of this writing, this is not yet possible. However, portability across runners and languages is the highest priority and the most active area of development in Beam right now. I think the portable Flink runner is very close to being able to run simple Python pipelines, and development of a portable Spark runner should start soon (it will share a lot of code with the Flink runner). Stay tuned, and follow the dev@ mailing list!