0
votes

I have a fairly simple Apache Beam pipeline in Python that I set up in a Jupyter notebook and would like to deploy to a Dataflow runner. I am fairly new to all three of these! I am using the Python 3 and Apache Beam 2.27.0 kernel.

My pipeline options look something like this:

options.view_as(GoogleCloudOptions).project = 'inspired-studio-11111'
options.view_as(GoogleCloudOptions).job_name = 'Dataflow Test Job2' + jobid
options.view_as(GoogleCloudOptions).region = 'us-central1'
options.view_as(GoogleCloudOptions).staging_location = 'gs://bucket/staging'
options.view_as(GoogleCloudOptions).temp_location = 'gs://bucket/temp'
options.view_as(SetupOptions).save_main_session = True

The pipeline runs fine in the notebook and interacts with GCP storage. When I run it on the GCP Dataflow runner, I consistently get the following exception:

Error message from worker:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/dataflow_worker/batchworker.py", line 771, in run
    self._load_main_session(self.local_staging_directory)
  File "/usr/local/lib/python3.7/site-packages/dataflow_worker/batchworker.py", line 512, in _load_main_session
    pickler.load_session(session_file)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/internal/pickler.py", line 318, in load_session
    return dill.load_session(file_path)
  File "/usr/local/lib/python3.7/site-packages/dill/_dill.py", line 368, in load_session
    module = unpickler.load()
  File "/usr/local/lib/python3.7/site-packages/dill/_dill.py", line 472, in load
    obj = StockUnpickler.load(self)
  File "/usr/local/lib/python3.7/site-packages/dill/_dill.py", line 462, in find_class
    return StockUnpickler.find_class(self, module, name)
ModuleNotFoundError: No module named 'IPython'

Installing and importing ipython in my notebook did not help. Does this need to be configured somewhere on the GCP VM?

1 Answer

1
votes

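That error is usually caused by the save_main_session=True option. With it set, Beam pickles the notebook's whole main session, which in a Jupyter kernel includes references to IPython, and the Dataflow workers fail when they try to load that session because IPython is not installed there. See "Handle nameerrors when launching Dataflow jobs with Apache Beam notebooks" for a discussion of other ways of making sure the workers have the right code available at runtime.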
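
One way to avoid relying on the pickled main session is sketched below. It assumes your transforms can be written as self-contained functions that import what they need locally, so save_main_session can stay at its default of False. The names count_words and run, and the input and output paths passed to run, are hypothetical and not taken from the question.

```python
def count_words(line):
    """Return the number of whitespace-separated words in a line.

    The import lives inside the function, so the function does not depend on
    the notebook's global namespace being pickled and restored on the workers.
    """
    import re  # local import: resolved on the worker when the function runs

    return len(re.findall(r"\S+", line))


def run(input_path, output_path, pipeline_args=None):
    """Build and run a pipeline without save_main_session (hypothetical helper)."""
    # Imported lazily so this sketch can be loaded even where Beam is absent.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # save_main_session is deliberately not set, so it keeps its default (False).
    options = PipelineOptions(pipeline_args)
    with beam.Pipeline(options=options) as p:
        (
            p
            | "Read" >> beam.io.ReadFromText(input_path)
            | "CountWords" >> beam.Map(count_words)
            | "Write" >> beam.io.WriteToText(output_path)
        )
```

You would call run with your own gs:// paths and the usual Dataflow flags (runner, project, region, temp_location) in pipeline_args. If the pipeline needs more code than fits comfortably in self-contained functions, moving it into an installable package is the other common route.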