5
votes

Before seeing:

RuntimeError: IOError: [Errno 2] No such file or directory:
'/beam-temp-andrew_mini_vocab-..../......andrew_mini_vocab' [while running .....]

In my Apache Beam Python Dataflow job, I see this error logged:

A setup error was detected in __. Please refer to the worker-startup
log for detailed information.

I found the worker-startup logs, and the payload error is:

Failed to install packages: failed to install SDK: exit status 2

The error is not specific enough for me to debug. Any insight into which SDK isn't getting loaded? The imports for my job are extremely basic:

from __future__ import absolute_import
from __future__ import division
import argparse
import logging
import re
import apache_beam as beam
from apache_beam.io import WriteToText
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.options.pipeline_options import SetupOptions
from apache_beam.pvalue import AsDict
2
How did you install Beam? Just via pip? - Pablo
I encountered exit status 2 for a subprocess; it was resolved by updating pip to a version greater than 7. The reason is that the --no-binary option, which the Apache Beam Python SDK uses, wasn't supported before that version. - Anuj
@Andrew Cassidy - did Anuj's answer help? - Pablo

2 Answers

0
votes

Check your version of pip with pip -V, and try to update it.
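
As a rough sketch of that check, the snippet below parses the text printed by pip -V and compares the major version against the cutoff mentioned in the comments on the question. The cutoff of 7 comes from that comment and is an assumption here, and the function names are hypothetical helpers, not part of pip or Beam.

```python
import re
import subprocess
import sys

# Assumption: the cutoff comes from the comment on the question, which says
# the --no-binary option is not supported by older pip releases.
MIN_PIP_MAJOR = 7


def pip_major_version(version_output):
    """Return the major version from `pip -V` output, e.g. 'pip 9.0.1 from ...'."""
    match = re.match(r"pip (\d+)", version_output.strip())
    if match is None:
        raise ValueError("unrecognized pip version output: %r" % version_output)
    return int(match.group(1))


def pip_is_new_enough(version_output, min_major=MIN_PIP_MAJOR):
    """True if the reported pip major version is at least min_major."""
    return pip_major_version(version_output) >= min_major


def installed_pip_version_output():
    """Run `python -m pip -V` for the current interpreter and return its output."""
    raw = subprocess.check_output([sys.executable, "-m", "pip", "-V"])
    return raw.decode("utf-8")
```

Calling pip_is_new_enough(installed_pip_version_output()) on the machine that launches the job shows whether an upgrade is worth trying. If it returns False, python -m pip install --upgrade pip updates pip for that interpreter.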

Please comment on the question if this does not help : )

0
votes

Can you share your setup.py file? I had a similar problem and solved it by using a setup.py file.