I am trying to run a Dataflow pipeline that uses a Python file which loads a pickle file, as shown below:
dataflow.py:

```python
import apache_beam as beam
from apache_beam.io import ReadFromText

from stopwords import StopWords

stopwords = StopWords()
...
data = (pipeline
        | 'read' >> ReadFromText('gs://some/inputData.txt')
        | 'stopwords' >> beam.Map(
            lambda x: {'id': x['id'], 'text': stopwords.validate(x['text'])}))
```
stopwords.py:

```python
import os
import pickle

class StopWords:
    def __init__(self):
        module_dir = os.path.dirname(__file__)
        self.words = pickle.load(
            open(os.path.join(module_dir, 'model/sw.p'), 'rb'))
```
However, I get the following error:

```
IOError: [Errno 2] No such file or directory: '/usr/local/lib/python2.7/dist-packages/dataflow/model/sw.p'
```
When I debug `self.words` locally, it runs smoothly; the problem only appears when I run it as a Google Cloud Dataflow job.
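To show what I mean by running smoothly locally, here is a minimal, self-contained sketch of the same load-and-filter flow (the `validate` logic, the stopword data, and the temporary directory are hypothetical stand-ins, since my real `model/sw.p` is not shown here):

```python
import os
import pickle
import tempfile

# Create a stand-in for model/sw.p with a small stopword set
tmpdir = tempfile.mkdtemp()
os.makedirs(os.path.join(tmpdir, 'model'))
with open(os.path.join(tmpdir, 'model', 'sw.p'), 'wb') as f:
    pickle.dump({'a', 'the', 'is'}, f)

class StopWords:
    def __init__(self, module_dir):
        # Same pattern as stopwords.py, but with an explicit base directory
        with open(os.path.join(module_dir, 'model', 'sw.p'), 'rb') as f:
            self.words = pickle.load(f)

    def validate(self, text):
        # Hypothetical: drop stopwords from the text
        return ' '.join(w for w in text.split() if w not in self.words)

sw = StopWords(tmpdir)
print(sw.validate('the cat is a pet'))  # -> 'cat pet'
```

Run locally, this behaves exactly as expected, so the pickle loading itself seems fine; the file just does not exist at that path on the Dataflow workers.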
Can anyone help?