
I want to use a Dataflow job to encrypt incoming messages from a Pub/Sub subscription before writing them to BigQuery. I am using the pycryptodome==3.9.8 and cryptography==3.1 Python libraries to do that.

In the Dataflow job, I am using the following two imports:

from Crypto import Random
from Crypto.Cipher import AES

When I deploy the Dataflow pipeline without the --requirements_file parameter, it deploys perfectly, but after I publish a message to the topic it throws this error:

ModuleNotFoundError: No module named 'Crypto' [while running 'generatedPtransform-81']

After that, I deployed the pipeline again with the --requirements_file requirement.txt flag. The pipeline deploys fine, but now it does not pull any messages from the subscription. There is no error in the Dataflow job, because it never fetches a message.

Am I missing something? Since nothing shows up in the logs, it's very difficult to tell what is going wrong.
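One way to get some signal into the logs is to check, on the worker itself, whether the dependencies are importable and log the result instead of crashing. Note that pycryptodome is installed as the `Crypto` package, which is why the error mentions `Crypto` rather than `pycryptodome`. The sketch below uses only the standard library; `missing_modules` and `log_missing_dependencies` are hypothetical helper names, and calling them from a DoFn's `setup()` method is an assumption about how you might wire it in, not something from the original thread.

```python
import importlib.util
import logging


def missing_modules(module_names):
    """Return the names from module_names that cannot be imported here.

    Use import names, not pip package names: pycryptodome is imported
    as "Crypto", while cryptography is imported as "cryptography".
    """
    missing = []
    for name in module_names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # find_spec imports the parent of a dotted name, which can fail
            # if the parent package itself is not installed.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


def log_missing_dependencies(module_names=("Crypto", "cryptography")):
    """Log (rather than raise) so the result appears in the worker logs."""
    missing = missing_modules(module_names)
    if missing:
        logging.error("Modules missing on this worker: %s", ", ".join(missing))
    else:
        logging.info("All modules importable: %s", ", ".join(module_names))
    return missing
```

If no "Modules missing" or "All modules importable" line ever appears, the workers may not be getting as far as running your code at all, which points at the dependency installation step rather than the encryption logic.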

I've experienced the same when using requirements_file. A workaround is to use setup.py: issues.apache.org/jira/browse/BEAM-10115, stackoverflow.com/a/62046774/4756279 – Peter Kim

Answer


Re-posting the comment by @peter-kim as an answer: use a setup.py file (passed with --setup_file) instead of --requirements_file, and you should be able to do what you need. See "Dataflow fails when I add requirements.txt [Python]".
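A minimal setup.py along the lines the answer suggests might look like the sketch below. It assumes the pinned versions from the question and that the file sits next to the pipeline's main module; the package name is a placeholder. This is a packaging file that Beam consumes when you pass `--setup_file ./setup.py` to the pipeline, not a script meant to be run on its own.

```python
# setup.py: placed next to the pipeline's main module.
# Beam builds a source distribution from this file and installs it on each
# Dataflow worker; pip then also installs everything in install_requires.
import setuptools

setuptools.setup(
    name="pubsub-encrypt-pipeline",  # hypothetical package name
    version="0.0.1",
    install_requires=[
        # pip package names; pycryptodome is imported in code as "Crypto"
        "pycryptodome==3.9.8",
        "cryptography==3.1",
    ],
    packages=setuptools.find_packages(),
)
```

You would then launch the pipeline with `--setup_file ./setup.py` in place of `--requirements_file requirement.txt`, keeping your usual runner, project, region, and streaming flags. Listing the dependencies directly in install_requires, rather than reading them from requirement.txt inside setup.py, avoids depending on that file being included in the staged source distribution.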