I am trying to run a simple Beam pipeline from PowerShell. The service account I am using has access to all the GCS buckets it needs. This works fine on my personal laptop, but on my work laptop I get only the INFO output below. The job never shows up in the Dataflow console, and no logs are generated in GCP or anywhere else I can find.

What could cause this on one laptop but not the other?

(virtualenv) PS C:\apps\beam> python -m apache_beam.examples.wordcount --input gs://dataflow-samples/shakespeare/kinglear.txt --output gs://dw_json/counts --runner DataflowRunner --project 'inspired-studio-111111' --region 'us-west1' --temp_location gs://dw_json_temp/tmp/
INFO:apache_beam.internal.gcp.auth:Setting socket default timeout to 60 seconds.
INFO:apache_beam.internal.gcp.auth:socket default timeout is 60.0 seconds.
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:oauth2client.client:Refreshing access_token
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:oauth2client.client:Refreshing access_token
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:oauth2client.client:Refreshing access_token
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:oauth2client.client:Refreshing access_token
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:oauth2client.client:Refreshing access_token
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:oauth2client.client:Refreshing access_token
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:oauth2client.client:Refreshing access_token
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:oauth2client.client:Refreshing access_token
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:oauth2client.client:Refreshing access_token
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:oauth2client.client:Refreshing access_token
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:oauth2client.client:Refreshing access_token
INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
INFO:oauth2client.client:Refreshing access_token

EDIT: I was able to add some logging to output the traceback. It shows that a GCS bucket is not accessible to the app when the pipeline options are being validated:

https://www.googleapis.com/storage/v1/b/dataflow-staging-us-central1-9b3b14cdbfe093a43e2e0e83d1f47d1e?alt=json

[WinError 10061] No connection could be made because the target machine actively refused it

The service account in my local JSON key file has full access to this bucket.

Any ideas what is blocking the connection here?
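
WinError 10061 is raised when the TCP connection itself is refused, so the request never reaches the point where bucket permissions are evaluated. One way to narrow this down is to test plain TCP reachability of the Google API hosts outside of Beam. A minimal sketch, assuming Python 3 and only the standard library; the helper name can_connect and the list of hosts are illustrative and not part of Beam:

```python
import socket


def can_connect(host, port=443, timeout=5.0):
    """Try to open a TCP connection; return (True, None) or (False, error)."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True, None
    except OSError as exc:
        # Covers refused connections, timeouts, and DNS failures.
        return False, exc


if __name__ == "__main__":
    # Hosts chosen for illustration; the failing URL above is on www.googleapis.com.
    for host in ("www.googleapis.com", "storage.googleapis.com"):
        ok, err = can_connect(host)
        print(host, "reachable" if ok else "NOT reachable: {}".format(err))
```

If the hosts are unreachable from the work laptop but reachable from the personal one, the difference lies in the network path rather than in the service account's permissions.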

1
Do you perhaps have multiple sets of credentials that are getting mixed up? - robertwb
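
A quick way to check for mixed-up credentials is to list the places where application default credentials are commonly picked up from. This is only a sketch under assumptions: it looks at the GOOGLE_APPLICATION_CREDENTIALS environment variable and at the file that gcloud writes under %APPDATA% on Windows, and the function name credential_sources is made up for illustration. Other credential sources may exist on a given machine.

```python
import os


def credential_sources(env=None):
    """Return a dict of credential locations that are present on this machine."""
    env = os.environ if env is None else env
    sources = {}

    # Explicit service account key file, if the variable is set.
    key_path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if key_path:
        sources["GOOGLE_APPLICATION_CREDENTIALS"] = key_path

    # User credentials written by `gcloud auth application-default login` (Windows path).
    appdata = env.get("APPDATA")
    if appdata:
        adc = os.path.join(appdata, "gcloud", "application_default_credentials.json")
        if os.path.exists(adc):
            sources["gcloud application-default credentials"] = adc

    return sources


if __name__ == "__main__":
    for name, path in credential_sources().items():
        print("{}: {}".format(name, path))
```

If more than one entry is printed, comparing the output on both laptops would show whether they are using the same credentials.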

1 Answer

I'm assuming gsutil ls gs://dw_json/counts works for you? I wonder if this could be similar to https://issues.apache.org/jira/browse/BEAM-2264. There's not much to go on here; perhaps you could add some additional logging to see how far it gets.
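
As a rough sketch of what that extra logging could look like, using only the standard library: raise the root logger to DEBUG and wrap the submission call so that any exception is logged with its full traceback. The helper names enable_debug_logging and run_logged are hypothetical, and the callable passed in (for example the run function of the wordcount example) is an assumption about how you invoke the pipeline.

```python
import logging


def enable_debug_logging():
    """Send DEBUG-and-above records from all loggers to stderr."""
    logging.basicConfig(
        level=logging.DEBUG,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )
    # basicConfig does nothing if handlers already exist, so set the level explicitly.
    logging.getLogger().setLevel(logging.DEBUG)


def run_logged(fn, *args, **kwargs):
    """Call fn; if it raises, log the full traceback and re-raise."""
    try:
        return fn(*args, **kwargs)
    except Exception:
        logging.exception("pipeline submission failed")
        raise
```

With this in place, calling enable_debug_logging() first and then run_logged(...) around the pipeline entry point should show the last request that was attempted before the failure.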