Apache Beam 2.1.0 added support for submitting jobs to the Dataflow runner on a private subnetwork and without public IPs, which we need to satisfy our firewall rules. My plan was to use a Squid proxy for apt-get, pip, etc. to install the Python dependencies; a proxy instance is already running, and we set the proxy environment variables inside our setup.py script (a sketch of that setup.py follows the submit command below). The job is submitted like this:
python $DIR/submit.py \
--runner DataflowRunner \
--no_use_public_ips \
--subnetwork regions/us-central1/subnetworks/$PRIVATESUBNET \
--staging_location $BUCKET/staging \
--temp_location $BUCKET/temp \
--project $PROJECT \
--setup_file $DIR/setup.py \
--job_name $JOB_NAME
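
For reference, here is a minimal sketch of how we set the proxies inside setup.py. The proxy address, package name, and dependency list are placeholders, not our real values; the only point is that the proxy is enabled from within setup.py:

# setup.py -- minimal sketch; proxy address and package details are placeholders
import os
import setuptools

# Placeholder address of the Squid proxy instance. Exporting these here means
# any pip/apt subprocesses spawned after this point inherit the proxy settings,
# but only once setup.py is actually executed on the worker.
PROXY = 'http://10.0.0.2:3128'
os.environ['http_proxy'] = PROXY
os.environ['https_proxy'] = PROXY

setuptools.setup(
    name='my-pipeline',             # placeholder package name
    version='0.0.1',
    packages=setuptools.find_packages(),
    install_requires=['requests'],  # placeholder dependency list
)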
When I try to run via the Python API, the job errors out during worker startup, before anything I control gets a chance to enable the proxy. It looks to me like each worker first tries to install the Dataflow SDK, and during that install pip tries to update requests and fails to connect.
None of my code has executed at this point, so I see no way to set up the proxy before the error occurs. Is there any way to launch Dataflow Python workers on a private subnet?
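
For context, submit.py has roughly this shape; the pipeline stages here are illustrative placeholders, not our actual pipeline, and only the way the command-line options are passed through matters:

# submit.py -- rough shape of the submission script; pipeline stages are illustrative
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def run(argv=None):
    # The flags from the shell command above (--runner, --subnetwork,
    # --no_use_public_ips, --setup_file, ...) arrive via argv and are
    # parsed into the pipeline options.
    options = PipelineOptions(argv)
    p = beam.Pipeline(options=options)
    (p
     | 'Create' >> beam.Create(['placeholder'])          # placeholder source
     | 'Identity' >> beam.Map(lambda element: element))  # placeholder stage
    p.run()

if __name__ == '__main__':
    run()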