
I'm going through these examples from the Google Cloud Coursera courses, and although they worked until a few weeks ago, I can no longer install tf.transform or apache_beam on Datalab:

https://github.com/GoogleCloudPlatform/training-data-analyst/blob/master/courses/machine_learning/feateng/tftransform.ipynb

https://github.com/GoogleCloudPlatform/training-data-analyst/blob/master/courses/machine_learning/deepdive/06_structured/4_preproc_tft.ipynb

When installing tensorflow_transform I get the following errors:

%bash
pip install --upgrade --force tensorflow_transform==0.6.0 

twisted 18.7.0 requires PyHamcrest>=1.9.0, which is not installed.
datalab 1.1.3 has requirement six==1.10.0, but you'll have six 1.11.0 which is incompatible.
gapic-google-cloud-pubsub-v1 0.15.4 has requirement oauth2client<4.0dev,>=2.0.0, but you'll have oauth2client 4.1.2 which is incompatible.
proto-google-cloud-pubsub-v1 0.15.4 has requirement oauth2client<4.0dev,>=2.0.0, but you'll have oauth2client 4.1.2 which is incompatible.
apache-airflow 1.9.0 has requirement bleach==2.1.2, but you'll have bleach 1.5.0 which is incompatible.
apache-airflow 1.9.0 has requirement funcsigs==1.0.0, but you'll have funcsigs 1.0.2 which is incompatible.
google-cloud-monitoring 0.28.0 has requirement google-cloud-core<0.29dev,>=0.28.0, but you'll have google-cloud-core 0.25.0 which is incompatible.
proto-google-cloud-datastore-v1 0.90.4 has requirement oauth2client<4.0dev,>=2.0.0, but you'll have oauth2client 4.1.2 which is incompatible.
pandas-gbq 0.3.0 has requirement google-cloud-bigquery>=0.28.0, but you'll have google-cloud-bigquery 0.25.0 which is incompatible.
googledatastore 7.0.1 has requirement httplib2<0.10,>=0.9.1, but you'll have httplib2 0.11.3 which is incompatible.
googledatastore 7.0.1 has requirement oauth2client<4.0.0,>=2.0.1, but you'll have oauth2client 4.1.2 which is incompatible.
Cannot uninstall 'dill'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.
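Most of these lines are dependency-conflict warnings; the final "Cannot uninstall 'dill'" message is likely the one that actually aborts the install. When a log is long, a small filter can list every package pip refuses to uninstall for this reason. This is a minimal sketch: `distutils_blockers` and the `pip.log` file name are hypothetical, and it only matches the exact message wording shown above.

```shell
#!/usr/bin/env bash
# distutils_blockers: read pip output on stdin and print, once each, every
# package named in a "Cannot uninstall 'NAME'" message.
# (Hypothetical helper; relies on the exact wording of pip's message.)
distutils_blockers() {
  grep -o "Cannot uninstall '[^']*'" \
    | sed "s/^Cannot uninstall '\(.*\)'\$/\1/" \
    | sort -u
}

# Example: capture the output of the failing install, then list the blockers.
# pip install --upgrade --force tensorflow_transform==0.6.0 2>&1 | tee pip.log
# distutils_blockers < pip.log    # for the log above this prints: dill
```

The names it prints are candidates for the `conda install -y ...` workaround in the second answer.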


2 Answers


The TensorFlow version on my Datalab instance was 1.4. I had to add a line that upgrades TensorFlow to 1.10.1 before installing tensorflow_transform:

%bash
pip install --upgrade --force-reinstall pip==10.0.1
pip install tensorflow==1.10.1
pip install tensorflow_transform

My environment:

apache-airflow==1.9.0
apache-beam==2.6.0
tensorflow==1.10.1
tensorflow-metadata==0.9.0
tensorflow-tensorboard==0.4.0rc3
tensorflow-transform==0.8.0
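Since the fix here hinged on TensorFlow being too old (1.4), a quick version check before installing tensorflow_transform can save a failed install. This is a minimal sketch under stated assumptions: `version_ge`, `frozen_version` and `MIN_TF` are hypothetical names, the minimum of 1.8 is inferred from the other answer (Datalab's TensorFlow 1.8 with tensorflow_transform 0.8.0), and `sort -V` requires GNU coreutils, as on a Linux VM such as Datalab's.

```shell
#!/usr/bin/env bash
# version_ge A B: succeed when dotted version A >= B.
# GNU `sort -V` orders 1.8 before 1.10.1, unlike a plain string comparison.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# frozen_version PKG: print PKG's version from "name==version" lines on stdin,
# the format `pip freeze` prints (and the environment list above uses).
frozen_version() {
  awk -F'==' -v pkg="$1" 'tolower($1) == tolower(pkg) { print $2 }'
}

# Assumed minimum: the TensorFlow 1.8 that Datalab ships, reported to work with
# tensorflow_transform 0.8.0. Adjust for the tft version you actually pin.
MIN_TF="1.8"
tf_version="$(pip freeze 2>/dev/null | frozen_version tensorflow || true)"
if [ -n "$tf_version" ] && version_ge "$tf_version" "$MIN_TF"; then
  echo "tensorflow $tf_version is >= $MIN_TF"
else
  echo "tensorflow ${tf_version:-(not installed)}: upgrade to >= $MIN_TF before installing tensorflow_transform"
fi
```

`frozen_version` matches the package name exactly, so a line such as `tensorflow-tensorboard==0.4.0rc3` is not mistaken for TensorFlow itself.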

The current version of Datalab uses TensorFlow 1.8, so please change the notebook cell in question to:

%bash
pip uninstall -y google-cloud-dataflow
pip install --upgrade --force tensorflow_transform==0.8.0 apache-beam[gcp]

I've updated and checked in the two notebooks linked above.

Another problem might be that you are using Python 2. Datalab now uses Python 3 by default, and your pip install (above) runs in Python 3 even if the kernel is Python 2, because %%bash opens a new shell in which the Python 2 conda environment has not been activated.
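To see which interpreter a %%bash cell really installs into, you can print where `python` and `pip` resolve in that shell and ask pip itself: `pip --version` ends with "(python X.Y)", naming the Python it belongs to. A minimal sketch; `report_tools` is a hypothetical helper, and `CONDA_DEFAULT_ENV` is assumed to be set by conda's activate script.

```shell
#!/usr/bin/env bash
# report_tools: show where python and pip resolve in *this* shell, which
# interpreter pip installs into, and which conda env (if any) is active.
report_tools() {
  local tool
  for tool in python pip; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool -> $(command -v "$tool")"
    else
      echo "$tool -> not found on PATH"
    fi
  done
  # pip --version ends with "(python X.Y)": the interpreter pip installs into.
  if command -v pip >/dev/null 2>&1; then
    pip --version
  fi
  echo "conda env: ${CONDA_DEFAULT_ENV:-<none>}"
}

report_tools
# Run it again after activating the Python 2 environment and compare:
# source activate py2env && report_tools
```

If the two reports show the same pip, the activation did not take effect in that cell.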

To make sure the pip install happens in Python 2, change your pip install of apache-beam[gcp] as follows:

%%bash
source activate py2env
conda install -y dill pytz  # use conda for every package pip refuses to uninstall as "distutils installed"
pip uninstall -y google-cloud-dataflow
pip install --upgrade --force tensorflow_transform==0.8.0 apache-beam[gcp]
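If `source activate` fails silently (for example, if the environment name is wrong), the pip install still runs, just in the default Python 3 environment. A small guard can stop the cell instead. This is a sketch, assuming conda's activate script sets `CONDA_DEFAULT_ENV` to the environment name; `require_conda_env` is a hypothetical helper, not part of conda.

```shell
#!/usr/bin/env bash
# require_conda_env NAME: fail unless conda environment NAME is active in this
# shell (assumes `source activate NAME` sets CONDA_DEFAULT_ENV to NAME).
require_conda_env() {
  if [ "${CONDA_DEFAULT_ENV:-}" != "$1" ]; then
    echo "expected conda env '$1', found '${CONDA_DEFAULT_ENV:-<none>}'" >&2
    return 1
  fi
}

# Usage inside the %%bash cell (py2env is the env name used above):
# source activate py2env
# require_conda_env py2env || exit 1
# pip install --upgrade --force tensorflow_transform==0.8.0 apache-beam[gcp]
```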