1
votes

Why does my machine learning Python/TensorFlow script run faster on Colab than on a 24-vCPU Google Compute Engine instance?

Invocation on colab: !/content/myscript.py

Invocation on google compute instance: !/home/prj1/myscript.py

epoch time on colab: 0.8 s

epoch time on google compute instance: 2.0 s

In both cases, I am using TensorFlow 1.11 and Python 2.7, myscript.py is the only program running, and the GPU is not being used. The script prints training progress (text only, no graphics) to the screen every 10 epochs.

2
Can you share the script? - David
I apologize David, but at this stage, I cannot share the script. - ahk3
Which GCE image did you use with the regular 24-vCPU machine? - Milad Tabrizi
Debian GNU/Linux 9 (stretch) - ahk3

2 Answers

0
votes

Colaboratory ships a TensorFlow build that is tuned for its own hardware, while a GCE instance is a general-purpose machine. "When you create a new notebook on colab.research.google.com, TensorFlow is already pre-installed and optimized for the hardware being used." This is likely why you are seeing a performance difference between the two.
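
One way to test this explanation is to print the same environment summary on both machines and compare the output. The following is only a minimal sketch using the standard library: the function names `cpu_flags` and `environment_summary` are hypothetical, the `/proc/cpuinfo` lookup assumes an x86 Linux host (it returns an empty set elsewhere), and it does not inspect how the TensorFlow binary itself was built.

```python
from __future__ import print_function

import multiprocessing
import os
import platform
import sys


def cpu_flags(path="/proc/cpuinfo"):
    """Return the CPU feature flags listed in /proc/cpuinfo, or an empty set."""
    if not os.path.exists(path):
        return set()
    with open(path) as handle:
        for line in handle:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()


def environment_summary():
    """Collect settings that can affect CPU-bound training speed."""
    flags = cpu_flags()
    summary = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "logical_cpus": multiprocessing.cpu_count(),
        # Vector instruction sets that an optimized build may rely on.
        "simd": sorted(f for f in ("avx", "avx2", "avx512f", "fma") if f in flags),
    }
    # Threading variables read by OpenMP-backed builds, if they are set.
    for name in ("OMP_NUM_THREADS", "KMP_BLOCKTIME", "KMP_AFFINITY"):
        summary[name] = os.environ.get(name, "<unset>")
    return summary


if __name__ == "__main__":
    for key, value in sorted(environment_summary().items()):
        print("{}: {}".format(key, value))
```

Differences in the CPU count, the available vector instructions, or the threading variables would be a reasonable place to start looking; identical output would suggest the difference lies in the TensorFlow build itself.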

0
votes

I was able to bring the epoch time down to 1.1 s by following the optimizations recommended in the article "Tips to Improve Performance for Popular Deep Learning Frameworks on CPUs".

Here is the code:

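import multiprocessing
import os

# Set the OpenMP variables before importing TensorFlow so the runtime is sure to see them.
# multiprocessing.cpu_count() works on Python 2.7; os.cpu_count() needs Python 3.4+.
# Halving the logical CPU count assumes two hyperthreads per physical core.
N_CORES                       = max(1, multiprocessing.cpu_count() // 2)
os.environ["OMP_NUM_THREADS"] = str(N_CORES)
os.environ["KMP_BLOCKTIME"]   = "30"
os.environ["KMP_SETTINGS"]    = "1"
os.environ["KMP_AFFINITY"]    = "granularity=fine,verbose,compact,1,0"

import tensorflow as tf
config_sess = tf.ConfigProto(intra_op_parallelism_threads=N_CORES, inter_op_parallelism_threads=2, allow_soft_placement=True, device_count={'CPU': N_CORES})

with tf.Session(config=config_sess) as sess:
    pass  # the training loop goes here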
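
To check whether thread settings like these actually help on a given machine, it can be useful to time epochs the same way before and after the change. Below is a rough sketch of such a measurement, not part of the original script: `mean_epoch_time` and `dummy_epoch` are hypothetical names, and the dummy workload only stands in for one real training epoch.

```python
from __future__ import print_function

import time


def mean_epoch_time(run_epoch, n_epochs=20, warmup=2):
    """Return the mean wall-clock seconds per call of run_epoch()."""
    for _ in range(warmup):
        run_epoch()  # untimed, so thread pools and caches can settle first
    start = time.time()
    for _ in range(n_epochs):
        run_epoch()
    return (time.time() - start) / n_epochs


if __name__ == "__main__":
    # Stand-in workload; replace with one training epoch of the real script.
    def dummy_epoch():
        sum(i * i for i in range(100000))

    print("mean epoch time: {:.4f} s".format(mean_epoch_time(dummy_epoch)))
```

Averaging over several epochs after a short warm-up keeps one-off startup costs out of the comparison, which matters when the differences are on the order of a second.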