3 votes

I ran an image processing script using the TensorFlow API. It turns out that the processing time decreased dramatically when I moved the for loop outside of the session-running procedure. Could anyone tell me why? Are there any side effects?

The original code:

with tf.Session() as sess:
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    for i in range(len(file_list)):
        start = time.time()
        image_crop, bboxs_crop = sess.run(crop_image(file_list[i], bboxs_list[i], sess))
        print('Done image %d th in %d ms \n' % (i, ((time.time() - start) * 1000)))
        # image_crop, bboxs_crop, image_debug = sess.run(crop_image(file_list[i], bboxs_list[i], sess))
        labels, bboxs = filter_bbox(labels_list[i], bboxs_crop)
        # Image._show(Image.fromarray(np.asarray(image_crop)))
        # Image._show(Image.fromarray(np.asarray(image_debug)))
        save_image(image_crop, ntpath.basename(file_list[i]))
        # save_desc_file(file_list[i], labels_list[i], bboxs_crop)
        save_desc_file(file_list[i], labels, bboxs)
    coord.request_stop()
    coord.join(threads)

The modified code:

for i in range(len(file_list)):
    with tf.Graph().as_default(), tf.Session() as sess:
        start = time.time()
        image_crop, bboxs_crop = sess.run(crop_image(file_list[i], bboxs_list[i], sess))
        print('Done image %d th in %d ms \n' % (i, ((time.time() - start) * 1000)))
        labels, bboxs = filter_bbox(labels_list[i], bboxs_crop)
        save_image(image_crop, ntpath.basename(file_list[i]))
        save_desc_file(file_list[i], labels, bboxs)

The time cost per image in the original code kept increasing, from 200 ms up to as much as 20000 ms. With the modified code, the log messages indicate that more than one graph and more than one TensorFlow device were created during the run. Why is that?

python random_crop_images_hongyuan.py
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:910] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GT 730M
major: 3 minor: 5
memoryClockRate (GHz) 0.758
pciBusID 0000:01:00.0
Total memory: 982.88MiB
Free memory: 592.44MiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 730M, pci bus id: 0000:01:00.0)
Done image 3000 th in 317 ms
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 730M, pci bus id: 0000:01:00.0)
Done image 3001 th in 325 ms
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 730M, pci bus id: 0000:01:00.0)
Done image 3002 th in 312 ms
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 730M, pci bus id: 0000:01:00.0)
Done image 3003 th in 147 ms
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 730M, pci bus id: 0000:01:00.0)
Done image 3004 th in 447 ms


2 Answers

3 votes

My guess is that this happens because creating a session is an expensive operation. Maybe it could also be that the session is not properly cleaned up when the with-statement is left, so each new allocation on the device has fewer resources available. In short, I would not recommend doing it this way; rather, initialize just one session and try to reuse it.
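
As a rough illustration of the session-creation overhead, here is a minimal, self-contained sketch (not your code; the constant and the timings are only illustrative and machine-dependent):

import time
import tensorflow as tf  # TF 1.x API

x = tf.constant(1.0)

# Anti-pattern: a new session (and device setup) for every run.
start = time.time()
for _ in range(10):
    with tf.Session() as sess:
        sess.run(x)
print('new session per run: %.3f s' % (time.time() - start))

# Recommended: create the session once and reuse it for all runs.
start = time.time()
with tf.Session() as sess:
    for _ in range(10):
        sess.run(x)
print('single reused session: %.3f s' % (time.time() - start))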

EDIT: In answer to your comment: the session is closed automatically as soon as the with-block is exited. I've read in a GitHub issue that the memory on the GPU is only really released when the whole program exits, but I guess that when you allocate a new session after closing the last one, TensorFlow internally just reuses the previously allocated resources. So, in retrospect, my answer is probably not very insightful. Sorry if I caused confusion.

1 vote

It's not possible to be 100% certain without seeing all of your code, but I would guess that the crop_image() function is calling various TensorFlow op functions to build a graph.

It is almost never a good idea to build a graph inside a for loop. This answer explains why: some operations (such as the first Session.run() call to a new operation) take time that is linear in the number of operations in the graph. If you add more operations in each iteration, iteration i will do work that is linear in i, and so the overall execution time will be quadratic.
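
To make that growth concrete, here is a minimal sketch (again, not your code) showing how calling op constructors inside a loop keeps adding nodes to the default graph:

import tensorflow as tf  # TF 1.x API

with tf.Session() as sess:
    for i in range(5):
        # Each iteration constructs *new* nodes instead of reusing existing ones,
        # so the default graph keeps growing.
        x = tf.constant(float(i))
        y = tf.square(x)
        sess.run(y)  # the first run of a newly added op pays a graph-setup cost
        print(len(tf.get_default_graph().get_operations()))  # node count keeps increasing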

The modified version of your code (with a with tf.Graph().as_default(): block inside the loop) will be faster because it creates a new, empty tf.Graph in each iteration, and therefore each iteration does a constant amount of work.

An even more efficient solution would be to build the graph and session once, using tf.placeholder() tensors to represent the filename and bbox arguments to crop_image, and feeding different values to these placeholders in each iteration.
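
A minimal sketch of that approach, assuming crop_image() can be expressed with standard image ops; the placeholder shapes and the crop logic below are illustrative guesses, and your bboxs_list may hold several boxes per image rather than the single box assumed here (file_list, bboxs_list, save_image, and ntpath are the names from your script):

import ntpath
import tensorflow as tf  # TF 1.x API

# Build the graph once, outside the loop.
filename_ph = tf.placeholder(tf.string, shape=[])   # one filename per run
bbox_ph = tf.placeholder(tf.int32, shape=[4])       # [offset_y, offset_x, height, width]

image = tf.image.decode_jpeg(tf.read_file(filename_ph), channels=3)
image_crop = tf.image.crop_to_bounding_box(
    image, bbox_ph[0], bbox_ph[1], bbox_ph[2], bbox_ph[3])

with tf.Session() as sess:
    for i in range(len(file_list)):
        # Only feed new values; no new ops are added to the graph here.
        crop = sess.run(image_crop,
                        feed_dict={filename_ph: file_list[i],
                                   bbox_ph: bboxs_list[i]})
        save_image(crop, ntpath.basename(file_list[i]))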