1
votes

I'm using TensorFlow-Slim and want to run one of the standard scripts (located in /models/slim/scripts) in multi-GPU mode. I tested the finetune_resnet_v1_50_on_flowers.sh script (cloned on 12.04.2017) and only added --num_clones=2 at the end of the training command (inspired by /slim/deployment/model_deploy_test.py and previous StackOverflow answers):

python train_image_classifier.py \
  --train_dir=${TRAIN_DIR} \
  --dataset_name=flowers \
  --dataset_split_name=train \
  --dataset_dir=${DATASET_DIR} \
  --model_name=resnet_v1_50 \
  --checkpoint_path=${PRETRAINED_CHECKPOINT_DIR}/resnet_v1_50.ckpt \
  --checkpoint_exclude_scopes=resnet_v1_50/logits \
  --trainable_scopes=resnet_v1_50/logits \
  --max_number_of_steps=3000 \
  --batch_size=32 \
  --learning_rate=0.01 \
  --save_interval_secs=60 \
  --save_summaries_secs=60 \
  --log_every_n_steps=100 \
  --optimizer=rmsprop \
  --weight_decay=0.00004 \
  --num_clones=2

Code from deployment/model_deploy_test.py:

def testMultiGPU(self):
    deploy_config = model_deploy.DeploymentConfig(num_clones=2)

The log shows 'Ignoring device specification' messages for both clones:

I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla P100-SXM2-16GB, pci bus id: 0000:85:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Tesla P100-SXM2-16GB, pci bus id: 0000:86:00.0)
I tensorflow/core/common_runtime/simple_placer.cc:669] Ignoring device specification /GPU:1 for node 'clone_1/fifo_queue_Dequeue' because the input edge from 'prefetch_queue/fifo_queue' is a reference connection and already has a device field set to /CPU:0
I tensorflow/core/common_runtime/simple_placer.cc:669] Ignoring device specification /GPU:0 for node 'clone_0/fifo_queue_Dequeue' because the input edge from 'prefetch_queue/fifo_queue' is a reference connection and already has a device field set to /CPU:0

Both GPUs appear to be working normally (memory usage and GPU utilization), but training is no faster than on a single GPU.

This issue might be related to: https://github.com/tensorflow/tensorflow/issues/8061

I would appreciate any answers, opinions, or concrete suggestions for solving this problem.

CUDA version: release 8.0, V8.0.53

TensorFlow installed from binary, tested versions: 1.0.1 and 1.1.0rc

GPU: NVIDIA Tesla P100 (SXM2)


2 Answers

2
votes

Please see https://github.com/tensorflow/tensorflow/issues/12689. To make sure that the variables are stored on the CPU, we need to use the context manager with slim.arg_scope([slim.model_variable, slim.variable], device='/cpu:0'):

It solved my problem.

1
votes

Even though this answer may be late: training is not supposed to be faster when measured in seconds per step. With --num_clones=2 a second copy of the model is created instead, which gives an effective batch size of 64 with your parameters, so you can halve your maximum number of steps.
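To make the arithmetic concrete, here is a small sketch in plain Python. It assumes, as described above, that each clone processes its own batch of --batch_size images per step. The helper names and the 0.5 seconds-per-step figure are hypothetical and only there for illustration; they are not part of TF-Slim and not a measured value.

```python
# Values taken from the command line in the question.
batch_size = 32           # --batch_size (assumed to be per clone)
num_clones = 2            # --num_clones
single_gpu_steps = 3000   # --max_number_of_steps


def effective_batch_size(batch_size, num_clones):
    """Images consumed per training step across all clones."""
    return batch_size * num_clones


def equivalent_steps(single_gpu_steps, num_clones):
    """Steps needed to see as many images as the single-GPU run (rounded up)."""
    return -(-single_gpu_steps // num_clones)


def images_per_second(batch_size, num_clones, seconds_per_step):
    """Throughput in images/sec, a fairer comparison than seconds per step."""
    return batch_size * num_clones / seconds_per_step


print(effective_batch_size(batch_size, num_clones))    # 64
print(equivalent_steps(single_gpu_steps, num_clones))  # 1500

# Hypothetical timing: if a step takes 0.5 s in both setups, the
# two-clone run still processes twice as many images per second.
print(images_per_second(batch_size, 1, 0.5))           # 64.0
print(images_per_second(batch_size, num_clones, 0.5))  # 128.0
```

So when comparing the two setups, compare images per second (or wall-clock time to reach the same number of images seen) rather than seconds per step.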