I'm using TensorFlow-Slim. My aim is to run one of the standard scripts (located in /models/slim/scripts) in multi-GPU mode. I tested the finetune_resnet_v1_50_on_flowers.sh script (cloned on 12.04.2017). My only change was adding --num_clones=2 to the end of the training command (inspired by /slim/deployment/model_deploy_test.py and previous StackOverflow answers):
python train_image_classifier.py \
    --train_dir=${TRAIN_DIR} \
    --dataset_name=flowers \
    --dataset_split_name=train \
    --dataset_dir=${DATASET_DIR} \
    --model_name=resnet_v1_50 \
    --checkpoint_path=${PRETRAINED_CHECKPOINT_DIR}/resnet_v1_50.ckpt \
    --checkpoint_exclude_scopes=resnet_v1_50/logits \
    --trainable_scopes=resnet_v1_50/logits \
    --max_number_of_steps=3000 \
    --batch_size=32 \
    --learning_rate=0.01 \
    --save_interval_secs=60 \
    --save_summaries_secs=60 \
    --log_every_n_steps=100 \
    --optimizer=rmsprop \
    --weight_decay=0.00004 \
    --num_clones=2
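One thing worth checking when comparing runs with a fixed --max_number_of_steps is how much data each step consumes. Assuming --batch_size is applied per clone (train_image_classifier.py appears to have each clone dequeue its own full batch from a shared prefetch queue), adding --num_clones=2 doubles the work per global step rather than shortening each step. The small sketch below only does that arithmetic for the flags above; the helper names are hypothetical.

```python
# Sketch: samples consumed per global step and per run for the flags above.
# ASSUMPTION: --batch_size is per clone, i.e. every clone dequeues its own batch.


def samples_per_step(batch_size: int, num_clones: int) -> int:
    """Samples consumed by one global training step across all clones."""
    return batch_size * num_clones


def samples_per_run(batch_size: int, num_clones: int, max_steps: int) -> int:
    """Total samples seen when training stops after max_steps global steps."""
    return samples_per_step(batch_size, num_clones) * max_steps


if __name__ == "__main__":
    for clones in (1, 2):
        print(f"num_clones={clones}: "
              f"{samples_per_step(32, clones)} samples/step, "
              f"{samples_per_run(32, clones, 3000)} samples in 3000 steps")
```

If the assumption holds, a 3000-step run with one clone processes 96,000 samples and a 3000-step run with two clones processes 192,000, so equal wall-clock times for the two runs would not mean equal throughput.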
Code from deployment/model_deploy_test.py:
def testMultiGPU(self):
  deploy_config = model_deploy.DeploymentConfig(num_clones=2)
I get the following log output, which includes an info-level (prefix I) 'Ignoring device specification' message:
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla P100-SXM2-16GB, pci bus id: 0000:85:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Tesla P100-SXM2-16GB, pci bus id: 0000:86:00.0)
I tensorflow/core/common_runtime/simple_placer.cc:669] Ignoring device specification /GPU:1 for node 'clone_1/fifo_queue_Dequeue' because the input edge from 'prefetch_queue/fifo_queue' is a reference connection and already has a device field set to /CPU:0
I tensorflow/core/common_runtime/simple_placer.cc:669] Ignoring device specification /GPU:0 for node 'clone_0/fifo_queue_Dequeue' because the input edge from 'prefetch_queue/fifo_queue' is a reference connection and already has a device field set to /CPU:0
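The two simple_placer lines can be read with a simplified placement model: each clone's ops are requested on its own GPU, the input pipeline (including prefetch_queue) lives on the CPU, and, as the log itself states, a dequeue op whose input is a reference edge to an already-placed queue stays on that queue's device. The sketch below is a hypothetical illustration of that rule for the single-machine case (no parameter servers, clones on GPU). It is not model_deploy's or TensorFlow's actual code, and the function names are made up.

```python
# HYPOTHETICAL, simplified model of the placement reported in the log above.
# Not the real model_deploy / TensorFlow placer implementation.


def clone_device(clone_index: int) -> str:
    """Device requested for the ops of clone number clone_index."""
    return "/device:GPU:%d" % clone_index


def inputs_device() -> str:
    """Device holding the input pipeline, including the prefetch queue."""
    return "/device:CPU:0"


def placed_device(requested: str, input_is_ref: bool, input_device: str) -> str:
    """Rule quoted in the log: if the op's input is a reference connection that
    already has a device set, that device wins and the request is ignored."""
    if input_is_ref and input_device:
        return input_device
    return requested


if __name__ == "__main__":
    for i in range(2):
        requested = clone_device(i)
        actual = placed_device(requested, True, inputs_device())
        print(f"clone_{i}/fifo_queue_Dequeue: requested {requested}, placed on {actual}")
```

Under this reading, only the dequeue ops are affected by the message; the model itself does not say where the rest of each clone's graph runs.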
Both GPUs appear to be working normally (judging by memory usage and GPU utilization), but training is no faster than with a single GPU.
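"Faster" is easiest to judge as samples per second rather than steps per second. The sketch below assumes that --batch_size is per clone and that the seconds-per-step figures are read from your own training logs. The helper names are hypothetical and no measured numbers are implied.

```python
# Sketch: compare single- and multi-clone runs by throughput (samples/second).
# ASSUMPTION: --batch_size is per clone, so one global step processes
# batch_size * num_clones samples.


def throughput(batch_size: int, num_clones: int, sec_per_step: float) -> float:
    """Samples processed per second of training."""
    if sec_per_step <= 0:
        raise ValueError("sec_per_step must be positive")
    return batch_size * num_clones / sec_per_step


def speedup(batch_size: int, sec_per_step_single: float,
            num_clones: int, sec_per_step_multi: float) -> float:
    """Throughput of the multi-clone run relative to a single-clone run."""
    return (throughput(batch_size, num_clones, sec_per_step_multi)
            / throughput(batch_size, 1, sec_per_step_single))
```

For example, with purely illustrative timings, if one step took the same time with one and with two clones, speedup(32, t, 2, t) would be 2.0. If a two-clone step took twice as long, the speedup would be 1.0.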
This issue might be related to: https://github.com/tensorflow/tensorflow/issues/8061
I would appreciate any answers, opinions, or concrete suggestions for solving this problem.
CUDA version: release 8.0, V8.0.53
TensorFlow: installed from binary; tested versions 1.0.1 and 1.1.0rc
GPU: NVIDIA Tesla P100 (SXM2)