I want to check whether Keras with the TensorFlow backend runs properly on my GPU. I ran this script and got the following output:

Using TensorFlow backend.
Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
170500096/170498071 [==============================] - 31s 0us/step
x_train shape: (50000, 32, 32, 3)
50000 train samples
10000 test samples
Using real-time data augmentation.
Epoch 1/100
2018-07-06 15:20:00.130371: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-07-06 15:20:00.209953: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-07-06 15:20:00.210289: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties: 
name: GeForce GTX 1050 major: 6 minor: 1 memoryClockRate(GHz): 1.493
pciBusID: 0000:01:00.0
totalMemory: 3.95GiB freeMemory: 113.38MiB
2018-07-06 15:20:00.210305: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-07-06 15:20:00.408052: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-07-06 15:20:00.408100: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0 
2018-07-06 15:20:00.408107: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N 
2018-07-06 15:20:00.408248: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 57 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1050, pci bus id: 0000:01:00.0, compute capability: 6.1)
2018-07-06 15:20:00.408744: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 57.38M (60162048 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
2018-07-06 15:20:00.683832: E tensorflow/stream_executor/cuda/cuda_blas.cc:462] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2018-07-06 15:20:00.685728: E tensorflow/stream_executor/cuda/cuda_blas.cc:462] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2018-07-06 15:20:00.688354: E tensorflow/stream_executor/cuda/cuda_blas.cc:462] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2018-07-06 15:20:00.689038: E tensorflow/stream_executor/cuda/cuda_blas.cc:462] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2018-07-06 15:20:00.689718: E tensorflow/stream_executor/cuda/cuda_blas.cc:462] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2018-07-06 15:20:00.690388: E tensorflow/stream_executor/cuda/cuda_blas.cc:462] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2018-07-06 15:20:00.698165: E tensorflow/stream_executor/cuda/cuda_dnn.cc:455] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2018-07-06 15:20:00.698238: F tensorflow/core/kernels/conv_ops.cc:713] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo<T>(), &algorithms) 
Aborted (core dumped)

The log reports "totalMemory: 3.95GiB freeMemory: 113.38MiB", followed by "failed to allocate 57.38M (60162048 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY".

Why is there so little free memory? What can I do to make the script run properly and finally train on the GPU?

OS: Fedora 28
Python: 3.6.6
Keras: 2.2.0
TensorFlow: 1.8.0
GPU: GeForce GTX 1050

The first thing I would check is that no old processes are still holding on to GPU memory. You can check with the nvidia-smi command, if it is set up. - Jeremy Bare
As @JeremyBare mentioned, the problem may be that another program is consuming the memory. - Jaganadh Gopinadhan
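
The check suggested in the comments can also be scripted. This is only a minimal sketch: it assumes nvidia-smi is on the PATH and accepts the --query-compute-apps option, and the helper names parse_compute_apps and list_gpu_processes are made up for illustration.

```python
import csv
import io
import subprocess

# Ask nvidia-smi for the compute processes that currently hold GPU memory.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(text):
    """Turn the CSV text printed by QUERY into (pid, name, used_mib) tuples."""
    rows = []
    for fields in csv.reader(io.StringIO(text), skipinitialspace=True):
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        pid, name, used = (field.strip() for field in fields)
        # Some setups report "[N/A]" instead of a number for the memory column.
        used_mib = int(used) if used.isdigit() else None
        rows.append((int(pid), name, used_mib))
    return rows


def list_gpu_processes():
    """Run nvidia-smi and return the parsed process list."""
    result = subprocess.run(
        QUERY, check=True, stdout=subprocess.PIPE, universal_newlines=True
    )
    return parse_compute_apps(result.stdout)
```

Calling list_gpu_processes() before starting the training script shows which PIDs would have to exit to free the memory. As far as I know this query covers compute processes only; running plain nvidia-smi also shows graphics clients such as a desktop session.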

1 Answer

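This worked for me: cap the amount of GPU memory TensorFlow may allocate. The configuration has to run before TensorFlow initializes the GPU.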

import tensorflow as tf

# Cap TensorFlow at 3 GiB of GPU memory (memory_limit is given in MiB).
LIMIT = 3 * 1024
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # Expose the first physical GPU as one logical device with a hard memory cap.
        tf.config.experimental.set_virtual_device_configuration(
            gpus[0],
            [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=LIMIT)])
        logical_gpus = tf.config.experimental.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Virtual devices must be set before GPUs have been initialized
        print(e)

https://www.tensorflow.org/guide/gpu#limiting_gpu_memory_growth
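
The snippet above relies on tf.config.experimental, which TensorFlow 1.8 (the version in the question) does not provide. A rough TensorFlow 1.x counterpart might look like the sketch below. It assumes Keras 2.2 with the TensorFlow backend; fraction_for_limit and configure_keras_session are hypothetical helper names, and the total memory value is something you would read from your own nvidia-smi output.

```python
def fraction_for_limit(limit_mib, total_mib):
    """Convert an absolute limit in MiB into the fraction TF 1.x expects."""
    if total_mib <= 0:
        raise ValueError("total_mib must be positive")
    # Clamp to [0, 1] so an oversized limit simply means "the whole GPU".
    return max(0.0, min(1.0, limit_mib / total_mib))


def configure_keras_session(fraction=None):
    """Install a Keras session that limits TensorFlow 1.x GPU memory use.

    With fraction=None, TensorFlow allocates GPU memory on demand instead of
    reserving almost all of it up front; otherwise it is capped at the given
    fraction of the GPU's total memory.
    """
    # Imported here so fraction_for_limit stays usable without TensorFlow installed.
    import tensorflow as tf
    from keras import backend as K

    config = tf.ConfigProto()
    if fraction is None:
        config.gpu_options.allow_growth = True
    else:
        config.gpu_options.per_process_gpu_memory_fraction = fraction
    K.set_session(tf.Session(config=config))
```

For example, configure_keras_session(fraction_for_limit(3 * 1024, 4096)) would request a cap of 75% on a 4 GiB card, and it has to be called before the model is built. Neither variant frees memory that another process already holds, so with only 113.38MiB free the stale process still has to go first.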