Kernel died after running some code
I am trying to generate a sample image with the generator below, but the Jupyter kernel dies when I run the code. I tried updating conda and Jupyter, but neither helped.

I have been watching the GPU memory usage, but the code does not appear to use much of the GPU.

TensorFlow 2.0, Ubuntu 18.10, CUDA 10.0, Python 3.5

import tensorflow as tf
from tensorflow.keras import layers

def make_generator_model():
    model = tf.keras.Sequential()
    model.add(layers.Dense(7*7*256, use_bias=False, input_shape=(100,)))
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())

    model.add(layers.Reshape((7, 7, 256)))
    assert model.output_shape == (None, 7, 7, 256) # Note: None is the batch size

    model.add(layers.Conv2DTranspose(128, (5, 5), strides=(1, 1), padding='same', use_bias=False))
    assert model.output_shape == (None, 7, 7, 128)
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())

    model.add(layers.Conv2DTranspose(64, (5, 5), strides=(2, 2), padding='same', use_bias=False))
    assert model.output_shape == (None, 14, 14, 64)
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())

    model.add(layers.Conv2DTranspose(1, (5, 5), strides=(2, 2), padding='same', use_bias=False, activation='tanh'))
    assert model.output_shape == (None, 28, 28, 1)

    return model

generator = make_generator_model()

noise = tf.random.normal([1, 100])
generated_image = generator(noise, training=False)

[I 10:20:06.664 NotebookApp] KernelRestarter: restarting kernel (1/5), keep random ports
WARNING:root:kernel 4406ce3b-1b5b-4ef8-aba9-d5fd9ed129e7 restarted
2019-04-18 10:20:21.002451: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2019-04-18 10:20:21.081020: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1589] Found device 0 with properties: name: TITAN Xp major: 6 minor: 1 memoryClockRate(GHz): 1.582 pciBusID: 0000:42:00.0 totalMemory: 11.91GiB freeMemory: 340.69MiB
2019-04-18 10:20:21.081054: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1712] Adding visible gpu devices: 0
2019-04-18 10:20:21.081382: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-04-18 10:20:21.107510: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55de6ead0990 executing computations on platform CUDA. Devices:
2019-04-18 10:20:21.107562: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): TITAN Xp, Compute Capability 6.1
2019-04-18 10:20:21.127890: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3493050000 Hz
2019-04-18 10:20:21.129460: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55de6eed7eb0 executing computations on platform Host. Devices:
2019-04-18 10:20:21.129503: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): ,
2019-04-18 10:20:21.129616: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1712] Adding visible gpu devices: 0
2019-04-18 10:20:21.129722: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-04-18 10:20:21.130785: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-18 10:20:21.130807: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1126] 0
2019-04-18 10:20:21.130819: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1139] 0: N
2019-04-18 10:20:21.131090: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1260] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 115 MB memory) -> physical GPU (device: 0, name: TITAN Xp, pci bus id: 0000:42:00.0, compute capability: 6.1)
2019-04-18 10:20:24.168083: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-04-18 10:20:24.331094: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-04-18 10:20:24.789774: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-04-18 10:20:24.791468: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-04-18 10:20:24.791484: F tensorflow/core/kernels/conv_grad_input_ops.cc:949] Check failed: stream->parent()->GetConvolveBackwardDataAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo(stream->parent()), &algorithms)
[I 10:20:27.669 NotebookApp] KernelRestarter: restarting kernel (1/5), keep random ports
WARNING:root:kernel 4406ce3b-1b5b-4ef8-aba9-d5fd9ed129e7 restarted

1 Answer

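Judging by the log, this looks like a GPU memory problem: almost all of the card's memory was already in use when TensorFlow started, which would explain why the cuDNN handle could not be created (CUDNN_STATUS_INTERNAL_ERROR).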

"totalMemory: 11.91GiB freeMemory: 340.69MiB"
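To put those two numbers in proportion, here is a minimal sketch that pulls them out of the log line and computes the share of GPU memory that was still free. The helper name `free_fraction` is made up for this illustration, and the regular expression assumes the sizes are printed in MiB or GiB, as they are in the log above.

```python
import re

# Multipliers to convert the units TensorFlow prints into MiB.
_TO_MIB = {"MiB": 1.0, "GiB": 1024.0}

def free_fraction(log_line):
    """Return freeMemory / totalMemory for a 'Found device' log line."""
    sizes = {}
    pattern = r"(totalMemory|freeMemory): ([\d.]+)(MiB|GiB)"
    for key, value, unit in re.findall(pattern, log_line):
        sizes[key] = float(value) * _TO_MIB[unit]
    return sizes["freeMemory"] / sizes["totalMemory"]

line = "totalMemory: 11.91GiB freeMemory: 340.69MiB"
print("free fraction: {:.1%}".format(free_fraction(line)))
```

Less than 3% of the card was free, which matches the later log entry saying the TensorFlow device was created with only 115 MB of memory.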

Try restarting your PC. As soon as it is back up, check how much GPU memory is free (for example with nvidia-smi), then run your code again and see whether it works.
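If you would rather do that check from Python, for instance in a notebook cell before building the model, something along these lines should work. It is only a sketch: it assumes `nvidia-smi` is on the PATH, and the function names `parse_memory_csv` and `query_gpu_memory` are my own, not part of any library.

```python
import shutil
import subprocess

def parse_memory_csv(text):
    """Parse 'used, total' MiB pairs (one GPU per line) into a list of tuples."""
    rows = []
    for line in text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows

def query_gpu_memory():
    """Return [(used_mib, total_mib), ...], or None if nvidia-smi cannot be used."""
    if shutil.which("nvidia-smi") is None:
        return None
    try:
        out = subprocess.check_output(
            ["nvidia-smi",
             "--query-gpu=memory.used,memory.total",
             "--format=csv,noheader,nounits"],
            universal_newlines=True,
        )
        return parse_memory_csv(out)
    except (subprocess.CalledProcessError, OSError, ValueError):
        # Driver not loaded, command failed, or output not in the expected form.
        return None

memory = query_gpu_memory()
if memory is None:
    print("nvidia-smi is not available on this machine")
else:
    for index, (used, total) in enumerate(memory):
        print("GPU {}: {} MiB used of {} MiB".format(index, used, total))
```

If the used figure is already close to the total before your notebook has run anything, some other process is holding the memory, and running `nvidia-smi` without arguments will list the processes using the GPU.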