Tensorflow gpu uses gpu-ram, but not compute units? | CUPTI_ERROR_INSUFFICIENT_PRIVILEGES

Question

I am trying to use tensorflow-gpu 2.1.0 installed via pip.

The problem: Task Manager on Windows 10 shows almost no GPU usage, only 2% to 5%, although the GPU RAM is almost 100% used. What might be the reason that Task Manager shows the GPU (GTX 1660 Ti) as barely used?

With nvidia-smi I get a different picture:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 445.87       Driver Version: 445.87       CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 166... WDDM  | 00000000:10:00.0  On |                  N/A |
| 79%   64C    P2   109W / 130W |   5964MiB /  6144MiB |     89%      Default |
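
For a reading that does not depend on Task Manager, the same figures can be polled from a script. This is a minimal sketch, assuming nvidia-smi is on the PATH; the helper names parse_gpu_stats and read_gpu_stats are hypothetical and only for illustration:

```python
import subprocess

# Ask nvidia-smi for machine-readable CSV instead of the boxed table.
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_stats(csv_text):
    """Parse lines like '89, 5964, 6144' into one dict per GPU."""
    stats = []
    for line in csv_text.strip().splitlines():
        # Assumes every field is numeric; some GPUs report '[N/A]' instead.
        util, used, total = (int(field.strip()) for field in line.split(","))
        stats.append(
            {"util_percent": util, "mem_used_mib": used, "mem_total_mib": total}
        )
    return stats


def read_gpu_stats():
    """Run nvidia-smi once and return the parsed statistics."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_stats(result.stdout)
```

Calling read_gpu_stats() in a loop while training runs gives a utilisation figure to compare against Task Manager, which reported 2% to 5% here while nvidia-smi reported 89%.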

I use CUDA 10.1.

The TensorFlow log output is:

2020-04-16 21:07:55.541837: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-04-16 21:07:58.416796: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-04-16 21:07:58.450054: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties: 
pciBusID: 0000:10:00.0 name: GeForce GTX 1660 Ti computeCapability: 7.5
coreClock: 1.845GHz coreCount: 24 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 268.26GiB/s
2020-04-16 21:07:58.450406: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-04-16 21:07:58.455452: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-04-16 21:07:58.459642: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-04-16 21:07:58.461515: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-04-16 21:07:58.466455: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-04-16 21:07:58.469085: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-04-16 21:07:58.479479: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-04-16 21:07:58.480206: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-04-16 21:07:58.480629: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2020-04-16 21:07:58.482300: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties: 
pciBusID: 0000:10:00.0 name: GeForce GTX 1660 Ti computeCapability: 7.5
coreClock: 1.845GHz coreCount: 24 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 268.26GiB/s
2020-04-16 21:07:58.482677: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-04-16 21:07:58.482875: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-04-16 21:07:58.483040: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-04-16 21:07:58.483203: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-04-16 21:07:58.483355: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-04-16 21:07:58.483529: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-04-16 21:07:58.483712: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-04-16 21:07:58.484448: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-04-16 21:07:59.249742: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-04-16 21:07:59.250043: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102]      0 
2020-04-16 21:07:59.250203: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0:   N 
2020-04-16 21:07:59.251187: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4625 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1660 Ti, pci bus id: 0000:10:00.0, compute capability: 7.5)
Found 5338 images belonging to 4 classes.
Found 3554 images belonging to 4 classes.
WARNING:tensorflow:sample_weight modes were coerced from
  ...
    to  
  ['...']
WARNING:tensorflow:sample_weight modes were coerced from
  ...
    to  
  ['...']
2020-04-16 21:08:11.464027: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-04-16 21:08:12.081246: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-04-16 21:08:13.727563: W tensorflow/stream_executor/gpu/redzone_allocator.cc:312] Internal: Invoking GPU asm compilation is supported on Cuda non-Windows platforms only
Relying on driver to perform ptx compilation. This message will be only logged once.
2020-04-16 21:08:15.806688: I tensorflow/core/profiler/lib/profiler_session.cc:225] Profiler session started.
2020-04-16 21:08:15.806850: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1259] Profiler found 1 GPUs
2020-04-16 21:08:15.808769: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cupti64_101.dll
2020-04-16 21:08:15.909368: E tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1307] function cupti_interface_->Subscribe( &subscriber_, (CUpti_CallbackFunc)ApiCallback, this)failed with error CUPTI_ERROR_INSUFFICIENT_PRIVILEGES
2020-04-16 21:08:15.910677: E tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1346] function cupti_interface_->ActivityRegisterCallbacks( AllocCuptiActivityBuffer, FreeCuptiActivityBuffer)failed with error CUPTI_ERROR_INSUFFICIENT_PRIVILEGES
2020-04-16 21:08:16.092575: E tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1329] function cupti_interface_->EnableCallback( 0 , subscriber_, CUPTI_CB_DOMAIN_DRIVER_API, cbid)failed with error CUPTI_ERROR_INVALID_PARAMETER
2020-04-16 21:08:16.092946: I tensorflow/core/profiler/internal/gpu/device_tracer.cc:88]  GpuTracer has collected 0 callback api events and 0 activity events.
WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (0.338369). Check your callbacks.

I want to highlight the error: CUPTI_ERROR_INSUFFICIENT_PRIVILEGES

The code I am currently running is:

import argparse

from datetime import datetime
import itertools

import io
import matplotlib.pyplot as plt
import numpy as np
import sklearn.metrics

import tensorflow as tf
from tensorflow.keras import applications
from tensorflow.keras.callbacks import TensorBoard, ReduceLROnPlateau, ModelCheckpoint, EarlyStopping, LambdaCallback
from tensorflow.keras.layers import Dense, Dropout, GlobalAveragePooling2D
from tensorflow.keras.models import Model, load_model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.preprocessing.image import ImageDataGenerator


def plot_confusion_matrix(cm, class_names):
    """
    Returns a matplotlib figure containing the plotted confusion matrix.

    Args:
    cm (array, shape = [n, n]): a confusion matrix of integer classes
    class_names (array, shape = [n]): String names of the integer classes
    """

    figure = plt.figure(figsize=(8, 8))
    plt.imshow(cm, interpolation='nearest', cmap=plt.cm.Blues)
    plt.title("Confusion matrix")
    plt.colorbar()
    tick_marks = np.arange(len(class_names))
    plt.xticks(tick_marks, class_names, rotation=45)
    plt.yticks(tick_marks, class_names)

    # Normalize the confusion matrix.
    cm = np.around(cm.astype('float') / cm.sum(axis=1)[:, np.newaxis], decimals=2)

    # Use white text if squares are dark; otherwise black.
    threshold = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        color = "white" if cm[i, j] > threshold else "black"
        plt.text(j, i, cm[i, j], horizontalalignment="center", color=color)

    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')
    return figure


def create_resnet50(img_h: int, img_w: int, num_classes: int):
    # Build a ResNet50 backbone without its classification head (randomly initialised weights).
    base_model = applications.resnet50.ResNet50(weights=None, include_top=False, input_shape=(img_h, img_w, 3))

    x = base_model.output
    x = GlobalAveragePooling2D()(x)
    x = Dropout(rate=0.3)(x)
    predictions = Dense(num_classes, activation='softmax')(x)
    mdl = Model(inputs=base_model.input, outputs=predictions)
    return mdl


def plot_to_image(figure):
    """Converts the matplotlib plot specified by 'figure' to a PNG image and
    returns it. The supplied figure is closed and inaccessible after this call."""
    # Save the plot to a PNG in memory.
    buf = io.BytesIO()
    plt.savefig(buf, format='png')
    # Closing the figure prevents it from being displayed directly inside
    # the notebook.
    plt.close(figure)
    buf.seek(0)
    # Convert PNG buffer to TF image
    image = tf.image.decode_png(buf.getvalue(), channels=4)
    # Add the batch dimension
    image = tf.expand_dims(image, 0)
    return image


def log_confusion_matrix(epoch, logs):
    # Use the model to predict the values from the validation dataset.
    # create list of 256 images, labels
    itx = 256 // bch_size
    test_images, test_labels_raw = [], []
    for i in range(itx):

        tmp_img, tmp_lbs = next(val_gen)
        test_images.extend(tmp_img)
        test_labels_raw.extend(tmp_lbs)

    test_pred_raw = model.predict(np.array(test_images))
    test_pred = np.argmax(test_pred_raw, axis=1)
    test_labels = np.argmax(test_labels_raw, axis=1)

    # Calculate the confusion matrix.
    cm = sklearn.metrics.confusion_matrix(test_labels, test_pred)
    # Plot the confusion matrix, labelling the axes with the class names.
    figure = plot_confusion_matrix(cm, class_names=list(val_gen.class_indices.keys()))
    cm_image = plot_to_image(figure)

    # Log the confusion matrix as an image summary.
    with file_writer_cm.as_default():
        tf.summary.image("Confusion Matrix", cm_image, step=epoch)


def run(train_generator, test_generator, epcs: int, mdl: Model, opt):
    # train the model
    mdl.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy', 'mse', ])

    stopper = EarlyStopping(monitor='val_loss', patience=min(epcs // 16, 10), mode='auto',
                            restore_best_weights=True)

    checker = ModelCheckpoint(monitor='val_loss', filepath='weights.{epoch:03d}.hdf5',
                              save_best_only=True, save_freq='epoch')

    shower = TensorBoard(histogram_freq=1)

    reducer = ReduceLROnPlateau(factor=0.6, patience=10, min_delta=1e-4, cooldown=10)

    cm_callback = LambdaCallback(on_epoch_end=log_confusion_matrix)

    history = mdl.fit(train_generator, epochs=epcs, verbose=0,
                        validation_data=test_generator,
                        callbacks=[stopper, checker, shower, reducer, cm_callback]
                        )

    return history


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('train_path', type=str, help='Path to the train main folder of files.')
    parser.add_argument('test_path', type=str, help='Path to the test main folder of files.')
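    # type=bool would treat any non-empty string (even "False") as True, so parse the text explicitly.
    parser.add_argument('new_model', type=lambda s: s.lower() in ('1', 'true', 'yes'),
                        help='Create new model (true/false), or load from file.')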
    parser.add_argument('-m', '--model_path', type=str, help='path to model.')
    args = parser.parse_args()

    train_p = args.train_path
    test_p = args.test_path
    is_new = args.new_model
    model_path = args.model_path

    img_height, img_width = 214, 214

    file_writer_cm = tf.summary.create_file_writer('logs/cm')

    model = create_resnet50(img_height, img_width, num_classes=4) if is_new else load_model(model_path)
    adam = Adam(learning_rate=0.0001)

    train_datagen = ImageDataGenerator(
        rescale=1. / 255,
        horizontal_flip=True,
        vertical_flip=True,
        rotation_range=90,
        width_shift_range=0.1,
        height_shift_range=0.1,
        zoom_range=0.2
    )

    validation_datagen = ImageDataGenerator(
        rescale=1. / 255
    )
    bch_size = 16

    train_gen = train_datagen.flow_from_directory(directory=train_p, target_size=(img_height, img_width),
                                                  batch_size=bch_size)
    val_gen = validation_datagen.flow_from_directory(directory=test_p, target_size=(img_height, img_width),
                                                     batch_size=bch_size)

    h = run(train_gen, val_gen, 100, model, adam)

    m_name = 'Model_resnet50_epoch{}_score{:3.2f}.hdf5'.format(100, min(h.history['val_loss']))
    model.save(m_name)

Thank you in advance. I really appreciate it!

Matt Ginsberg · Accepted Answer · 2020-06-25T21:48:57

This has been discussed in a variety of places, and in general the problem is a permission setting on your GPU: the driver restricts profiling to admin users. On Ubuntu, you need to create a file /etc/modprobe.d/nvidia-kernel-common.conf that contains

options nvidia "NVreg_RestrictProfilingToAdminUsers=0"

You will need to reboot before it takes effect. You should also run

sudo update-initramfs -u

to ensure that the new option takes effect. (Many places mention creating the file but not update-initramfs; it didn't work for me until I ran that command and then rebooted again.)
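
Whether the option was picked up after the reboot can be checked from Python. This is a minimal sketch, assuming the Linux driver exposes its current settings in /proc/driver/nvidia/params with a line such as "RmProfilingAdminOnly: 0" (the path and key name are assumptions and may differ between driver versions); the helper name profiling_admin_only is hypothetical:

```python
from pathlib import Path

# Assumed location of the NVIDIA driver's runtime parameters on Linux.
PARAMS_PATH = Path("/proc/driver/nvidia/params")


def profiling_admin_only(params_text):
    """Return True if profiling is restricted to admins, False if it is
    open to all users, or None if the setting is not listed."""
    for line in params_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "RmProfilingAdminOnly":
            return value.strip() == "1"
    return None


# Example use on a machine with the NVIDIA driver loaded:
#   print(profiling_admin_only(PARAMS_PATH.read_text()))
```

A result of False would mean the restriction is lifted; True would suggest the modprobe option was not applied, for example because update-initramfs was skipped.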
