0 votes

I'm running simple dense layers, but GPU load and CPU load stay low the whole time. [screenshots of GPU and CPU load omitted] Below is the startup log together with the output of print(device_lib.list_local_devices()).
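The device check itself boils down to something like this (TensorFlow 1.x; the import line is my addition, not shown in the original post):

from tensorflow.python.client import device_lib

# Prints every device TensorFlow can see (CPU and GPU)
print(device_lib.list_local_devices())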

2019-02-19 19:06:23.911633: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2

2019-02-19 19:06:24.231261: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: GeForce RTX 2080 major: 7 minor: 5 memoryClockRate(GHz): 1.83
pciBusID: 0000:65:00.0
totalMemory: 8.00GiB freeMemory: 6.55GiB
2019-02-19 19:06:24.237952: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-02-19 19:06:25.765790: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-02-19 19:06:25.769303: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0
2019-02-19 19:06:25.771334: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N
2019-02-19 19:06:25.776384: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:0 with 6288 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080, pci bus id: 0000:65:00.0, compute capability: 7.5)

[name: "/device:CPU:0"
 device_type: "CPU"
 memory_limit: 268435456
 locality { }
 incarnation: 5007262859900510599
, name: "/device:GPU:0"
 device_type: "GPU"
 memory_limit: 6594058650
 locality { bus_id: 1 links { } }
 incarnation: 16804701769178738279
 physical_device_desc: "device: 0, name: GeForce RTX 2080, pci bus id: 0000:65:00.0, compute capability: 7.5"

At least it is working on the GPU. But I don't know whether this is the maximum performance this GPU can reach for this deep learning net or not.

EDIT2: dataset

https://archive.ics.uci.edu/ml/datasets/combined+cycle+power+plant

It's about 10,000 data points and 4 descriptive variables.

EDIT3: Code, it's really simple.

# (imports assumed; tf.keras used here, plain Keras works the same way)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, BatchNormalization
from tensorflow.keras.callbacks import EarlyStopping

num_p = 8
# input_features = number of input columns (4 for this dataset)
model = Sequential()
model.add(Dense(8 * num_p, input_dim=input_features, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(16 * num_p, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(16 * num_p, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(16 * num_p, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(16 * num_p, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(8 * num_p, activation='relu'))  # input_dim removed; only the first layer needs it
model.add(BatchNormalization())
model.add(Dense(1, activation='linear'))
model.compile(loss='mae', optimizer='adam')

# Stop early once validation loss stops improving
es = EarlyStopping(monitor='val_loss', min_delta=0.0005, patience=200, verbose=0, mode='min')
his = model.fit(x=X_train_scaled, y=y_train, batch_size=64, epochs=10000, verbose=0,
                validation_split=0.2, callbacks=[es])

EDIT4: input data code

# (imports assumed)
import pandas as pd
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

df = pd.read_csv("dataset")
X_train, X_test, y_train, y_test = train_test_split(
    df.iloc[:, :-1].values, df.iloc[:, -1].values)

# Scale features to [0, 1] using statistics from the training set only
scaler = MinMaxScaler()
scaler.fit(X_train)
X_train_scaled = scaler.transform(X_train)

batch_size = 64
dataset = tf.data.Dataset.from_tensor_slices((X_train_scaled, y_train))
print(dataset)
dataset = dataset.cache()
print(dataset)
dataset = dataset.shuffle(len(X_train_scaled))
print(dataset)
dataset = dataset.repeat()
print(dataset)
dataset = dataset.batch(batch_size)
print(dataset)
dataset = dataset.prefetch(batch_size * 10)
print(dataset)

<TensorSliceDataset shapes: ((4,), ()), types: (tf.float64, tf.float64)> 
<CacheDataset shapes: ((4,), ()), types: (tf.float64, tf.float64)> 
<ShuffleDataset shapes: ((4,), ()), types: (tf.float64, tf.float64)> 
<RepeatDataset shapes: ((4,), ()), types: (tf.float64, tf.float64)> 
<BatchDataset shapes: ((?, 4), (?,)), types: (tf.float64, tf.float64)> 
<PrefetchDataset shapes: ((?, 4), (?,)), types: (tf.float64, tf.float64)>
What does your input pipeline look like? – Sharky
Thank you for the comment, I added the model summary. – AutomaKen
How do you feed data into your model? What kind of data do you train on? – Sharky
I meant, what does your code look like? What function do you use to feed data to your model? – Sharky
How do you get these: x=X_train_scaled, y=y_train? – Sharky

2 Answers

2 votes

You can increase GPU utilization by increasing the batch size. However, considering the rather small dataset size, performance can still be improved by using the Dataset API. It's a much more scalable solution, capable of handling large datasets.

dataset = tf.data.Dataset.from_tensor_slices((X_train_scaled, y_train))
dataset = dataset.cache() #caches dataset in memory
dataset = dataset.shuffle(len(X_train_scaled)) #shuffles dataset
dataset = dataset.repeat() #with no parameter, repeats indefinitely
dataset = dataset.batch(batch_size)
dataset = dataset.prefetch(batch_size*10) #prefetches data 

Then you just pass the dataset object to model.fit with no batch_size, because it was specified earlier, and with steps_per_epoch so the model knows the size of an epoch.

his = model.fit(dataset, steps_per_epoch=7500, epochs=1000)
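If you prefer not to hard-code the step count, a common pattern is to derive it from the dataset size; a sketch, assuming the X_train_scaled and batch_size variables from the question:

# Roughly one pass over the training data per epoch
steps_per_epoch = len(X_train_scaled) // batch_size
his = model.fit(dataset, steps_per_epoch=steps_per_epoch, epochs=1000)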

P.S. With a CSV file of this size it's hard to get a high utilization rate. You can easily pass the whole dataset as one batch and get about 60%. More info here: https://www.tensorflow.org/guide/performance/datasets
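A minimal sketch of that single-batch idea, reusing the variables from the question (the name full_batch is illustrative, not from the original post):

# Illustrative only: put the entire training set into a single batch
full_batch = tf.data.Dataset.from_tensor_slices((X_train_scaled, y_train))
full_batch = full_batch.cache()
full_batch = full_batch.repeat()
full_batch = full_batch.batch(len(X_train_scaled))  # one batch = whole dataset
full_batch = full_batch.prefetch(1)

# One step per epoch, since each step already sees all the data
his = model.fit(full_batch, steps_per_epoch=1, epochs=1000)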

2 votes

You are looking at the wrong display to see GPU usage with TensorFlow. What you are seeing is the 3D activity of the video card.

Notice there is a drop-down arrow next to 3D, Video Encode, etc. Set one of them to Cuda and the other to Copy. This lets you see the compute usage and the copying time.

I am actually working on a similar kind of problem, where I get about 65% usage under Cuda because the dataset is so small. You can increase the batch size to raise GPU usage, but you also hurt the net as a result, so for most things it really is better to train with a batch size of around 32-128, even if your GPU memory could handle far more.

The answer above about using Datasets should work if you can figure out how to get it running correctly. That is something I am working on now.