
I have a ConvLSTM neural network written in Keras. I submitted the same code to two queues on the cluster, one GPU and one CPU. The job on the CPU queue runs, but the job on the GPU queue fails with an error. Here is one line from the error file:

"W tensorflow/core/common_runtime/bfc_allocator.cc:273] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.12MiB. Current allocation summary follows. "

Error File:

Using TensorFlow backend.
2018-04-05 17:39:59.059431: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-04-05 17:40:00.220946: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties: 
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:81:00.0
totalMemory: 15.90GiB freeMemory: 332.94MiB
2018-04-05 17:40:00.221266: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:81:00.0, compute capability: 6.0)
/opt/apps/python/2.7.14_openmpi-2.1.2_parallel_studio-2017.4/lib/python2.7/site-packages/sklearn/utils/validation.py:475: DataConversionWarning: Data with input dtype uint8 was converted to float64 by MinMaxScaler.
  warnings.warn(msg, DataConversionWarning)
2018-04-05 17:40:50.577736: W tensorflow/core/common_runtime/bfc_allocator.cc:273] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.12MiB.  Current allocation summary follows.
2018-04-05 17:40:50.578144: I tensorflow/core/common_runtime/bfc_allocator.cc:627] Bin (256):   Total Chunks: 296, Chunks in use: 294. 74.0KiB allocated for chunks. 73.5KiB in use in bin. 9.3KiB client-requested in use in bin.
2018-04-05 17:40:50.578167: I tensorflow/core/common_runtime/bfc_allocator.cc:627] Bin (512):   Total Chunks: 39, Chunks in use: 39. 22.0KiB allocated for chunks. 22.0KiB in use in bin. 16.1KiB client-requested in use in bin.
2018-04-05 17:40:50.578179: I tensorflow/core/common_runtime/bfc_allocator.cc:627] Bin (1024):  Total Chunks: 1, Chunks in use: 1. 1.2KiB allocated for chunks. 1.2KiB in use in bin. 1.0KiB client-requested in use in bin.
2018-04-05 17:40:50.578192: I tensorflow/core/common_runtime/bfc_allocator.cc:627] Bin (2048):  Total Chunks: 14, Chunks in use: 14. 36.8KiB allocated for chunks. 36.8KiB in use in bin. 34.5KiB client-requested in use in bin.
2018-04-05 17:40:50.578203: I tensorflow/core/common_runtime/bfc_allocator.cc:627] Bin (4096):  Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2018-04-05 17:40:50.578216: I tensorflow/core/common_runtime/bfc_allocator.cc:627] Bin (8192):  Total Chunks: 62, Chunks in use: 61. 882.2KiB allocated for chunks. 869.2KiB in use in bin. 857.8KiB client-requested in use in bin.
2018-04-05 17:40:50.578228: I tensorflow/core/common_runtime/bfc_allocator.cc:627] Bin (16384):     Total Chunks: 13, Chunks in use: 12. 223.0KiB allocated for chunks. 198.8KiB in use in bin. 190.1KiB client-requested in use in bin.
2018-04-05 17:40:50.578239: I tensorflow/core/common_runtime/bfc_allocator.cc:627] Bin (32768):     Total Chunks: 46, Chunks in use: 46. 2.53MiB allocated for chunks. 2.53MiB in use in bin. 2.53MiB client-requested in use in bin.
2018-04-05 17:40:50.578251: I tensorflow/core/common_runtime/bfc_allocator.cc:627] Bin (65536):     Total Chunks: 168, Chunks in use: 168. 13.19MiB allocated for chunks. 13.19MiB in use in bin. 13.10MiB client-requested in use in bin.
2018-04-05 17:40:50.578263: I tensorflow/core/common_runtime/bfc_allocator.cc:627] Bin (131072):    Total Chunks: 1, Chunks in use: 1. 135.8KiB allocated for chunks. 135.8KiB in use in bin. 80.0KiB client-requested in use in bin.
2018-04-05 17:40:50.578276: I tensorflow/core/common_runtime/bfc_allocator.cc:627] Bin (262144):    Total Chunks: 243, Chunks in use: 243. 76.74MiB allocated for chunks. 76.74MiB in use in bin. 75.94MiB client-requested in use in bin.
2018-04-05 17:40:50.578287: I tensorflow/core/common_runtime/bfc_allocator.cc:627] Bin (524288):    Total Chunks: 3, Chunks in use: 3. 1.64MiB allocated for chunks. 1.64MiB in use in bin. 960.0KiB client-requested in use in bin.
2018-04-05 17:40:50.578297: I tensorflow/core/common_runtime/bfc_allocator.cc:627] Bin (1048576):   Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2018-04-05 17:40:50.578309: I tensorflow/core/common_runtime/bfc_allocator.cc:627] Bin (2097152):   Total Chunks: 4, Chunks in use: 4. 12.50MiB allocated for chunks. 12.50MiB in use in bin. 12.50MiB client-requested in use in bin.
2018-04-05 17:40:50.578336: I tensorflow/core/common_runtime/bfc_allocator.cc:627] Bin (4194304):   Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2018-04-05 17:40:50.578348: I tensorflow/core/common_runtime/bfc_allocator.cc:627] Bin (8388608):   Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2018-04-05 17:40:50.578358: I tensorflow/core/common_runtime/bfc_allocator.cc:627] Bin (16777216):  Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2018-04-05 17:40:50.578367: I tensorflow/core/common_runtime/bfc_allocator.cc:627] Bin (33554432):  Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2018-04-05 17:40:50.578376: I tensorflow/core/common_runtime/bfc_allocator.cc:627] Bin (67108864):  Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2018-04-05 17:40:50.578386: I tensorflow/core/common_runtime/bfc_allocator.cc:627] Bin (134217728):     Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2018-04-05 17:40:50.578395: I tensorflow/core/common_runtime/bfc_allocator.cc:627] Bin (268435456):     Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2018-04-05 17:40:50.578406: I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin for 3.12MiB was 2.00MiB, Chunk State: 
2018-04-05 17:40:50.578417: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c000000 of size 1280
2018-04-05 17:40:50.578426: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c000500 of size 256
2018-04-05 17:40:50.578433: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c000600 of size 256
2018-04-05 17:40:50.578440: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c000700 of size 57600
2018-04-05 17:40:50.578448: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c00e800 of size 512
2018-04-05 17:40:50.578456: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c00ea00 of size 768
2018-04-05 17:40:50.578464: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c00ed00 of size 256
2018-04-05 17:40:50.578471: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c00ee00 of size 256
2018-04-05 17:40:50.578478: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c00ef00 of size 256
2018-04-05 17:40:50.578485: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c00f000 of size 256
2018-04-05 17:40:50.578493: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c00f100 of size 256
2018-04-05 17:40:50.578500: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c00f200 of size 256
2018-04-05 17:40:50.578507: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c00f300 of size 256
2018-04-05 17:40:50.578514: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c00f400 of size 256
2018-04-05 17:40:50.578522: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c00f500 of size 256
2018-04-05 17:40:50.578529: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c00f600 of size 57600
2018-04-05 17:40:50.578536: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c01d700 of size 512
2018-04-05 17:40:50.578544: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c01d900 of size 3072
2018-04-05 17:40:50.578551: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c01e500 of size 57600
2018-04-05 17:40:50.578559: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c02c600 of size 512
2018-04-05 17:40:50.578571: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c02c800 of size 768
2018-04-05 17:40:50.578579: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c02cb00 of size 256
2018-04-05 17:40:50.578586: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c02cc00 of size 256
2018-04-05 17:40:50.578593: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c02cd00 of size 256
2018-04-05 17:40:50.578600: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c02ce00 of size 256
2018-04-05 17:40:50.578607: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c02cf00 of size 256
2018-04-05 17:40:50.578614: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c02d000 of size 256
2018-04-05 17:40:50.578622: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c02d100 of size 256
2018-04-05 17:40:50.578629: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c02d200 of size 14592
2018-04-05 17:40:50.578637: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c030b00 of size 256
2018-04-05 17:40:50.578644: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c030c00 of size 256
2018-04-05 17:40:50.578652: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c030d00 of size 256
2018-04-05 17:40:50.578659: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c030e00 of size 256
2018-04-05 17:40:50.578666: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c030f00 of size 256
2018-04-05 17:40:50.578673: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c031000 of size 256
2018-04-05 17:40:50.578681: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c031100 of size 256
2018-04-05 17:40:50.578688: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c031200 of size 256
2018-04-05 17:40:50.578695: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c031300 of size 512
2018-04-05 17:40:50.578702: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c031500 of size 14592
2018-04-05 17:40:50.578709: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c034e00 of size 256
2018-04-05 17:40:50.578717: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c034f00 of size 256
2018-04-05 17:40:50.578724: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c035000 of size 256
2018-04-05 17:40:50.578731: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c035100 of size 256
2018-04-05 17:40:50.578738: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c035200 of size 256
2018-04-05 17:40:50.578746: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c035300 of size 256
2018-04-05 17:40:50.578753: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c035400 of size 256
2018-04-05 17:40:50.578760: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c035500 of size 256
2018-04-05 17:40:50.578767: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c035600 of size 512
2018-04-05 17:40:50.578775: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c035800 of size 23296
2018-04-05 17:40:50.578782: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c03b300 of size 57600
2018-04-05 17:40:50.578789: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c049400 of size 512
2018-04-05 17:40:50.578797: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c049600 of size 57600
2018-04-05 17:40:50.578804: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c057700 of size 57600
2018-04-05 17:40:50.578811: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c065800 of size 256
2018-04-05 17:40:50.578823: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c065900 of size 256
2018-04-05 17:40:50.578830: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c065a00 of size 256
2018-04-05 17:40:50.578838: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c065b00 of size 256
2018-04-05 17:40:50.578845: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c065c00 of size 256
2018-04-05 17:40:50.578852: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c065d00 of size 256
2018-04-05 17:40:50.578859: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c065e00 of size 256
2018-04-05 17:40:50.578867: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c065f00 of size 256
2018-04-05 17:40:50.578874: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c066000 of size 512
2018-04-05 17:40:50.578881: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c066200 of size 14592
2018-04-05 17:40:50.578888: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c069b00 of size 256
2018-04-05 17:40:50.578896: I tensorflow/core/common_runtime/bfc_allocator.cc:661] Chunk at 0x2b373c069c00 of size 256

1 Answer


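TensorFlow on the CPU loads the data into main memory, while TensorFlow on the GPU needs the data in GPU memory. This is most likely the reason for your error: your log reports freeMemory: 332.94MiB out of totalMemory: 15.90GiB, so very little GPU memory was available to the job when it started. You could try reducing the batch size.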
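
One way to try smaller batch sizes systematically is sketched below. This is only an illustration, not something from your code: `fit_with_smaller_batches`, `train_fn` and `fake_train` are hypothetical names. In a Keras script, `train_fn` would presumably wrap a call like `model.fit(x, y, batch_size=batch_size)` and `oom_error` would be `tf.errors.ResourceExhaustedError`; both are assumptions about your setup. Retrying inside the same process may not always recover cleanly after a GPU out-of-memory error, so treat this as a way to find a workable value rather than a fix.

```python
def fit_with_smaller_batches(train_fn, start_batch_size,
                             oom_error=MemoryError, min_batch_size=1):
    """Call train_fn(batch_size), halving batch_size after each OOM error.

    Returns (batch_size_that_worked, result_of_train_fn).
    Raises RuntimeError if even min_batch_size does not fit.
    """
    batch_size = start_batch_size
    while batch_size >= min_batch_size:
        try:
            return batch_size, train_fn(batch_size)
        except oom_error:
            # Not enough memory for this batch size: try half as many samples.
            print("batch_size=%d ran out of memory, halving" % batch_size)
            batch_size //= 2
    raise RuntimeError("no batch size >= %d fit in memory" % min_batch_size)


def fake_train(batch_size):
    """Stand-in for a real training call (hypothetical).

    Pretends that at most 16 samples fit in memory at once.
    """
    if batch_size > 16:
        raise MemoryError("simulated out-of-memory")
    return "trained"


if __name__ == "__main__":
    used, result = fit_with_smaller_batches(fake_train, start_batch_size=128)
    print("succeeded with batch_size=%d (%s)" % (used, result))
```

With the simulated limit above, the helper tries 128, 64 and 32 before succeeding at 16. If even a batch size of 1 fails on the GPU queue, the batch size is not the bottleneck and the amount of free GPU memory on the node is worth checking first.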