I am working on a medical image classification problem. The images are multiple slices of each patient's brain, and the data has already been cleaned. I have 150 AD (Alzheimer's disease) patients, 150 MCI (mild cognitive impairment) patients, and 150 NC (normal control) patients. Each patient has 96 DICOM files, or equivalently, 96 slices. Each slice is 160*160.
I use the TensorFlow cifar10 code as my template, and to make it work I have to change its read_cifar10 part. I changed the code as the question linked below suggests.
Attach a queue to a numpy array in tensorflow for data fetch instead of files?
First, I convert the data to binary files with my own Python module load_img_M.py. To reduce the amount of data, I only keep the middle slices, from 30 to 60.
import numpy as np
import dicom
import os
ad_path = '/home/zmz/Pictures/AD'
mci_path = '/home/zmz/Pictures/MCI'
nc_path = '/home/zmz/Pictures/NC'
data_path =['/home/zmz/Pictures/%s' %i for i in ['NC','MCI','AD']]
KDOWN = 29
KUP = 61
SLICESNUM = KUP - KDOWN - 1  # 31 slices are actually selected (k = 30..60)
RECORDBYTES = 160*160*SLICESNUM + 1  # one label byte + all slice pixels
#load images from the directories and save them to binary files
def img2binary():
    # 150 patients per class: the first 100 go to training, the last 50 to test.
    # uint8 is required so that each record is exactly RECORDBYTES long on disk
    # (the float64 default of np.zeros would make every record 8x too long).
    train_arr = np.zeros([100, SLICESNUM*160*160+1], dtype=np.uint8)
    test_arr = np.zeros([50, SLICESNUM*160*160+1], dtype=np.uint8)
    for p in range(len(data_path)):
        Patientlist = os.listdir(data_path[p])
        for q in range(len(Patientlist)):
            Dicompath = os.path.join(data_path[p], Patientlist[q])
            # listdir order is arbitrary, so sort to keep the slices in order
            Dicomlist = sorted(os.listdir(Dicompath))
            # assign the label of the record: 0 = NC, 1 = MCI, 2 = AD
            if q < 100:
                train_arr[q, 0] = p
            else:
                test_arr[q-100, 0] = p
            for k in range(len(Dicomlist)):
                if k > KDOWN and k < KUP:  # the middle slices carry the most information
                    Picturepath = os.path.join(Dicompath, Dicomlist[k])
                    img = dicom.read_file(Picturepath)
                    imgpixel = img.pixel_array.reshape(25600)  # 160*160 pixels
                    # assign the pixels of slice k-KDOWN-1
                    if q < 100:
                        train_arr[q, (1+(k-KDOWN-1)*25600):(1+(k-KDOWN)*25600)] = imgpixel
                    else:
                        test_arr[q-100, (1+(k-KDOWN-1)*25600):(1+(k-KDOWN)*25600)] = imgpixel
        train_arr.tofile("/home/zmz/Pictures/tmp/images/train%s.bin" % p)
        test_arr.tofile("/home/zmz/Pictures/tmp/images/test%s.bin" % p)
The conversion produces six binary files: train0.bin, train1.bin, train2.bin and test0.bin, test1.bin, test2.bin (one training file and one test file per class).
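As a sanity check on the layout, every file's size should be exactly the number of records times RECORDBYTES. A minimal check, assuming the uint8 arrays above (with the float64 default of np.zeros, each record would come out 8 times too long):

import os
import load_img_M

# 100 training records and 50 test records per class file.
for prefix, n_records in [('train', 100), ('test', 50)]:
    for p in range(3):
        path = '/home/zmz/Pictures/tmp/images/%s%d.bin' % (prefix, p)
        expected = n_records * load_img_M.RECORDBYTES
        actual = os.path.getsize(path)
        print(path, actual, 'bytes, expected', expected)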
Next, I change the cifar10_input module:
"""Routine for decoding the Alzeheimer dicom format"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import load_img_M
import os
import numpy as np
import dicom
from six.moves import xrange # pylint: disable=redefined-builtin
import tensorflow as tf
# Process images of this size. Note that this differs from the original CIFAR
# image size of 32 x 32. If one alters this number, then the entire model
# architecture will change and any model would need to be retrained.
IMAGE_SIZE = 160
# Global constants describing the ADNI data set.
IMAGE_HEIGHT = 160
IMAGE_WIDTH = 160
IMAGE_CHANNEL = 1
SLICES_NUM = load_img_M.SLICESNUM
NUM_CLASSES = 3
NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN = 300
NUM_EXAMPLES_PER_EPOCH_FOR_EVAL = 150
#define a dicom reader to read a record
def read_ADNI(filename_queue):
  """Reads and parses examples from ADNI data files.

  Recommendation: if you want N-way read parallelism, call this function
  N times. This will give you N independent Readers reading different
  files & positions within those files, which will give better mixing of
  examples.

  Args:
    filename_queue: A queue of strings with the filenames to read from.

  Returns:
    An object representing a single example, with the following fields:
      height: number of rows in the result (160)
      width: number of columns in the result (160)
      channels: number of color channels in the result (1)
      key: a scalar string Tensor describing the filename & record number
        for this example.
      label: an int32 Tensor with the label in the range 0,1,2.
      uint8image: a [slice, height, width, channels] uint8 Tensor with the image data
  """

  class ADNIRecord(object):
    pass  # do nothing; the class is an empty container
  result = ADNIRecord()

  label_bytes = 1
  result.height = IMAGE_HEIGHT
  result.width = IMAGE_WIDTH
  result.depth = IMAGE_CHANNEL
  result.slice = SLICES_NUM
  image_bytes = result.height * result.width * result.depth * result.slice
  record_bytes = label_bytes + image_bytes
  assert record_bytes == load_img_M.RECORDBYTES

  reader = tf.FixedLengthRecordReader(record_bytes=record_bytes)
  result.key, value = reader.read(filename_queue)

  # Convert from a string to a vector of uint8 that is record_bytes long.
  record_bytes = tf.decode_raw(value, tf.uint8)

  # The first byte represents the label, which we convert from uint8->int32.
  result.label = tf.cast(
      tf.strided_slice(record_bytes, [0], [label_bytes]), tf.int32)

  # The remaining bytes after the label represent the image, which we
  # reshape to [slice, height, width, channel].
  depth_major = tf.reshape(
      tf.strided_slice(record_bytes, [label_bytes],
                       [label_bytes + image_bytes]),
      [result.slice, result.height, result.width, result.depth])
  # The record is already stored slice-major, so no transpose is needed.
  # result.uint8image = tf.transpose(depth_major, [1, 2, 0])
  result.uint8image = depth_major
  return result
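To verify that the reader's offsets match what img2binary wrote, one record can also be decoded by hand with NumPy, outside the TensorFlow graph (a minimal sketch; the path is one of the files written above):

import numpy as np
import load_img_M

# Read exactly one fixed-length record: 1 label byte + the slice pixels.
raw = np.fromfile('/home/zmz/Pictures/tmp/images/train0.bin',
                  dtype=np.uint8, count=load_img_M.RECORDBYTES)
label = int(raw[0])  # 0 = NC, 1 = MCI, 2 = AD
image = raw[1:].reshape(load_img_M.SLICESNUM, 160, 160, 1)
print('label:', label, 'image shape:', image.shape)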
Finally, I change the distorted-input function. I delete the image preprocessing blocks such as cropping and flipping:
def distorted_inputs(data_dir, batch_size):
  """Construct distorted input for ADNI training using the Reader ops.

  Args:
    data_dir: Path to the ADNI data directory.
    batch_size: Number of images per batch.

  Returns:
    images: Images. 5D tensor of [batch_size, slices, IMAGE_SIZE, IMAGE_SIZE, 1] size.
    labels: Labels. 1D tensor of [batch_size] size.
  """
  # filenames = [os.path.join(data_dir, 'data_batch_%d.bin' % i)
  #              for i in xrange(1, 6)]  # the original data_batch_1..5.bin
  filenames = [os.path.join(data_dir, 'tmp/images/train%s.bin' % i) for i in [0, 1, 2]]
  for f in filenames:
    if not tf.gfile.Exists(f):
      raise ValueError('Failed to find file: ' + f)

  # Create a queue that produces the filenames to read.
  filename_queue = tf.train.string_input_producer(filenames)

  # Read examples from files in the filename queue,
  # calling the reader defined at the very beginning.
  read_input = read_ADNI(filename_queue)
  reshaped_image = tf.cast(read_input.uint8image, tf.float32)

  # Set the shapes of tensors.
  reshaped_image.set_shape([SLICES_NUM, IMAGE_HEIGHT, IMAGE_WIDTH, IMAGE_CHANNEL])
  read_input.label.set_shape([1])

  # Ensure that the random shuffling has good mixing properties.
  min_fraction_of_examples_in_queue = 0.4
  min_queue_examples = int(NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN *
                           min_fraction_of_examples_in_queue)
  print('Filling queue with %d ADNI images before starting to train. '
        'This will take a few minutes.' % min_queue_examples)

  # Generate a batch of images and labels by building up a queue of examples.
  return _generate_image_and_label_batch(reshaped_image, read_input.label,
                                         min_queue_examples, batch_size,
                                         shuffle=True)
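The shuffle queue itself should not be the bottleneck. In the cifar10 template, _generate_image_and_label_batch builds a queue with capacity min_queue_examples + 3 * batch_size, so with my numbers (a rough estimate, assuming float32 examples after the cast):

# 31 slices x 160 x 160 x 1 channel, float32 after tf.cast
bytes_per_example = 31 * 160 * 160 * 1 * 4       # 3,174,400 bytes (~3 MiB)
min_queue_examples = int(300 * 0.4)              # 120
capacity = min_queue_examples + 3 * 1            # batch_size = 1  ->  123
print(capacity * bytes_per_example / 1024.0**2)  # ~372 MiB, easily affordable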
This is a 3D convolution problem, so I use tf.nn.conv3d and make the changes needed for the code to work:
def inference(images):  # core code
  """Build the ADNI model.

  Args:
    images: Images returned from distorted_inputs() or inputs().

  Returns:
    Logits.
  """
  # We instantiate all variables using tf.get_variable() instead of
  # tf.Variable() in order to share variables across multiple GPU training runs.
  # If we only ran this model on a single GPU, we could simplify this function
  # by replacing all instances of tf.get_variable() with tf.Variable().
  #
  # conv1
  with tf.variable_scope('conv1') as scope:
    kernel = _variable_with_weight_decay('weights',
                                         shape=[3, 3, 3, 1, 64],
                                         stddev=5e-2,
                                         wd=0.0)
    conv = tf.nn.conv3d(images, kernel, [1, 1, 1, 1, 1], padding='SAME')
    biases = _variable_on_cpu('biases', [64], tf.constant_initializer(0.0))
    pre_activation = tf.nn.bias_add(conv, biases)
    conv1 = tf.nn.relu(pre_activation, name=scope.name)
    _activation_summary(conv1)

  # pool1: stride 2 in height and width only, so the slice dimension is kept
  pool1 = tf.nn.max_pool3d(conv1, ksize=[1, 3, 3, 3, 1], strides=[1, 1, 2, 2, 1],
                           padding='SAME', name='pool1')
  # norm1: there is no 3D LRN op, so normalization is skipped
  # norm1 = tf.nn.lrn3d(pool1, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75, name='norm1')
  norm1 = pool1

  # conv2
  with tf.variable_scope('conv2') as scope:
    kernel = _variable_with_weight_decay('weights',
                                         shape=[3, 3, 3, 64, 64],
                                         stddev=5e-2,
                                         wd=0.0)
    conv = tf.nn.conv3d(norm1, kernel, [1, 1, 1, 1, 1], padding='SAME')
    biases = _variable_on_cpu('biases', [64], tf.constant_initializer(0.1))
    pre_activation = tf.nn.bias_add(conv, biases)
    conv2 = tf.nn.relu(pre_activation, name=scope.name)
    _activation_summary(conv2)

  # norm2: skipped for the same reason as norm1
  # norm2 = tf.nn.lrn3d(conv2, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75, name='norm2')
  norm2 = conv2
  # pool2
  pool2 = tf.nn.max_pool3d(norm2, ksize=[1, 3, 3, 3, 1],
                           strides=[1, 1, 2, 2, 1], padding='SAME', name='pool2')

  # local3
  with tf.variable_scope('local3') as scope:
    # Move everything into depth so we can perform a single matrix multiply.
    reshape = tf.reshape(pool2, [FLAGS.batch_size, -1])
    dim = reshape.get_shape()[1].value
    weights = _variable_with_weight_decay('weights', shape=[dim, 384],
                                          stddev=0.04, wd=0.004)
    biases = _variable_on_cpu('biases', [384], tf.constant_initializer(0.1))
    local3 = tf.nn.relu(tf.matmul(reshape, weights) + biases, name=scope.name)
    _activation_summary(local3)

  # local4
  with tf.variable_scope('local4') as scope:
    weights = _variable_with_weight_decay('weights', shape=[384, 192],
                                          stddev=0.04, wd=0.004)
    biases = _variable_on_cpu('biases', [192], tf.constant_initializer(0.1))
    local4 = tf.nn.relu(tf.matmul(local3, weights) + biases, name=scope.name)
    _activation_summary(local4)

  # linear layer (WX + b)
  # We don't apply softmax here because
  # tf.nn.sparse_softmax_cross_entropy_with_logits accepts the unscaled logits
  # and performs the softmax internally for efficiency.
  with tf.variable_scope('softmax_linear') as scope:
    weights = _variable_with_weight_decay('weights', [192, NUM_CLASSES],
                                          stddev=1/192.0, wd=0.0)
    biases = _variable_on_cpu('biases', [NUM_CLASSES],
                              tf.constant_initializer(0.0))
    softmax_linear = tf.add(tf.matmul(local4, weights), biases, name=scope.name)
    _activation_summary(softmax_linear)

  return softmax_linear
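Tracing the shapes through inference with batch_size = 1 (my own bookkeeping, assuming the 31-slice records from above):

# images                      [1, 31, 160, 160,  1]
# conv1 -> pool1 (H,W / 2):   [1, 31,  80,  80, 64]
# conv2 -> pool2 (H,W / 2):   [1, 31,  40,  40, 64]
dim = 31 * 40 * 40 * 64  # what tf.reshape flattens and feeds into local3
print(dim)               # 3,174,400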
I was afraid the computer could not handle this amount of data, so I chose a batch size of 1 at first, hoping the code would at least run. Still, I encounter the resource-exhausted problem in the title. Actually I have a very good Linux workstation: a Titan Xp GPU with 12 GB of memory and 64 GB of RAM. But I share it with my classmates, so the resources assigned to my account may be low. My Linux GPU parameters:
And it would be even better if you could make some calculation to demonstrate why the resources are exhausted.
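Here is my own rough attempt at that calculation, assuming float32 variables; it points at the fully connected local3 layer:

# local3 multiplies the 3,174,400-wide input by a [dim, 384] weight matrix.
dim = 31 * 40 * 40 * 64            # 3,174,400
n_weights = dim * 384              # 1,218,969,600 parameters in one variable
bytes_w = n_weights * 4            # float32: ~4.88 GB
print('local3 weights:      %.2f GB' % (bytes_w / 1e9))
print('weights + gradients: %.2f GB' % (2 * bytes_w / 1e9))  # ~9.75 GB
# Together with the activations, the weight-decay temporaries, and the rest
# of the graph, this already overflows a 12 GB Titan Xp even at batch_size 1,
# which would explain the ResourceExhaustedError.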