Handling large image dataset in tensorflow

Question

I have a dataset of over 1.5 million images that I have to classify into 62 classes. I have created two NumPy arrays: features (paths of the PNG images) and labels (integer labels). Now I want to load these images using OpenCV, but holding such a large loaded input in RAM is inefficient.

So I also tried the following, based on the TensorFlow input pipeline documentation:

import tensorflow as tf

filename_queue = tf.train.string_input_producer(['batch1.csv', 'batch2.csv'])
reader = tf.TextLineReader(skip_header_lines=1)
key,value = reader.read(filename_queue)

record_defaults = [['1'],['1']]
paths, labels = tf.decode_csv(value, record_defaults=record_defaults)

features_path = tf.stack([paths])
labels = tf.stack([labels])

with tf.Session() as sess:
    coord = tf.train.Coordinator()
    #Start all QueueRunners added into the graph
    threads = tf.train.start_queue_runners(coord=coord)

    for _ in range(1):
        # d_features, d_labels = sess.run([features_path, labels])
        # print len(d_features), len(d_labels)

        min_after_dequeue = 5
        batch_size = 32
        capacity = 30
        #capacity = min_after_dequeue + 3 * batch_size

        example_batch, label_batch = tf.train.shuffle_batch(
            [features_path, labels], batch_size=batch_size, 
            capacity=capacity,
            min_after_dequeue=min_after_dequeue
        )
        print sess.run([example_batch])

But this gets stuck when I run it (I tried printing the shape of the tensor, which comes out as expected, but it does not print the batch of my features).

It would be really helpful if someone could suggest a better way to create batches and load images that can later be fed into a TensorFlow model.

Alexandre Passos · Accepted Answer · 2018-07-16T19:35:59

You're starting the queue runners before creating them (they get created by tf.train.shuffle_batch).
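A minimal sketch of the corrected ordering, assuming the same two-column CSV layout as in the question. The helper name read_one_batch is made up for illustration, and tf.compat.v1 is used only so the TF 1.x queue API also runs on TensorFlow 2 (on TF 1.x you would call tf.train.* directly). The point is that tf.train.shuffle_batch is built before tf.train.start_queue_runners is called:

```python
import tensorflow as tf

tf1 = tf.compat.v1  # assumption: gives access to the TF 1.x queue API on TF 2


def read_one_batch(csv_files, batch_size=32, min_after_dequeue=5):
    """Build the whole queue pipeline first, then start the runners (hypothetical helper)."""
    with tf.Graph().as_default():
        filename_queue = tf1.train.string_input_producer(csv_files)
        reader = tf1.TextLineReader(skip_header_lines=1)
        _, value = reader.read(filename_queue)
        path, label = tf1.decode_csv(value, record_defaults=[['1'], ['1']])

        # shuffle_batch adds its own QueueRunner to the graph,
        # so it has to be created before the runners are started.
        capacity = min_after_dequeue + 3 * batch_size
        path_batch, label_batch = tf1.train.shuffle_batch(
            [path, label], batch_size=batch_size,
            capacity=capacity, min_after_dequeue=min_after_dequeue)

        with tf1.Session() as sess:
            coord = tf1.train.Coordinator()
            # All QueueRunners exist at this point, so all of them get started.
            threads = tf1.train.start_queue_runners(sess=sess, coord=coord)
            try:
                return sess.run([path_batch, label_batch])
            finally:
                coord.request_stop()
                coord.join(threads)
```

Calling read_one_batch(['batch1.csv', 'batch2.csv']) should return one shuffled batch of paths and labels (as byte strings) instead of blocking.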

That said, the queue-based input pipeline is deprecated and you should switch to tf.data.
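As a rough sketch of what the tf.data version could look like for arrays of PNG paths and integer labels: the names load_image, make_dataset and IMG_SIZE, the 64x64 target size, and the 3-channel decoding are assumptions for illustration, not part of the question. Images are read and decoded lazily, one batch at a time, so the full dataset never sits in RAM:

```python
import tensorflow as tf

IMG_SIZE = (64, 64)  # assumed target size; use whatever the model expects


def load_image(path, label):
    """Read one PNG from disk and return a fixed-size float image with its label."""
    image = tf.io.decode_png(tf.io.read_file(path), channels=3)
    image = tf.image.resize(image, IMG_SIZE) / 255.0  # float32 in [0, 1]
    return image, label


def make_dataset(paths, labels, batch_size=32):
    """Stream (image, label) batches from path and label arrays."""
    ds = tf.data.Dataset.from_tensor_slices((paths, labels))
    # Shuffle before decoding: the buffer then holds only paths, not pixels.
    ds = ds.shuffle(buffer_size=len(paths))
    ds = ds.map(load_image, num_parallel_calls=tf.data.AUTOTUNE)
    return ds.batch(batch_size).prefetch(tf.data.AUTOTUNE)
```

With the arrays from the question this would be called as make_dataset(features, labels); the resulting dataset can be iterated directly or passed to a Keras model's fit method.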
