I have a dataset of over 1.5 million images that I need to classify into 62 classes. I have created two NumPy arrays: `features` (paths to the PNG images) and `labels` (integer labels). Now I want to load these images with OpenCV, but holding such a large set of loaded images in RAM at once is inefficient.
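To put the RAM concern in numbers: the question does not give the image dimensions, so the sketch below assumes a hypothetical 128 x 128 RGB size purely for illustration. It computes how much memory the fully decoded dataset would need. `dataset_memory_bytes` is a made-up helper, not a library function.

```python
def dataset_memory_bytes(num_images, height, width, channels, bytes_per_value=1):
    """Bytes needed to hold every decoded image in RAM at the same time."""
    return num_images * height * width * channels * bytes_per_value

# Hypothetical image size: 128 x 128 x 3 is an assumption, not from the question.
n = 1_500_000
as_uint8 = dataset_memory_bytes(n, 128, 128, 3)        # raw 8-bit pixels
as_float32 = dataset_memory_bytes(n, 128, 128, 3, 4)   # after casting to float32
print(f"uint8:   {as_uint8 / 1e9:.1f} GB")    # uint8:   73.7 GB
print(f"float32: {as_float32 / 1e9:.1f} GB")  # float32: 294.9 GB
```

Even at this modest resolution the decoded set is far larger than typical RAM, which is why the images need to be decoded batch by batch from their paths.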
I also tried the following, based on the TensorFlow input pipeline documentation:
```python
import tensorflow as tf

filename_queue = tf.train.string_input_producer(['batch1.csv', 'batch2.csv'])
reader = tf.TextLineReader(skip_header_lines=1)
key, value = reader.read(filename_queue)
record_defaults = [['1'], ['1']]
paths, labels = tf.decode_csv(value, record_defaults=record_defaults)
features_path = tf.stack([paths])
labels = tf.stack([labels])

with tf.Session() as sess:
    coord = tf.train.Coordinator()
    # Start all QueueRunners added into the graph
    threads = tf.train.start_queue_runners(coord=coord)
    for _ in range(1):
        # d_features, d_labels = sess.run([features_path, labels])
        # print(len(d_features), len(d_labels))
        min_after_dequeue = 5
        batch_size = 32
        capacity = 30
        # capacity = min_after_dequeue + 3 * batch_size
        example_batch, label_batch = tf.train.shuffle_batch(
            [features_path, labels], batch_size=batch_size,
            capacity=capacity,
            min_after_dequeue=min_after_dequeue
        )
        print(sess.run([example_batch]))
```
But this hangs when I run it. Printing the shape of the tensor gives the expected result, but the batch of features is never printed.
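A likely contributor, offered as a hedged note rather than a verified diagnosis: in TF1, `tf.train.start_queue_runners` starts threads only for the QueueRunners already registered in the graph when it is called, and `tf.train.shuffle_batch` registers a new queue and runner of its own. Because `shuffle_batch` is built inside the session after `start_queue_runners`, nothing fills that queue, so `sess.run` blocks. Separately, `capacity = 30` sits well below the sizing rule of thumb that the commented-out line follows. The helper below (a hypothetical name, not a TensorFlow function) just evaluates that rule, `min_after_dequeue + (num_threads + safety_margin) * batch_size`, for the values in the question.

```python
def recommended_capacity(batch_size, min_after_dequeue, num_threads=1, safety_margin=2):
    """Queue-sizing rule of thumb: min_after_dequeue + (threads + margin) * batch_size."""
    return min_after_dequeue + (num_threads + safety_margin) * batch_size

batch_size = 32
min_after_dequeue = 5
capacity_used = 30  # value used in the question

recommended = recommended_capacity(batch_size, min_after_dequeue)
print(recommended)                   # 101
print(capacity_used >= recommended)  # False
```

With one enqueue thread and a margin of 2 this reproduces `min_after_dequeue + 3 * batch_size`; more reader threads call for a proportionally larger capacity.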
It would be really helpful if someone could suggest a better way to create batches and load the images so that they can later be fed into a TensorFlow model.
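One framework-agnostic way to avoid loading everything into RAM is to keep only the two NumPy arrays (paths and labels) in memory and decode images one batch at a time. The sketch below is a plain NumPy generator, not a TensorFlow API; `iterate_batches` is a hypothetical name, and `load_fn` is assumed to be something like `cv2.imread` that returns same-shaped arrays for every file (resize inside it if the PNGs differ in size).

```python
import numpy as np

def iterate_batches(paths, labels, batch_size, load_fn, shuffle=True, seed=None):
    """Yield (images, labels) batches, decoding only batch_size files at a time.

    paths, labels: equal-length 1-D NumPy arrays (file paths and int labels).
    load_fn: callable mapping one path to an image array (assumed, e.g. cv2.imread).
    """
    if len(paths) != len(labels):
        raise ValueError("paths and labels must have the same length")
    order = np.arange(len(paths))
    if shuffle:
        # Reshuffle the index order each time a new generator (epoch) is created.
        np.random.default_rng(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        # Only this slice of images is decoded and held in memory.
        images = np.stack([load_fn(p) for p in paths[idx]])
        yield images, labels[idx]

# Usage with a stand-in loader (replace with e.g. cv2.imread on real files).
paths = np.array([f"img_{i}.png" for i in range(10)])
labels = np.arange(10)
fake_load = lambda p: np.zeros((4, 4, 3), dtype=np.uint8)
for x, y in iterate_batches(paths, labels, batch_size=4, load_fn=fake_load, seed=0):
    print(x.shape, y.shape)  # (4, 4, 4, 3) (4,) twice, then (2, 4, 4, 3) (2,)
```

Memory use is bounded by one batch of decoded images rather than the whole dataset. Inside TensorFlow itself, the same idea is usually expressed with `tf.data.Dataset.from_tensor_slices((paths, labels))` followed by a `map` that calls `tf.io.read_file` and `tf.io.decode_png`, then `batch`.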