TensorFlow - Using batching to form a validation set

Question

I'm attempting to use TensorFlow's batching system, as described at https://www.tensorflow.org/versions/master/how_tos/reading_data/index.html, to make predictions with a model that I trained previously. At the moment, the batch size I pass to tf.train.batch is equal to the size of the data set that I want to make predictions over.

However, I want to create a validation set to test my predictions and avoid overfitting.

Is there a way to separate a validation set from the training data using the batching system, or are placeholders the only way?

Below is a sample of my code responsible for training. It:

1. Reads data from a CSV file and converts it to tensors.
2. Passes the tensors to tf.train.shuffle_batch for training.

import tensorflow as tf

def input_pipeline(filename_list, batch_size, capacity):
    # Queue of input files; num_epochs=None cycles through them indefinitely.
    filename_queue = tf.train.string_input_producer(filename_list, num_epochs=None)
    reader = tf.TextLineReader()
    key, value = reader.read(filename_queue)

    # Defaults force key value and label to int, all others to float.
    record_defaults = [[1]] + [[46]] + [[1.0] for i in range(436)]
    # Reads in a single row from the CSV and outputs a list of scalars.
    csv_list = tf.decode_csv(value, record_defaults=record_defaults)
    # Packs the different columns into separate feature tensors.
    # (tf.pack was renamed tf.stack in TensorFlow 1.0.)
    location = tf.pack(csv_list[2:4])
    bbox = tf.pack(csv_list[5:8])
    pix_feats = tf.pack(csv_list[9:])
    onehot = tf.one_hot(csv_list[1], depth=98)
    keep_prob = 0.5

    # Creates batches of images and labels.
    image_batch, label_batch = tf.train.shuffle_batch(
        [pix_feats, onehot],
        batch_size=batch_size, num_threads=4, capacity=capacity,
        min_after_dequeue=30000)

    return image_batch, label_batch

Steven Steven · Accepted Answer · 2016-09-07T14:19:45

I'm not sure about your record_defaults.

There are a couple of ways to do it. You could build two separate shuffle_batch pipelines, one that reads the training data and another that reads the validation data. Then you run whichever one you need.

train_loss = train(train_set)
val_loss = val(val_set)

sess.run([train_loss]) # or sess.run([val_loss])
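For the two-pipeline approach, the validation rows have to live in their own file(s), so that each shuffle_batch can be given a separate filename list. Here is a minimal sketch of that split using only the Python standard library. The helper name split_csv, the file paths, and the default 20% validation fraction are assumptions for illustration, not part of the original code, and the sketch assumes one record per line with no header row.

```python
import random


def split_csv(src_path, train_path, val_path, val_fraction=0.2, seed=0):
    """Copy each row of src_path into either train_path or val_path.

    Hypothetical helper: the name and defaults are illustrative assumptions.
    Assumes one record per line and no header row.
    Returns (number of training rows, number of validation rows).
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    n_train = n_val = 0
    with open(src_path) as src, \
            open(train_path, "w") as train, \
            open(val_path, "w") as val:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            if not line.endswith("\n"):
                line += "\n"  # make sure the last row is terminated
            # Each row goes to the validation file with probability val_fraction.
            if rng.random() < val_fraction:
                val.write(line)
                n_val += 1
            else:
                train.write(line)
                n_train += 1
    return n_train, n_val
```

The two output files could then be passed as separate filename lists to input_pipeline, one call for training and one for validation. The model built on the validation batches should reuse the variables of the training model rather than create new ones.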

Placeholders are an alternative.
