I'm trying to use TensorFlow's batching system (described at https://www.tensorflow.org/versions/master/how_tos/reading_data/index.html) to make predictions with a model that I trained previously. At the moment, the batch size I pass to tf.train.batch is equal to the size of the data set that I want to make predictions over.

However, I want to create a validation set to test my predictions and avoid overfitting.

Is there a way to separate a validation set from the training data using the batching system, or are placeholders the only way?

Below is a sample of my code responsible for training. It:

  • Reads data from a CSV file, converts data to tensors
  • Passes tensors to tf.train.shuffle_batch to train

    import tensorflow as tf

    def input_pipeline(filename_list, batch_size, capacity):
        # Queue of input file names; num_epochs=None cycles through them indefinitely.
        filename_queue = tf.train.string_input_producer(filename_list, num_epochs=None)
        reader = tf.TextLineReader()
        key, value = reader.read(filename_queue)

        # Defaults force key value and label to int, all others to float.
        record_defaults = [[1]] + [[46]] + [[1.0] for i in range(436)]
        # Reads in a single row from the CSV and outputs a list of scalars.
        csv_list = tf.decode_csv(value, record_defaults=record_defaults)
        # Packs the different columns into separate feature tensors.
        # (tf.pack was renamed tf.stack in TensorFlow 1.0.)
        location = tf.pack(csv_list[2:4])
        bbox = tf.pack(csv_list[5:8])
        pix_feats = tf.pack(csv_list[9:])
        onehot = tf.one_hot(csv_list[1], depth=98)
        keep_prob = 0.5

        # Creates batches of images and labels.
        image_batch, label_batch = tf.train.shuffle_batch(
            [pix_feats, onehot],
            batch_size=batch_size, num_threads=4, capacity=capacity,
            min_after_dequeue=30000)

        return image_batch, label_batch
1 Answer

I'm not sure about your record_defaults.

There are a couple of ways to do it. You could create two separate tf.train.shuffle_batch pipelines, one fed with the training data and the other with the validation data, and then run whichever one you need.

    # train() and val() stand in for whatever builds the loss op on top of each pipeline.
    train_loss = train(train_set)
    val_loss = val(val_set)

    sess.run([train_loss])  # or sess.run([val_loss])
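
Each of the two pipelines needs its own input files. Here is a minimal sketch of one way to produce them, assuming all the data currently sits in a single header-less CSV that fits in memory. The helper split_csv, its parameters and the 20% validation fraction are hypothetical and not part of TensorFlow; it uses only the Python standard library.

```python
import csv
import random


def split_csv(src_path, train_path, val_path, val_fraction=0.2, seed=0):
    """Randomly split the rows of src_path into a training and a validation file.

    Hypothetical helper: assumes a header-less CSV small enough to fit in memory.
    Returns (number of training rows, number of validation rows).
    """
    with open(src_path, newline="") as src:
        rows = list(csv.reader(src))

    # Shuffle with a fixed seed so the split is reproducible between runs.
    random.Random(seed).shuffle(rows)
    n_val = int(len(rows) * val_fraction)

    # The first n_val shuffled rows become the validation set, the rest the training set.
    with open(val_path, "w", newline="") as f:
        csv.writer(f, lineterminator="\n").writerows(rows[:n_val])
    with open(train_path, "w", newline="") as f:
        csv.writer(f, lineterminator="\n").writerows(rows[n_val:])

    return len(rows) - n_val, n_val
```

The two output files could then be passed as separate filename_list arguments, for example one call to input_pipeline with ["train.csv"] and another with ["val.csv"], so that no validation row is ever seen during training.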

Placeholders are an alternative.