I'm attempting to use TensorFlow's batching system, as detailed at https://www.tensorflow.org/versions/master/how_tos/reading_data/index.html, to make predictions with a model that I trained previously. At the moment, the batch size I pass to tf.train.batch is equal to the size of the data set that I want to make predictions over.
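Concretely, the prediction-time call looks roughly like this (a simplified sketch: pix_feats and onehot come from the same decode_csv pipeline shown further down, and dataset_size is a stand-in for the actual number of rows being predicted):

    import tensorflow as tf

    def prediction_batch(pix_feats, onehot, dataset_size):
        # batch_size == dataset_size, so a single dequeue
        # yields the entire prediction set at once.
        return tf.train.batch(
            [pix_feats, onehot],
            batch_size=dataset_size,
            num_threads=1,
            capacity=dataset_size)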
However, I want to create a validation set to test my predictions and avoid overfitting.
Is there a way to separate a validation set from the training data using the batching system, or is feeding the data through placeholders the only way?
Below is a sample of my code responsible for training. It:
- Reads data from a CSV file and converts it to tensors
- Passes the tensors to tf.train.shuffle_batch for training
import tensorflow as tf

def input_pipeline(filename_list, batch_size, capacity):
    filename_queue = tf.train.string_input_producer(filename_list, num_epochs=None)
    reader = tf.TextLineReader()
    key, value = reader.read(filename_queue)

    # Defaults force the key and label columns to int, all others to float.
    record_defaults = [[1]] + [[46]] + [[1.0] for i in range(436)]

    # Reads in a single row from the CSV and outputs a list of scalars.
    csv_list = tf.decode_csv(value, record_defaults=record_defaults)

    # Packs the different columns into separate feature tensors.
    location = tf.pack(csv_list[2:4])
    bbox = tf.pack(csv_list[5:8])
    pix_feats = tf.pack(csv_list[9:])
    onehot = tf.one_hot(csv_list[1], depth=98)

    keep_prob = 0.5

    # Creates batches of images and labels.
    image_batch, label_batch = tf.train.shuffle_batch(
        [pix_feats, onehot],
        batch_size=batch_size,
        num_threads=4,
        capacity=capacity,
        min_after_dequeue=30000)

    return image_batch, label_batch
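The only workaround I can think of is to split the CSV files into disjoint train/validation lists up front and build one pipeline per list, along these lines (a rough sketch; the file names are made up, and capacity has to exceed the min_after_dequeue=30000 hard-coded above):

    # Hypothetical pre-split file lists; both pipelines reuse input_pipeline.
    train_files = ['data/train_0.csv', 'data/train_1.csv']
    val_files = ['data/val_0.csv']

    train_images, train_labels = input_pipeline(train_files, batch_size=128, capacity=40000)
    val_images, val_labels = input_pipeline(val_files, batch_size=128, capacity=40000)

    # Both pipelines are then driven by the same queue runners:
    sess = tf.Session()
    sess.run(tf.initialize_all_variables())
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)

I'm not sure this is idiomatic, though, since it duplicates the whole queue machinery just to keep the two sets apart.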