
I'm using tf.contrib.learn.Estimator to train a CNN with 20+ layers on a GTX 1080 (8 GB). My dataset is not large, but the GPU runs out of memory with a batch size greater than 32, so I train with a batch size of 16 and also evaluate the classifier in batches (the GPU runs out of memory during evaluation as well if batch_size is not specified).

  # Configure the accuracy metric for evaluation
  metrics = {
      "accuracy":
          learn.MetricSpec(
              metric_fn=tf.metrics.accuracy, prediction_key="classes"),
  }

  # Evaluate the model and print results
  eval_results = classifier.evaluate(
      x=X_test, y=y_test, metrics=metrics, batch_size=16)

Now the problem is that every 100 steps I only get the training loss printed on screen. I want the validation loss and accuracy printed as well, so I'm using a ValidationMonitor:

  validation_monitor = tf.contrib.learn.monitors.ValidationMonitor(
      X_test,
      y_test,
      every_n_steps=50)

  # Train the model
  classifier.fit(
      x=X_train,
      y=y_train,
      batch_size=8,
      steps=20000,
      monitors=[validation_monitor])

Actual problem: my code crashes (out of memory) when I use the ValidationMonitor. I think the problem would be solved if I could specify a batch size here as well, but I can't figure out how to do that. I want the ValidationMonitor to evaluate my validation data in batches, the way I do manually after training with classifier.evaluate. Is there a way to do that?


2 Answers

2
votes

The ValidationMonitor constructor accepts a batch_size argument that should do the trick.
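For example, passing the same batch size you already use in the manual classifier.evaluate call (classifier, X_test, y_test, X_train, and y_train are assumed from the question; this is a sketch against the deprecated tf.contrib.learn API):

```python
# Sketch: the same monitor as in the question, but with an explicit
# batch_size so the validation set is evaluated in chunks of 16
# instead of being fed to the GPU all at once.
validation_monitor = tf.contrib.learn.monitors.ValidationMonitor(
    X_test,
    y_test,
    batch_size=16,      # same size that worked for classifier.evaluate
    every_n_steps=50)

classifier.fit(
    x=X_train,
    y=y_train,
    batch_size=8,
    steps=20000,
    monitors=[validation_monitor])
```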

1
vote

You also need to add config=tf.contrib.learn.RunConfig(save_checkpoints_secs=save_checkpoints_secs) to your Estimator definition. The ValidationMonitor only runs evaluation when it finds a new checkpoint, so checkpoints have to be written at least as often as you want validation to run. You can use save_checkpoints_steps instead of save_checkpoints_secs, but not both.
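A minimal sketch of wiring this in (cnn_model_fn and the model_dir path are placeholders for the question's own model; the interval of 50 steps is chosen to match the monitor's every_n_steps=50):

```python
# Sketch: save a checkpoint every 50 steps so the ValidationMonitor
# (every_n_steps=50) finds a fresh checkpoint each time it fires.
# Setting save_checkpoints_secs=None explicitly avoids having both
# step-based and time-based saving configured at once.
run_config = tf.contrib.learn.RunConfig(
    save_checkpoints_steps=50,
    save_checkpoints_secs=None)

classifier = tf.contrib.learn.Estimator(
    model_fn=cnn_model_fn,        # placeholder: the question's 20+ layer CNN
    model_dir="/tmp/cnn_model",   # placeholder checkpoint directory
    config=run_config)
```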