This answer is based on experimentation I have done with the Getting Started tutorial code.
Mad Wombat has given a detailed explanation of the terms num_epochs, batch_size and steps. This answer is an extension to his answer.
num_epochs - The maximum number of times the program can iterate over the entire dataset in one train() call. Using this argument, we can restrict the number of batches that can be processed during one execution of the train() method.
batch_size - The number of examples in a single batch emitted by the input_fn.
steps - The number of batches the LinearRegressor.train() method can process in one execution.
max_steps is another argument for the LinearRegressor.train() method. This argument defines the maximum number of steps (batches) that can be processed in the LinearRegressor() object's lifetime.
Let's see what this means. The following experiments change two lines of the code provided by the tutorial. The rest of the code remains as is.
Note: For all the examples, assume the number of training examples, i.e. the length of x_train, to be equal to 4.
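For reference, here is (roughly) the setup from the tutorial that the experiments below assume. This is a minimal sketch; the variable names match the tutorial, but check your copy of the code:

import numpy as np
import tensorflow as tf

# Toy dataset from the tutorial: 4 training examples
x_train = np.array([1., 2., 3., 4.])
y_train = np.array([0., -1., -2., -3.])

# A linear regressor with a single numeric feature "x"
feature_columns = [tf.feature_column.numeric_column("x", shape=[1])]
estimator = tf.estimator.LinearRegressor(feature_columns=feature_columns)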
Ex 1:
input_fn = tf.estimator.inputs.numpy_input_fn(
    {"x": x_train}, y_train, batch_size=4, num_epochs=2, shuffle=True)
estimator.train(input_fn=input_fn, steps=10)
In this example, we defined batch_size = 4 and num_epochs = 2. So, the input_fn can emit just 2 batches of input data for one execution of train(). Even though we defined steps = 10, the train() method stops after 2 steps.
Now, execute estimator.train(input_fn=input_fn, steps=10) again. We can see that 2 more steps have been executed. We can keep executing the train() method again and again. If we execute train() 50 times, a total of 100 steps will have been executed.
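One way to see this cumulative behavior for yourself is to read the estimator's step counter after the runs. This is just an optional check; it assumes the step counter is stored under the standard variable name "global_step":

for _ in range(50):
    estimator.train(input_fn=input_fn, steps=10)  # each run still only gets 2 batches

# 50 runs x 2 steps each = 100 steps in total
print(estimator.get_variable_value("global_step"))  # 100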
Ex 2:
input_fn = tf.estimator.inputs.numpy_input_fn(
    {"x": x_train}, y_train, batch_size=2, num_epochs=2, shuffle=True)
estimator.train(input_fn=input_fn, steps=10)
In this example, the value of batch_size is changed to 2 (it was equal to 4 in Ex 1). Now, in each execution of the train() method, 4 steps are processed. After the 4th step, there are no batches left to run on. If the train() method is executed again, another 4 steps are processed, making it a total of 8 steps.
Here, the value of steps doesn't matter because the train() method can get a maximum of 4 batches. For the case where the value of steps is less than (num_epochs x training_size) / batch_size, see Ex 3.
Ex 3:
input_fn = tf.estimator.inputs.numpy_input_fn(
    {"x": x_train}, y_train, batch_size=2, num_epochs=8, shuffle=True)
estimator.train(input_fn=input_fn, steps=10)
Now, let batch_size = 2, num_epochs = 8 and steps = 10. The input_fn can emit a total of 16 batches in one run of the train() method. However, steps is set to 10. This means that even though the input_fn can provide 16 batches for execution, train() must stop after 10 steps. Of course, the train() method can be re-executed to run more steps cumulatively.
From examples 1, 2, and 3, we can clearly see how the values of steps, num_epochs and batch_size affect the number of steps that can be executed by the train() method in one run.
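In other words, the number of steps executed in one run works out to min(steps, (num_epochs x training_size) / batch_size). Here is a small helper of my own (not part of the TensorFlow API) that reproduces the numbers from the three examples:

def steps_per_run(steps, num_epochs, training_size, batch_size):
    # Total batches the input_fn can emit before it is exhausted
    total_batches = (num_epochs * training_size) // batch_size
    return min(steps, total_batches)

print(steps_per_run(steps=10, num_epochs=2, training_size=4, batch_size=4))  # Ex 1 -> 2
print(steps_per_run(steps=10, num_epochs=2, training_size=4, batch_size=2))  # Ex 2 -> 4
print(steps_per_run(steps=10, num_epochs=8, training_size=4, batch_size=2))  # Ex 3 -> 10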
The max_steps argument of the train() method restricts the total number of steps that can be run cumulatively by train().
Ex 4:
If batch_size = 4 and num_epochs = 2, the input_fn can emit 2 batches for one train() execution. But if max_steps is set to 20, then no matter how many times train() is executed, only 20 steps will run in optimization. This is in contrast to example 1, where the optimizer can run up to 200 steps if the train() method is executed 100 times.
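A sketch of this experiment, assuming the same input_fn as in Ex 1 (note that steps and max_steps cannot be passed together):

input_fn = tf.estimator.inputs.numpy_input_fn(
    {"x": x_train}, y_train, batch_size=4, num_epochs=2, shuffle=True)

for _ in range(100):
    estimator.train(input_fn=input_fn, max_steps=20)

# Only the first 10 runs advance the counter (2 steps each); once the
# global step reaches 20, further train() calls return immediately.
print(estimator.get_variable_value("global_step"))  # 20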
Hope this gives a detailed understanding of what these arguments mean.