I am coming from this tutorial, which uses a multinomial distribution in eager execution to pick the next character during text generation, based on a predictions tensor produced by our RNN:
# using a multinomial distribution to predict the character returned by the model
temperature = 0.5
predictions = predictions / temperature
predicted_id = tf.multinomial(predictions, num_samples=1)[-1,0].numpy()
My questions are:
Isn't temperature (here 0.5) just scaling all predictions uniformly? Why does it influence the multinomial selection, then?
[0.2, 0.4, 0.3, 0.1] / 0.5 = [0.4, 0.8, 0.6, 0.2]
Doesn't the multinomial normalize the probabilities anyway? And if so, doesn't scaling just increase every character's probability by the same factor, capped at 1?
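To make my confusion concrete, here is a small NumPy sketch of what I think the division does, assuming predictions holds probabilities (which may be exactly where I go wrong):

```python
import numpy as np

# toy "predictions" for a 4-character vocabulary
probs = np.array([0.2, 0.4, 0.3, 0.1])

# dividing by the temperature rescales every entry by the same factor
scaled = probs / 0.5  # [0.4, 0.8, 0.6, 0.2]

# if the multinomial normalizes internally, that rescaling cancels out
normalized = scaled / scaled.sum()
print(normalized)  # back to [0.2, 0.4, 0.3, 0.1]
```

If this is right, the temperature would have no effect at all, which clearly contradicts what the tutorial says.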
What does [-1, 0].numpy() do? I am completely lost on this one.
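I tried reproducing the indexing on a plain NumPy array of the shape I believe the sampled tensor has (an assumption on my part), and it seems to pick the last row, first column:

```python
import numpy as np

# suppose the sampling returned a column of ids, shape (3, 1) -- my guess
samples = np.array([[7], [2], [5]])

# [-1, 0] selects the last row, first column: a single scalar
last_id = samples[-1, 0]
print(last_id)  # 5
```

But I don't understand why the tutorial takes the last row here, or what .numpy() adds on top of that.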
Any hints are appreciated.