I am coming from this tutorial, which uses a multinomial distribution in eager execution to get a final prediction for the next character for text generation, based on a predictions tensor coming from our RNN.

# using a multinomial distribution to predict the character returned by the model
temperature = 0.5
predictions = predictions / temperature
predicted_id = tf.multinomial(predictions, num_samples=1)[-1,0].numpy()

My questions are:

  1. Isn't temperature (here 0.5) just scaling all predictions? Why does it influence the multinomial selection then?

    [0.2, 0.4, 0.3, 0.1]/temperature = [0.4, 0.8, 0.6, 0.2]

    So isn't the multinomial normalizing the probabilities anyway? And doesn't the scaling then just increase the probability for each character, with a limit at 1?

  2. What does [-1, 0].numpy() do? I am completely lost with this one.

Any hints are appreciated.


1 Answer

  1. The predictions tensor holds logits, not probabilities: per the tf.multinomial documentation, each slice [i, :] represents the unnormalized log-probabilities for all classes. The sampler exponentiates and normalizes them (a softmax) before drawing.

Because the values are exponentiated before they are normalized, dividing by the temperature is not just a uniform rescaling: the smaller a class's probability is in the first place, the smaller it becomes relative to the others for temperatures smaller than 1, and the larger for temperatures larger than 1. Comparing each original logit with its scaled version (temperature = 0.5) makes this visible:

math.exp(0.4)/math.exp(0.8) = 0.6703
math.exp(0.3)/math.exp(0.6) = 0.7408
math.exp(0.2)/math.exp(0.4) = 0.8187
math.exp(0.1)/math.exp(0.2) = 0.9048
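
To see the combined effect, here is a minimal NumPy sketch (the logits are the example values from the question, not real model output) that applies the softmax at a few temperatures:

import numpy as np

def softmax(logits):
    # shift by the max for numerical stability; the shift cancels out
    exps = np.exp(logits - np.max(logits))
    return exps / exps.sum()

logits = np.array([0.2, 0.4, 0.3, 0.1])  # example logits from the question

for temperature in (0.5, 1.0, 2.0):
    # temperature < 1 sharpens the distribution, temperature > 1 flattens it
    print(temperature, np.round(softmax(logits / temperature), 3))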
  2. [-1, 0].numpy() takes the last row and the first column of the tensor returned by tf.multinomial and converts that scalar entry to a plain integer. For example:

tf.multinomial(predictions, num_samples=1)
# returns tf.Tensor([[3]], shape=(1, 1), dtype=int64)

Indexing with [-1, 0] selects the entry 3, and .numpy() turns it into a plain integer you can use as a character index.
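
For completeness, a runnable sketch of the sampling-and-indexing step, assuming TensorFlow 2.x (where tf.multinomial has been renamed tf.random.categorical) and made-up logit values:

import tensorflow as tf

# logits for a single batch entry, shape (1, 4)
predictions = tf.constant([[0.4, 0.8, 0.6, 0.2]])

# same operation as tf.multinomial in the question, under its TF 2.x name
sampled = tf.random.categorical(predictions, num_samples=1)
print(sampled)  # e.g. tf.Tensor([[1]], shape=(1, 1), dtype=int64)

# last row, first column -> scalar tensor; .numpy() -> plain integer
predicted_id = sampled[-1, 0].numpy()
print(predicted_id)  # e.g. 1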