I am coming from this tutorial, which uses a multinomial distribution in eager execution to pick the next character during text generation, based on a predictions tensor produced by our RNN:
# using a multinomial distribution to predict the character returned by the model
temperature = 0.5
predictions = predictions / temperature
predicted_id = tf.multinomial(predictions, num_samples=1)[-1,0].numpy()
My questions are:
Isn't temperature (here 0.5) just scaling all predictions uniformly? Why does it influence the multinomial selection, then?
[0.2, 0.4, 0.3, 0.1] / 0.5 = [0.4, 0.8, 0.6, 0.2]
Doesn't the multinomial normalize the probabilities anyway? And if so, doesn't scaling just increase every character's probability by the same factor, capped at 1?
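To make my confusion concrete, here is a small NumPy sketch of what I think the division does, assuming predictions holds probabilities (which may be exactly where I go wrong):

```python
import numpy as np

# toy "predictions" for a 4-character vocabulary
probs = np.array([0.2, 0.4, 0.3, 0.1])

# dividing by the temperature rescales every entry by the same factor
scaled = probs / 0.5  # [0.4, 0.8, 0.6, 0.2]

# if the multinomial normalizes internally, that rescaling cancels out
normalized = scaled / scaled.sum()
print(normalized)  # back to [0.2, 0.4, 0.3, 0.1]
```

If this is right, the temperature would have no effect at all, which clearly contradicts what the tutorial says.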
What does [-1, 0].numpy() do? I am completely lost on this one.
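I tried reproducing the indexing on a plain NumPy array of the shape I believe the sampled tensor has (an assumption on my part), and it seems to pick the last row, first column:

```python
import numpy as np

# suppose the sampling returned a column of ids, shape (3, 1) -- my guess
samples = np.array([[7], [2], [5]])

# [-1, 0] selects the last row, first column: a single scalar
last_id = samples[-1, 0]
print(last_id)  # 5
```

But I don't understand why the tutorial takes the last row here, or what .numpy() adds on top of that.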
Any hints are appreciated.