I would like to train a model to generate text, similar to the one described in this blog post.
This model uses, as far as I understand it, the following architecture:
[Sequence of Word Indices] -> [Embedding] -> [LSTM] -> [One-Hot Encoded "next word"]
Basically, the author models the process as a classification problem, where the output layer has as many dimensions as there are words in the corpus.
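If it helps, this is how I picture that setup (a minimal sketch, assuming Keras; the sizes are placeholders I picked, not values from the blog post):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense

vocab_size, embed_dim, seq_len = 10000, 100, 40   # placeholder sizes

model = Sequential([
    Input(shape=(seq_len,)),                  # a sequence of word indices
    Embedding(vocab_size, embed_dim),         # indices -> embedding vectors
    LSTM(128),
    Dense(vocab_size, activation="softmax"),  # one output unit per word in the corpus
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
# X: (num_samples, seq_len) word indices, y: (num_samples,) index of the next word
```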
I would like to model the process as a regression problem instead, by re-using the learned embeddings and then minimising the distance between the predicted and the real embedding.
Basically:
[Sequence of Word Indices] -> [Embedding] -> [LSTM] -> [Embedding-Vector of the "next word"]
My problem is that the model learns the embeddings on the fly. How could I feed the output in the same way I feed the input (as word indices) and then just tell the model, "but before you use that output, replace it by its embedding vector"?
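Here is roughly what I am imagining (again just a sketch with guessed names and sizes, assuming Keras): keep feeding the target as a word index, and look its embedding up inside the loss through the same Embedding layer the input uses, so the lookup always sees the current, still-training embedding matrix.

```python
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense

vocab_size, embed_dim, seq_len = 10000, 100, 40   # placeholder sizes

emb_layer = Embedding(vocab_size, embed_dim)      # shared: encodes the input AND the target

model = Sequential([
    Input(shape=(seq_len,)),
    emb_layer,
    LSTM(128),
    Dense(embed_dim),                             # predicted embedding vector
])

def embedding_mse(y_true, y_pred):
    # y_true holds the index of the real next word; replace it by its current embedding
    idx = tf.reshape(tf.cast(y_true, "int32"), [-1])
    target_vec = emb_layer(idx)                   # (batch, embed_dim)
    return tf.reduce_mean(tf.square(y_pred - target_vec), axis=-1)

model.compile(loss=embedding_mse, optimizer="adam")
# model.fit(X, y) with X: (num_samples, seq_len) indices and y: (num_samples, 1) indices
```

Is something along these lines a reasonable way to do it, or is there a cleaner mechanism for re-using the embedding layer on the target? I am also unsure whether gradients should flow into the embedding through this target lookup, or whether it should be wrapped in tf.stop_gradient.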
Thank you very much for any help :-)