3
votes

I would like to train a model to generate text, similar to this blog post.

This model uses - as far as I understand it - the following architecture:
[Sequence of Word Indices] -> [Embedding] -> [LSTM] -> [1 Hot Encoded "next word"]

Basically, the author models the process as a classification problem, where the output layer has as many dimensions as there are words in the corpus.


I would like to model the process as a regression problem instead, by reusing the learned embeddings and then minimising the distance between the predicted and the real embedding.

Basically:

[Sequence of Word Indices] -> [Embedding] -> [LSTM] -> [Embedding-Vector of the "next word"]

My problem is: since the model learns the embeddings on the fly, how can I feed the output in the same way I feed the input (as word indices) and then just tell the model, "before you use the output, replace it by its embedding vector"?


Thank you very much for any help :-)

1
Did you get this working? If so, any links to code/blog (or even a self-answer)? The answer marked as correct only shows the classification approach, not the regression approach you wanted to do. – Darren Cook

1 Answer

4
votes

In the training phase:

You can use two inputs (one for the input sequence, one for the target word; there is an offset of 1 between the two sequences) and reuse the embedding layer. If your input sentence is [1, 2, 3, 4], you can generate two sequences from it: in = [1, 2, 3], out = [2, 3, 4]. Then you can use Keras' functional API to reuse the embedding layer:

emb1 = Embedding(in)
emb2 = Embedding(out)
predict_emb = LSTM(emb1)
loss = mean_squared_error(emb2, predict_emb)

Note that this is not actual Keras code, just pseudocode.
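
To make it concrete, here is a minimal runnable sketch of that setup with the Keras functional API. The layer sizes, variable names, and the add_loss trick for computing the MSE between the two embedding tensors inside the graph are my own assumptions, not code from the original blog post:

from keras.layers import Input, Embedding, LSTM, Dense, Flatten
from keras.models import Model
import keras.backend as K

vocab_size = 10000   # hypothetical vocabulary size
emb_dim = 128        # hypothetical embedding dimension
seq_len = 3          # hypothetical length of the input sequence

# One shared embedding layer, applied to both the input sequence and the target word
shared_emb = Embedding(input_dim=vocab_size, output_dim=emb_dim)

in_seq = Input(shape=(seq_len,), dtype='int32')    # e.g. [1, 2, 3]
target_word = Input(shape=(1,), dtype='int32')     # e.g. [4]

emb_in = shared_emb(in_seq)                        # (batch, seq_len, emb_dim)
emb_target = Flatten()(shared_emb(target_word))    # (batch, emb_dim)

h = LSTM(256)(emb_in)                              # summarise the input sequence
pred_emb = Dense(emb_dim)(h)                       # project back into embedding space

model = Model(inputs=[in_seq, target_word], outputs=pred_emb)
# The target embedding is produced inside the graph, so attach the MSE via add_loss
model.add_loss(K.mean(K.square(pred_emb - emb_target)))
model.compile(optimizer='adam')

# Training: model.fit([X_in, X_target], epochs=10, batch_size=32), where X_in and
# X_target are integer arrays of shape (n_samples, seq_len) and (n_samples, 1).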

In the testing phase:

Typically, you'll need to write your own decode function. First, you choose a word (or a few words) to start from. Then you feed this word (or short word sequence) to the network to predict the next word's embedding. At this step you can define your own sampling function: for example, you may want to choose the word whose embedding is nearest to the predicted one as the next word, or you may want to sample the next word from a distribution in which words whose embeddings are closer to the predicted embedding have a higher probability of being chosen. Once you have chosen the next word, you feed it back to the network and predict the one after that, and so forth.

So, you need to generate one word (put another way, one embedding) at a time, rather than feeding a whole sequence to the network.
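
As a sketch of that decoding step (my own illustration, assuming the embedding matrix has been pulled out of the trained layer, e.g. with shared_emb.get_weights()[0]), the greedy nearest-neighbour choice and the distance-weighted sampling could look like this:

import numpy as np

def nearest_word(pred_emb, emb_matrix):
    # Greedy choice: the word whose embedding is closest (Euclidean) to the prediction
    dists = np.linalg.norm(emb_matrix - pred_emb, axis=1)
    return int(np.argmin(dists))

def sample_word(pred_emb, emb_matrix, temperature=1.0):
    # Stochastic choice: words with nearer embeddings get a higher probability
    dists = np.linalg.norm(emb_matrix - pred_emb, axis=1)
    logits = -dists / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))

A hypothetical prediction-only model (the same embedding, LSTM, and Dense layers, but with only the sequence input) would then be called in a loop: predict an embedding, map it back to a word with one of these functions, append that word to the input sequence, and repeat.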

If the above statements are too abstract for you, here's a good example: https://github.com/fchollet/keras/blob/master/examples/lstm_text_generation.py

Line 85 is where the seed is chosen: it randomly picks a small piece of text from the corpus to work on. From line 90 onwards there is a loop in which each step samples a character (this is a char-RNN, so each timestep takes a character as input; in your case it should be a word, not a character): L95 predicts the next character's distribution, and L96 samples from that distribution. Hope this is clear enough.
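
For reference, the sampling step in that example is roughly the following temperature-based draw from the predicted softmax distribution (a paraphrase, not a verbatim copy of the linked file); in the regression setup above you would replace it with one of the embedding-distance functions sketched earlier:

import numpy as np

def sample(preds, temperature=1.0):
    # Re-weight the predicted probabilities by a temperature and draw one index
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    return int(np.argmax(np.random.multinomial(1, preds, 1)))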