2
votes

I would like to train two different LSTMs and have them interact in a dialogue context (i.e., one RNN generates a sequence, which is used as context for the second RNN, which answers, and so on). However, I do not know how to train them separately in TensorFlow (I think I have not fully understood the logic behind TF graphs). When I execute my code, I get the following error:

Variable rnn/basic_lstm_cell/weights already exists, disallowed. Did you mean to set reuse=True in VarScope?

The error happens when I create my second RNN. Do you know how to fix this?

My code is the following:

#User LSTM
no_units=100
_seq_user = tf.placeholder(tf.float32, [batch_size, max_length_user, user_inputShapeLen], name='seq')
_seq_length_user = tf.placeholder(tf.int32, [batch_size], name='seq_length')

cell = tf.contrib.rnn.BasicLSTMCell(
        no_units)

output_user, hidden_states_user = tf.nn.dynamic_rnn(
    cell,
    _seq_user,
    dtype=tf.float32,
    sequence_length=_seq_length_user
)
out2_user = tf.reshape(output_user, shape=[-1, no_units])
out2_user =  tf.layers.dense(out2_user, user_outputShapeLen)

out_final_user = tf.reshape(out2_user, shape=[-1, max_length_user, user_outputShapeLen])
y_user_ = tf.placeholder(tf.float32, [None, max_length_user, user_outputShapeLen])


softmax_user = tf.nn.softmax(out_final_user, dim=-1)  
loss_user = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=out_final_user, labels=y_user_))
optimizer = tf.train.AdamOptimizer(learning_rate=10**-4)
minimize = optimizer.minimize(loss_user)

init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

for i in range(epoch):
    print 'Epoch: ', i
    batch_X, batch_Y, batch_sizes = lstm.batching(user_train_X, user_train_Y, sizes_user_train)
    for data_, target_, size_ in zip(batch_X, batch_Y, batch_sizes):
        sess.run(minimize, {_seq_user:data_, _seq_length_user:size_, y_user_:target_})

#System LSTM
no_units_system=100
_seq_system = tf.placeholder(tf.float32, [batch_size, max_length_system, system_inputShapeLen], name='seq_')
_seq_length_system = tf.placeholder(tf.int32, [batch_size], name='seq_length_')

cell_system = tf.contrib.rnn.BasicLSTMCell(
        no_units_system)

output_system, hidden_states_system = tf.nn.dynamic_rnn(
    cell_system,
    _seq_system,
    dtype=tf.float32,
    sequence_length=_seq_length_system
)
out2_system = tf.reshape(output_system, shape=[-1, no_units_system])
out2_system =  tf.layers.dense(out2_system, system_outputShapeLen)

out_final_system = tf.reshape(out2_system, shape=[-1, max_length_system, system_outputShapeLen])
y_system_ = tf.placeholder(tf.float32, [None, max_length_system, system_outputShapeLen])

softmax_system = tf.nn.softmax(out_final_system, dim=-1)  
loss_system = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=out_final_system, labels=y_system_))
optimizer = tf.train.AdamOptimizer(learning_rate=10**-4)
minimize = optimizer.minimize(loss_system)

for i in range(epoch):
    print 'Epoch: ', i
    batch_X, batch_Y, batch_sizes = lstm.batching(system_train_X, system_train_Y, sizes_system_train)
    for data_, target_, size_ in zip(batch_X, batch_Y, batch_sizes):
        sess.run(minimize, {_seq_system:data_, _seq_length_system:size_, y_system_:target_})

3 Answers

0
votes

Regarding the variable scope error, try building each network inside its own variable scope:

with tf.variable_scope('User_LSTM'):
    pass  # replace with your user_lstm graph

with tf.variable_scope('System_LSTM'):
    pass  # replace with your system_lstm graph
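To see why the scopes help, here is a minimal sketch rather than the asker's actual model. It assumes a TensorFlow build that exposes `tf.compat.v1` (1.13+ or 2.x), and `tiny_layer` is a hypothetical stand-in for an LSTM cell: like a cell, it always creates a variable with the same fixed name. Built under two scopes, the variables get distinct full names; built a second time under an existing scope, it raises the same kind of "already exists" error as in the question.

```python
import tensorflow as tf

tf1 = tf.compat.v1  # TF1-style graph API


def tiny_layer(x, units):
    # Hypothetical stand-in for an RNN cell: it always asks for a variable
    # called "weights", just as every LSTM cell uses the same variable names.
    w = tf1.get_variable("weights", shape=[int(x.shape[-1]), units])
    return tf.matmul(x, w)


graph = tf.Graph()
with graph.as_default():
    x = tf1.placeholder(tf.float32, [None, 4])

    # Each network lives in its own scope, so the full names differ.
    with tf1.variable_scope("User_LSTM"):
        out_user = tiny_layer(x, 3)
    with tf1.variable_scope("System_LSTM"):
        out_system = tiny_layer(x, 3)

    names = sorted(v.name for v in tf1.global_variables())

    # Building the layer again under an existing scope reproduces the error.
    try:
        with tf1.variable_scope("User_LSTM"):
            tiny_layer(x, 3)
        collided = False
    except ValueError:
        collided = True

print(names)     # ['System_LSTM/weights:0', 'User_LSTM/weights:0']
print(collided)  # True
```

In the question's code the same thing happens one level down: both `tf.nn.dynamic_rnn` calls run in the default `rnn` scope, so the second cell asks for a name that the first one already registered.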

Also, avoid reusing the same name for different Python objects (e.g. optimizer and minimize). The second assignment overrides the first, which will confuse you when you inspect the graph in TensorBoard.

By the way, I would recommend training the model end to end rather than running two separate training loops. Try feeding the output tensor of the first LSTM into the second LSTM, with a single optimizer and a single loss function.

0
votes

In short, to solve the problem (Variable rnn/basic_lstm_cell/weights already exists), you need two separate variable scopes (as mentioned by @J-min). TensorFlow organizes variables by name, so managing the two sets of variables in two different scopes lets TensorFlow distinguish them from each other.

By "train them separately", I suppose you want to define two distinct loss functions and optimize the two LSTM networks with two optimizers, one per loss function.

In that case, you need to get the list of variables for each of the two scopes and pass each list to the corresponding optimizer, like this:

opt1    = tf.train.GradientDescentOptimizer(learning_rate=0.1)
opt_op1 = opt1.minimize(loss1, var_list=<list of variables from scope 1>)

opt2    = tf.train.GradientDescentOptimizer(learning_rate=0.1)
opt_op2 = opt2.minimize(loss2, var_list=<list of variables from scope 2>)
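
One way to obtain those variable lists is `tf.get_collection` with a scope filter. The following is a minimal sketch, assuming a TensorFlow build that exposes `tf.compat.v1`. The scope names and the single-weight "networks" are hypothetical stand-ins for the two LSTMs, chosen only to keep the example small. It runs one step of the first optimizer and checks that only the first scope's variable moved.

```python
import tensorflow as tf

tf1 = tf.compat.v1  # TF1-style graph API

graph = tf.Graph()
with graph.as_default():
    x = tf1.placeholder(tf.float32, [None, 2])
    y = tf1.placeholder(tf.float32, [None, 1])

    # Hypothetical stand-ins for the two networks: one weight matrix per scope.
    with tf1.variable_scope("User_LSTM"):
        w_user = tf1.get_variable(
            "w", shape=[2, 1], initializer=tf1.zeros_initializer())
        loss1 = tf.reduce_mean(tf.square(tf.matmul(x, w_user) - y))
    with tf1.variable_scope("System_LSTM"):
        w_system = tf1.get_variable(
            "w", shape=[2, 1], initializer=tf1.zeros_initializer())
        loss2 = tf.reduce_mean(tf.square(tf.matmul(x, w_system) - y))

    # Collect the trainable variables whose names start with each scope.
    user_vars = tf1.get_collection(
        tf1.GraphKeys.TRAINABLE_VARIABLES, scope="User_LSTM")
    system_vars = tf1.get_collection(
        tf1.GraphKeys.TRAINABLE_VARIABLES, scope="System_LSTM")

    opt1 = tf1.train.GradientDescentOptimizer(learning_rate=0.1)
    opt_op1 = opt1.minimize(loss1, var_list=user_vars)
    opt2 = tf1.train.GradientDescentOptimizer(learning_rate=0.1)
    opt_op2 = opt2.minimize(loss2, var_list=system_vars)

    with tf1.Session(graph=graph) as sess:
        sess.run(tf1.global_variables_initializer())
        feed = {x: [[1.0, 2.0]], y: [[1.0]]}
        sess.run(opt_op1, feed)  # one training step on the user network only
        user_changed = bool(abs(sess.run(w_user)).sum() > 0)
        system_changed = bool(abs(sess.run(w_system)).sum() > 0)

print(user_changed, system_changed)  # True False
```

Here each loss depends only on its own scope's variables, so `var_list` is not strictly required. It becomes essential once one network's output feeds the other, because `minimize` would otherwise update every trainable variable the loss depends on.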
0
votes

Just pass a different name argument to each cell when you create it. For example:

user_cell = tf.contrib.rnn.BasicLSTMCell(no_units, name='user')
system_cell = tf.contrib.rnn.BasicLSTMCell(no_units, name='system')

This way, the two cells' variables get distinct names, so TensorFlow no longer sees a clash between them. Then you can get the outputs as:

output_user, hidden_states_user = tf.nn.dynamic_rnn(
    user_cell,
    _seq_user,
    dtype=tf.float32,
    sequence_length=_seq_length_user
)

output_system, hidden_states_system = tf.nn.dynamic_rnn(
    system_cell,
    _seq_system,
    dtype=tf.float32,
    sequence_length=_seq_length_system
)