4 votes

I'm curious if there is a good way to share weights across different RNN cells while still feeding each cell different inputs.

The graph that I am trying to build is like this:

[diagram of the intended graph]

There are three LSTM cells (in orange) that operate in parallel and between which I would like to share the weights.

I've managed to implement something similar to what I want using a placeholder (see the code below). However, using a placeholder breaks the optimizer's gradient calculation and doesn't train anything past the point where the placeholder is used. Is there a better way to do this in TensorFlow?

I'm using TensorFlow 1.2 and Python 3.5 in an Anaconda environment on Windows 7.

Code:

import tensorflow as tf

def ann_model(cls, data, act=tf.nn.relu):
    with tf.name_scope('ANN'):
        with tf.name_scope('ann_weights'):
            ann_weights = tf.Variable(tf.random_normal([1,
                                                        cls.n_ann_nodes]))
        with tf.name_scope('ann_bias'):
            ann_biases = tf.Variable(tf.random_normal([1]))
        out = act(tf.matmul(data, ann_weights) + ann_biases)
    return out

def rnn_lower_model(cls,data):
    with tf.name_scope('RNN_Model'):
        data_tens = tf.split(data, cls.sequence_length,1)
        for i in range(len(data_tens)):
            data_tens[i] = tf.reshape(data_tens[i],[cls.batch_size,
                                                     cls.n_rnn_inputs])

        rnn_cell = tf.nn.rnn_cell.BasicLSTMCell(cls.n_rnn_nodes_lower)

        outputs, states = tf.contrib.rnn.static_rnn(rnn_cell,
                                                    data_tens,
                                                    dtype=tf.float32)

        with tf.name_scope('RNN_out_weights'):
            out_weights = tf.Variable(
                    tf.random_normal([cls.n_rnn_nodes_lower,1]))
        with tf.name_scope('RNN_out_biases'):
            out_biases = tf.Variable(tf.random_normal([1]))

        #Encode the output of the RNN into one estimate per entry in 
        #the input sequence
        predict_list = []
        for i in range(cls.sequence_length):
            predict_list.append(tf.matmul(outputs[i], out_weights)
                                + out_biases)
    return predict_list

def create_graph(cls,sess):
    #Initializes the graph
    with tf.name_scope('input'):
        cls.x = tf.placeholder('float',[cls.batch_size,
                                       cls.sequence_length,
                                       cls.n_inputs])
    with tf.name_scope('labels'):
        cls.y = tf.placeholder('float',[cls.batch_size,1])
    with tf.name_scope('community_id'):
        cls.c = tf.placeholder('float',[cls.batch_size,1])

    #Define Placeholder to provide variable input into the 
    #RNNs with shared weights    
    cls.input_place = tf.placeholder('float',[cls.batch_size,
                                              cls.sequence_length,
                                              cls.n_rnn_inputs])

    #global step used in optimizer
    global_step = tf.Variable(0,trainable = False)

    #Create ANN
    ann_output = cls.ann_model(cls.c)
    #Combine output of ANN with other input data x
    ann_out_seq = tf.reshape(tf.concat([ann_output for _ in 
                                            range(cls.sequence_length)],1),
                            [cls.batch_size,
                             cls.sequence_length,
                             cls.n_ann_nodes])
    cls.rnn_input = tf.concat([ann_out_seq,cls.x],2)

    #Create 'unrolled' RNN by creating sequence_length many RNN Cells that
    #share the same weights.
    with tf.variable_scope('Lower_RNNs'):
        #Create RNNs
        daily_prediction, daily_prediction1 = [cls.rnn_lower_model(cls.input_place)] * 2

During training, mini-batches are calculated in two steps:

RNNinput = sess.run(cls.rnn_input,feed_dict = {
                                            cls.x:batch_x,
                                            cls.y:batch_y,
                                            cls.c:batch_c})
_ = sess.run(cls.optimizer, feed_dict={cls.input_place:RNNinput,
                                       cls.y:batch_y,
                                       cls.x:batch_x,
                                       cls.c:batch_c})

Thanks for your help. Any ideas would be appreciated.

Why do you have two feed_dicts? – vijay m
The second one is the same as the first, but it also includes 'RNNinput', which is the result of the first 'sess.run'. That's how I pass the output of the lower layer with shared RNN cells to the upper layer: I use the placeholder 'cls.input_place' for this in the second 'sess.run' call. Unfortunately, this breaks TensorFlow's backpropagation calculations. – AlexR
You should not do that. Build a graph like the one you mentioned in the link, feed the inputs once, and let the whole network train. Is there any reason why you were not able to do that? – vijay m
Because each RNN cell in the middle layer that shares weights requires a different input in order to create 3 different outputs, which are concatenated together and then fed into the final layer. In order to share the weights I had to use [cls.rnn_lower_model(cls.input_place)]*2. If input_place were simply a node in the graph, I could not vary the inputs for different instances of the same shared cell. – AlexR

2 Answers

2 votes

You have 3 different inputs, input_1, input_2, and input_3, which you feed to an LSTM model whose parameters are shared. You then concatenate the outputs of the 3 LSTMs and pass the result to a final LSTM layer. The code should look something like this:

# Create input placeholders for the network
input_1 = tf.placeholder(...)
input_2 = tf.placeholder(...)
input_3 = tf.placeholder(...)

# create a shared rnn layer
def shared_rnn(...):
    ...
    rnn_cell = tf.nn.rnn_cell.BasicLSTMCell(...)
    ...

# generate the outputs for each input
with tf.variable_scope('lower_lstm') as scope:
    out_input_1 = shared_rnn(...)
    scope.reuse_variables()  # the variables will be reused
    out_input_2 = shared_rnn(...)
    scope.reuse_variables()
    out_input_3 = shared_rnn(...)

# verify that the variables are reused
for v in tf.global_variables():
    print(v.name)

# concat the three outputs
output = tf.concat(...)

# pass it to the final LSTM layer and get the logits
logits = final_layer(output, ...)

train_op = ...

# train
sess.run(train_op, feed_dict={input_1: in1, input_2: in2, input_3: in3, labels: ...})
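
For completeness, here is a minimal, self-contained sketch of the same reuse pattern with the ellipses filled in. The sizes, placeholder names, and the choice of returning only the last output of each run are illustrative assumptions, not taken from the question:

import tensorflow as tf

# Illustrative sizes only -- not taken from the question.
batch_size, seq_len, n_in, n_hidden = 32, 3, 4, 8

# Three separate inputs that should go through the same LSTM weights.
input_1 = tf.placeholder(tf.float32, [batch_size, seq_len, n_in])
input_2 = tf.placeholder(tf.float32, [batch_size, seq_len, n_in])
input_3 = tf.placeholder(tf.float32, [batch_size, seq_len, n_in])

def shared_rnn(cell, inputs):
    # Unroll the cell over the sequence and keep only the final output.
    seq = tf.unstack(inputs, axis=1)                  # list of [batch, n_in]
    outputs, _ = tf.contrib.rnn.static_rnn(cell, seq, dtype=tf.float32)
    return outputs[-1]                                # [batch, n_hidden]

with tf.variable_scope('lower_lstm') as scope:
    cell = tf.nn.rnn_cell.BasicLSTMCell(n_hidden)
    out_1 = shared_rnn(cell, input_1)
    scope.reuse_variables()   # later calls look up the variables created above
    out_2 = shared_rnn(cell, input_2)
    out_3 = shared_rnn(cell, input_3)

# Only one set of LSTM variables should be printed, confirming the sharing.
for v in tf.trainable_variables():
    print(v.name)

# The three outputs can now be concatenated and fed to the final layer.
combined = tf.concat([out_1, out_2, out_3], axis=1)   # [batch, 3 * n_hidden]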
0 votes

I ended up rethinking my architecture a little and came up with a more workable solution.

Instead of duplicating the middle layer of LSTM cells to create three different cells with the same weights, I chose to run the same cell three times. The result of each run was stored in a 'buffer'-like tf.Variable, and that whole variable was then used as the input to the final LSTM layer. I drew a diagram here.

Implementing it this way produced valid outputs after 3 time steps and did not break TensorFlow's backpropagation algorithm (i.e., the nodes in the ANN could still train).

The only tricky part was making sure that the buffer was in the correct sequential order for the final RNN.
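
Based on this description, here is one way the "run the same cell three times" idea could be sketched in TF 1.x graph mode. Note that this is only an interpretation: it collects the three run outputs in a Python list (in run order) instead of an explicit tf.Variable buffer, and all names and sizes are made up for illustration:

import tensorflow as tf

# Made-up sizes for illustration.
batch_size, seq_len, n_in, n_lower, n_upper = 32, 5, 4, 8, 6

# One input sequence per run of the shared lower-level cell.
run_inputs = [tf.placeholder(tf.float32, [batch_size, seq_len, n_in])
              for _ in range(3)]

lower_cell = tf.nn.rnn_cell.BasicLSTMCell(n_lower)
run_outputs = []
with tf.variable_scope('lower') as scope:
    for i, inp in enumerate(run_inputs):
        if i > 0:
            scope.reuse_variables()          # every run uses the same weights
        seq = tf.unstack(inp, axis=1)
        outs, _ = tf.contrib.rnn.static_rnn(lower_cell, seq, dtype=tf.float32)
        run_outputs.append(outs[-1])         # last output of this run

# run_outputs is already in run order, so it can be fed to the upper LSTM as a
# length-3 sequence; gradients flow back through all three runs of the cell.
with tf.variable_scope('upper'):
    upper_cell = tf.nn.rnn_cell.BasicLSTMCell(n_upper)
    upper_outputs, _ = tf.contrib.rnn.static_rnn(upper_cell, run_outputs,
                                                 dtype=tf.float32)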