4 votes

I'm curious if there is a good way to share weights across different RNN cells while still feeding each cell different inputs.

The graph that I am trying to build is like this:

[diagram of the intended graph]

There are three LSTM cells (in orange) that operate in parallel and between which I would like to share the weights.

I've managed to implement something similar to what I want using a placeholder (see the code below). However, using a placeholder breaks the optimizer's gradient calculation and doesn't train anything past the point where the placeholder is used. Is there a better way to do this in TensorFlow?

I'm using TensorFlow 1.2 and Python 3.5 in an Anaconda environment on Windows 7.

Code:

import tensorflow as tf

def ann_model(cls, data, act=tf.nn.relu):
    with tf.name_scope('ANN'):
        with tf.name_scope('ann_weights'):
            ann_weights = tf.Variable(tf.random_normal([1,
                                                        cls.n_ann_nodes]))
        with tf.name_scope('ann_bias'):
            ann_biases = tf.Variable(tf.random_normal([1]))
        out = act(tf.matmul(data, ann_weights) + ann_biases)
    return out

def rnn_lower_model(cls,data):
    with tf.name_scope('RNN_Model'):
        data_tens = tf.split(data, cls.sequence_length,1)
        for i in range(len(data_tens)):
            data_tens[i] = tf.reshape(data_tens[i],[cls.batch_size,
                                                     cls.n_rnn_inputs])

        rnn_cell = tf.nn.rnn_cell.BasicLSTMCell(cls.n_rnn_nodes_lower)

        outputs, states = tf.contrib.rnn.static_rnn(rnn_cell,
                                                    data_tens,
                                                    dtype=tf.float32)

        with tf.name_scope('RNN_out_weights'):
            out_weights = tf.Variable(
                    tf.random_normal([cls.n_rnn_nodes_lower,1]))
        with tf.name_scope('RNN_out_biases'):
            out_biases = tf.Variable(tf.random_normal([1]))

        #Encode the output of the RNN into one estimate per entry in 
        #the input sequence
        predict_list = []
        for i in range(cls.sequence_length):
            predict_list.append(tf.matmul(outputs[i], out_weights)
                                + out_biases)
    return predict_list

def create_graph(cls,sess):
    #Initializes the graph
    with tf.name_scope('input'):
        cls.x = tf.placeholder('float',[cls.batch_size,
                                       cls.sequence_length,
                                       cls.n_inputs])
    with tf.name_scope('labels'):
        cls.y = tf.placeholder('float',[cls.batch_size,1])
    with tf.name_scope('community_id'):
        cls.c = tf.placeholder('float',[cls.batch_size,1])

    #Define Placeholder to provide variable input into the 
    #RNNs with shared weights    
    cls.input_place = tf.placeholder('float',[cls.batch_size,
                                              cls.sequence_length,
                                              cls.n_rnn_inputs])

    #global step used in optimizer
    global_step = tf.Variable(0,trainable = False)

    #Create ANN
    ann_output = cls.ann_model(cls.c)
    #Combine output of ANN with other input data x
    ann_out_seq = tf.reshape(tf.concat([ann_output for _ in 
                                            range(cls.sequence_length)],1),
                            [cls.batch_size,
                             cls.sequence_length,
                             cls.n_ann_nodes])
    cls.rnn_input = tf.concat([ann_out_seq,cls.x],2)

    #Create 'unrolled' RNN by creating sequence_length many RNN Cells that
    #share the same weights.
    with tf.variable_scope('Lower_RNNs'):
        #Create RNNs
        daily_prediction, daily_prediction1 = [cls.rnn_lower_model(cls.input_place)] * 2

During training, mini-batches are calculated in two steps:

RNNinput = sess.run(cls.rnn_input,feed_dict = {
                                            cls.x:batch_x,
                                            cls.y:batch_y,
                                            cls.c:batch_c})
_ = sess.run(cls.optimizer, feed_dict={cls.input_place:RNNinput,
                                       cls.y:batch_y,
                                       cls.x:batch_x,
                                       cls.c:batch_c})

Thanks for your help. Any ideas would be appreciated.

Why do you have two feed_dicts? – vijay m
The second one is the same as the first, but it also includes 'RNNinput', which is the result of the first 'sess.run'. That's how I pass the output of the lower layer with shared RNN cells to the upper layer: I use the placeholder 'cls.input_place' for this in the second 'sess.run' call. Unfortunately, this breaks TensorFlow's backpropagation calculations. – AlexR
You should not do that. Build a graph like the one you mentioned in the link, feed the inputs once, and let the whole network train. Is there any reason why you were not able to do that? – vijay m
Because each RNN cell in the middle layer that shares weights requires a different input in order to create 3 different outputs, which are concatenated together and then fed into the final layer. In order to share the weights I had to use [cls.rnn_lower_model(cls.input_place)]*2. If input_place were simply a node in the graph, I could not vary the inputs for different instances of the same shared cell. – AlexR

2 Answers

2 votes

You have 3 different inputs, input_1, input_2, and input_3, which you feed to an LSTM model whose parameters are shared. You then concatenate the outputs of the 3 LSTMs and pass the result to a final LSTM layer. The code should look something like this:

# Create input placeholders for the network
input_1 = tf.placeholder(...)
input_2 = tf.placeholder(...)
input_3 = tf.placeholder(...)

# create a shared rnn layer
def shared_rnn(...):
    ...
    rnn_cell = tf.nn.rnn_cell.BasicLSTMCell(...)
    ...

# generate the outputs for each input
with tf.variable_scope('lower_lstm') as scope:
    out_input_1 = shared_rnn(...)
    scope.reuse_variables()  # the variables will be reused
    out_input_2 = shared_rnn(...)
    scope.reuse_variables()
    out_input_3 = shared_rnn(...)

# verify that the variables are reused
for v in tf.global_variables():
    print(v.name)

# concat the three outputs
output = tf.concat(...)

# pass it to the final LSTM layer and get the logits
logits = final_layer(output, ...)

train_op = ...

# train
sess.run(train_op, feed_dict={input_1: in1, input_2: in2, input_3: in3, labels: ...})
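
For completeness, here is a minimal, self-contained sketch of the same reuse pattern with the ellipses filled in. The sizes, placeholder names, and the choice of returning only the last output of each run are illustrative assumptions, not taken from the question:

import tensorflow as tf

# Illustrative sizes only -- not taken from the question.
batch_size, seq_len, n_in, n_hidden = 32, 3, 4, 8

# Three separate inputs that should go through the same LSTM weights.
input_1 = tf.placeholder(tf.float32, [batch_size, seq_len, n_in])
input_2 = tf.placeholder(tf.float32, [batch_size, seq_len, n_in])
input_3 = tf.placeholder(tf.float32, [batch_size, seq_len, n_in])

def shared_rnn(cell, inputs):
    # Unroll the cell over the sequence and keep only the final output.
    seq = tf.unstack(inputs, axis=1)                  # list of [batch, n_in]
    outputs, _ = tf.contrib.rnn.static_rnn(cell, seq, dtype=tf.float32)
    return outputs[-1]                                # [batch, n_hidden]

with tf.variable_scope('lower_lstm') as scope:
    cell = tf.nn.rnn_cell.BasicLSTMCell(n_hidden)
    out_1 = shared_rnn(cell, input_1)
    scope.reuse_variables()   # later calls look up the variables created above
    out_2 = shared_rnn(cell, input_2)
    out_3 = shared_rnn(cell, input_3)

# Only one set of LSTM variables should be printed, confirming the sharing.
for v in tf.trainable_variables():
    print(v.name)

# The three outputs can now be concatenated and fed to the final layer.
combined = tf.concat([out_1, out_2, out_3], axis=1)   # [batch, 3 * n_hidden]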
0 votes

I ended up rethinking my architecture a little and came up with a more workable solution.

Instead of duplicating the middle layer of LSTM cells to create three different cells with the same weights, I chose to run the same cell three times. The result of each run was stored in a 'buffer'-like tf.Variable, and that whole variable was then used as the input to the final LSTM layer. I drew a diagram here.

Implementing it this way produced valid outputs after 3 time steps and did not break TensorFlow's backpropagation algorithm (i.e., the nodes in the ANN could still train).

The only tricky part was making sure that the buffer was in the correct sequential order for the final RNN.
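
Based on this description, here is one way the "run the same cell three times" idea could be sketched in TF 1.x graph mode. Note that this is only an interpretation: it collects the three run outputs in a Python list (in run order) instead of an explicit tf.Variable buffer, and all names and sizes are made up for illustration:

import tensorflow as tf

# Made-up sizes for illustration.
batch_size, seq_len, n_in, n_lower, n_upper = 32, 5, 4, 8, 6

# One input sequence per run of the shared lower-level cell.
run_inputs = [tf.placeholder(tf.float32, [batch_size, seq_len, n_in])
              for _ in range(3)]

lower_cell = tf.nn.rnn_cell.BasicLSTMCell(n_lower)
run_outputs = []
with tf.variable_scope('lower') as scope:
    for i, inp in enumerate(run_inputs):
        if i > 0:
            scope.reuse_variables()          # every run uses the same weights
        seq = tf.unstack(inp, axis=1)
        outs, _ = tf.contrib.rnn.static_rnn(lower_cell, seq, dtype=tf.float32)
        run_outputs.append(outs[-1])         # last output of this run

# run_outputs is already in run order, so it can be fed to the upper LSTM as a
# length-3 sequence; gradients flow back through all three runs of the cell.
with tf.variable_scope('upper'):
    upper_cell = tf.nn.rnn_cell.BasicLSTMCell(n_upper)
    upper_outputs, _ = tf.contrib.rnn.static_rnn(upper_cell, run_outputs,
                                                 dtype=tf.float32)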