8 votes

In TensorFlow, layers.dense(inputs, units, activation) implements a single fully-connected layer (one layer of a multi-layer perceptron) with an arbitrary activation function.

Output = activation(matmul(input, weights) + bias)
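
Spelled out in raw ops, I take this to mean something like the following minimal sketch (input_size=4 and units=128 are assumed here purely for illustration):

import tensorflow as tf

# Sketch of the formula above; sizes are illustrative, not from layers.dense itself.
x = tf.placeholder(tf.float32, shape=[None, 4])          # [batch_size, input_size]
w = tf.get_variable('w', shape=[4, 128])                 # [input_size, units]
b = tf.get_variable('b', shape=[128],
                    initializer=tf.zeros_initializer())  # [units]
y = tf.nn.relu(tf.matmul(x, w) + b)                      # activation(matmul(input, weights) + bias)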

Typically the input has shape=[batch_size, input_size], and a call might look like this (units = 128 and activation = tf.nn.relu are chosen arbitrarily):

inputx = tf.placeholder(tf.float32, shape=[batch_size, input_size])
dense_layer = tf.layers.dense(inputx, 128, tf.nn.relu)

I have not found any documentation on what happens if I feed in higher-dimensional input, e.g. a tensor of shape=[time_step, batch_size, input_size], as one might have with time steps. What one would want here is that the layer is applied to each single input vector, for every time step and every element of the batch. Put differently, the internal matmul of layers.dense() should simply broadcast, numpy-style. Is that what actually happens? I.e., is:

inputx = tf.placeholder(tf.float32, shape=[time_step, batch_size, input_size])
dense_layer = tf.layers.dense(inputx, 128, tf.nn.relu)

applying the dense layer to each input of size input_size, for every time_step and every element in batch_size? This should then result in a tensor (dense_layer above) of shape=[time_step, batch_size, 128]. I'm asking because tf.matmul, for example, does not support numpy-style broadcasting, so I'm not sure how TensorFlow handles these cases.
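
To illustrate what I mean by numpy-style broadcasting, here is a minimal numpy sketch (all sizes chosen arbitrarily):

import numpy as np

# numpy contracts the last axis of x with the first axis of w and carries
# every leading axis through unchanged -- the behavior I am hoping for.
x = np.random.rand(5, 7, 4)    # [time_step, batch_size, input_size]
w = np.random.rand(4, 128)     # [input_size, units]
out = x @ w
print(out.shape)               # (5, 7, 128)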

Edit: This post is related, but does not fully answer my question.

1
Actually, the answer to the question you linked applies to yours as well: layers.dense uses tensordot in such a way that each time step is processed independently. – xdurch0
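
A minimal sketch of the tensordot formulation that comment describes (the sizes [5, 7, 4] and 128 units are assumed for illustration; the variable names are illustrative stand-ins, not the internals of layers.dense):

import tensorflow as tf

# Contract the last axis of the input with the first axis of the kernel;
# the leading axes (time_step, batch_size) pass through unchanged.
inputx = tf.placeholder(tf.float32, shape=[5, 7, 4])
kernel = tf.get_variable('kernel', shape=[4, 128])
bias = tf.get_variable('bias', shape=[128], initializer=tf.zeros_initializer())
out = tf.nn.relu(tf.tensordot(inputx, kernel, axes=[[2], [0]]) + bias)
print(out.shape)  # (5, 7, 128)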

1 Answer

1 vote

You can verify your expectation by checking the shape of the dense kernel as follows.

>>> inputx = tf.placeholder(tf.float32, shape=[2,3,4])
>>> dense_layer = tf.layers.dense(inputx, 128, tf.nn.relu)
>>> g=tf.get_default_graph()
>>> g.get_collection('variables')
[<tf.Variable 'dense/kernel:0' shape=(4, 128) dtype=float32_ref>, <tf.Variable 'dense/bias:0' shape=(128,) dtype=float32_ref>]
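
The kernel is 4x128, i.e. it only maps the last axis. The output shape, checked the same way (continuing the same snippet), matches the expectation:

>>> dense_layer.shape
TensorShape([Dimension(2), Dimension(3), Dimension(128)])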

In this respect, the dense layer behaves the same way as a conv layer.

You can think of inputx as an image with width=2, height=3 and 4 channels, and of the dense layer as a conv layer with 128 filters of size 1x1, as sketched below.
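
A minimal sketch of that analogy (a batch dimension is added only because conv2d expects NHWC input; the names are illustrative):

import tensorflow as tf

# Treat the [2, 3, 4] input as a single 2x3 image with 4 channels and apply
# 128 filters of size 1x1. The conv kernel then has shape (1, 1, 4, 128),
# i.e. the same 4 -> 128 mapping at every position as the dense layer.
inputx = tf.placeholder(tf.float32, shape=[2, 3, 4])
as_image = tf.expand_dims(inputx, 0)  # [1, 2, 3, 4], NHWC
conv = tf.layers.conv2d(as_image, filters=128, kernel_size=1)
print(conv.shape)  # (1, 2, 3, 128)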