In TensorFlow, tf.layers.dense(inputs, units, activation) implements a single fully-connected layer of a multi-layer perceptron with an arbitrary activation function:
Output = activation(matmul(input, weights) + bias)
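For a 2-D input this amounts to a plain matmul plus a bias add. A minimal sketch of that computation (weights and bias here are illustrative stand-ins for the kernel and bias variables the layer creates internally):

import tensorflow as tf

# Sketch of the dense computation for a 2-D input.
# weights: [input_size, units], bias: [units] (normally created by the layer).
def manual_dense(inputs, weights, bias, activation=tf.nn.relu):
    return activation(tf.matmul(inputs, weights) + bias)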
Typically the input has shape=[batch_size, input_size], and a call might look like this (units=128 and activation=tf.nn.relu are chosen arbitrarily):
inputx = tf.placeholder(tf.float32, shape=[batch_size, input_size])
dense_layer = tf.layers.dense(inputx, 128, tf.nn.relu)
I have not found any documentation on what happens if I feed higher-dimensional input, e.g. when there are time steps, giving a tensor of shape=[time_step, batch_size, input_size]. What one would want here is that the layer is applied to each individual input vector, for each time step and each element of the batch. Put differently, the internal matmul of layers.dense() should simply broadcast, NumPy style. Is the behaviour I expect here what actually happens? That is, is:
inputx = tf.placeholder(tf.float32, shape=[time_step, batch_size, input_size])
dense_layer = tf.layers.dense(inputx, 128, tf.nn.relu)
applying the dense layer to each input of size input_size, for each time_step and each element of batch_size? This should then result in a tensor (dense_layer above) of shape=[time_step, batch_size, 128]. I'm asking because, e.g., tf.matmul does not support NumPy-style broadcasting, so I'm not sure how TensorFlow handles these cases.
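For what it's worth, the static output shape can be inspected directly; a quick sketch (the concrete sizes are arbitrary, chosen only for the test):

import tensorflow as tf

# Arbitrary sizes, just to inspect the inferred output shape.
time_step, batch_size, input_size = 5, 32, 64

inputx = tf.placeholder(tf.float32, shape=[time_step, batch_size, input_size])
dense_layer = tf.layers.dense(inputx, 128, tf.nn.relu)

# If the layer treats the leading dimensions as batch dimensions,
# this prints (5, 32, 128).
print(dense_layer.shape)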
Edit: This post is related, but does not fully answer my question.
layers.dense uses tensordot in such a way that you are essentially processing each time step independently. – xdurch0
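A sketch of what that tensordot-based computation looks like (kernel and bias here are illustrative stand-ins for the layer's internal variables):

import tensorflow as tf

time_step, batch_size, input_size, units = 5, 32, 64, 128

inputx = tf.placeholder(tf.float32, shape=[time_step, batch_size, input_size])
kernel = tf.get_variable("kernel", shape=[input_size, units])
bias = tf.get_variable("bias", shape=[units], initializer=tf.zeros_initializer())

# Contract the last axis of the input with the first axis of the kernel;
# the leading [time_step, batch_size] axes pass through untouched.
output = tf.nn.relu(tf.tensordot(inputx, kernel, axes=[[2], [0]]) + bias)
# output has shape [time_step, batch_size, units].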