I'm trying to get to grips with the basics of neural networks and am struggling to understand Keras layers.
Take the following code from TensorFlow's tutorials:
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation=tf.nn.relu),
    keras.layers.Dense(10, activation=tf.nn.softmax)
])
So this network has 3 layers? The first is just the 28*28 nodes representing the pixel values. The second is a hidden layer which takes weighted sums from the first, applies relu, and then sends these to 10 output nodes, which are softmaxed?
But then this model seems to require different inputs to the layers:
model = keras.Sequential([
    layers.Dense(64, activation=tf.nn.relu, input_shape=[len(train_dataset.keys())]),
    layers.Dense(64, activation=tf.nn.relu),
    layers.Dense(1)
])
Why does the input layer now have both an input_shape and a value 64? I read that the first parameter specifies the number of nodes in the second layer, but that doesn't seem to fit with the code in the first example. Also, why does the input layer have an activation? Is this just relu-ing the values before they enter the network?
Also, with regard to activation functions, why are softmax and relu treated as alternatives? I thought relu applied to all the inputs of a single node, whereas softmax acted on the outputs of all the nodes across a layer?
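To make the relu/softmax comparison concrete, here is a minimal sketch in plain Python (not Keras code). It assumes a hypothetical 3-node layer whose weighted sums are already computed; the function names and the numbers are made up for illustration. Both functions take the layer's vector of weighted sums and return a vector of the same length, which is why either one can be passed as `activation`. The difference is that relu treats each value on its own, while softmax needs the whole vector.

```python
import math

def relu(values):
    # ReLU is applied to each value independently: max(0, x).
    return [max(0.0, v) for v in values]

def softmax(values):
    # Softmax needs every value in the layer, because each output is
    # divided by the sum of exponentials over the whole layer.
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical weighted sums (plus bias) for a layer with 3 nodes.
pre_activations = [2.0, -1.0, 0.5]

relu_out = relu(pre_activations)
softmax_out = softmax(pre_activations)

print(relu_out)     # negative entries are clipped to zero
print(softmax_out)  # entries are positive and sum to 1
```

Changing one entry of `pre_activations` changes only the matching entry of `relu_out`, but it changes every entry of `softmax_out`.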
Any help is really appreciated!
First example is from: https://www.tensorflow.org/tutorials/keras/basic_classification
Second example is from: https://www.tensorflow.org/tutorials/keras/basic_regression
In the first case the first layer is a Flatten layer, which simply reshapes the input. In the second case the first layer is a Dense layer, which requires a layer size. Usually the first layer in sequential models gets an input_shape parameter to specify the shape of the input, but otherwise it is just the same as a layer at any other point. - jdehesa
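As a rough illustration of the comment above, the following plain-Python sketch (not Keras code) traces the sizes through both models. The helpers `flatten_shape` and `dense_params` are hypothetical names, and `n_features = 9` is an assumed stand-in for `len(train_dataset.keys())`. It shows that Flatten only reshapes and has no weights, while the first argument of each Dense layer is that layer's own number of nodes.

```python
def flatten_shape(input_shape):
    # Flatten has no weights; it only multiplies the dimensions together.
    size = 1
    for dim in input_shape:
        size *= dim
    return size

def dense_params(n_inputs, units):
    # A Dense layer has one weight per (input, unit) pair plus one bias per unit.
    return n_inputs * units + units

# First model: Flatten -> Dense(128) -> Dense(10)
flat = flatten_shape((28, 28))
first_model = [dense_params(flat, 128), dense_params(128, 10)]

# Second model: Dense(64) -> Dense(64) -> Dense(1)
n_features = 9  # assumed number of feature columns, for illustration only
second_model = [
    dense_params(n_features, 64),
    dense_params(64, 64),
    dense_params(64, 1),
]

print(flat)          # size of the flattened input vector
print(first_model)   # trainable parameters per Dense layer, first model
print(second_model)  # trainable parameters per Dense layer, second model
```

In both models the raw input is not a layer with weights or an activation of its own. `input_shape` only tells the first layer how many values to expect, so in the second model the relu belongs to the first 64-node hidden layer, not to the inputs.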