I am new to Keras. I have read many blog posts about deep-learning classification using Keras, but even after reading a lot of them I cannot figure out how each author chose the number of units for the first dense layer right after the flatten layer in their code. For example:
from keras.models import Sequential
from keras.layers import Dense, Conv2D, MaxPooling2D, Dropout, Flatten

def createModel():
    model = Sequential()
    model.add(Conv2D(32, (3, 3), padding='same', activation='relu', input_shape=input_shape))
    model.add(Conv2D(32, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    model.add(Conv2D(64, (3, 3), padding='same', activation='relu'))
    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    model.add(Conv2D(64, (3, 3), padding='same', activation='relu'))
    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    model.add(Flatten())
    model.add(Dense(512, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(nClasses, activation='softmax'))
    return model
My doubts:
- How did the programmer decide on the value '512' for this dense layer?
- Is it totally random? I know that in this example the flatten layer's output has 256 values, so my logic was that they multiplied it by 2 to get 512. But this logic does not hold in any other example I have read.
- How does this dense layer affect the training?
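My current understanding (this is an assumption on my part, not something the blog posts stated) is that the 512 is a free hyperparameter, and the layer's parameter count then follows from it and the flatten size as inputs * units + units (one bias per unit). A small sketch of that arithmetic:

```python
# My assumption: a Dense layer after Flatten has
#   weights = flat_size * units, plus one bias per unit.
def dense_param_count(flat_size, units):
    return flat_size * units + units

# e.g. a flatten output of 256 values feeding Dense(512):
print(dense_param_count(256, 512))  # 256*512 + 512 = 131584
```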
If I put in too large a value, as in my code below (going by the same logic, I multiplied my flatten size of 86400 by 2, i.e. 172800), I get the following error:
from keras.layers import Activation

model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=input_shape))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(96, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(172800))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(4))
model.add(Activation('softmax'))
model.summary()
ValueError: rng_mrg cpu-implementation does not support more than (2**31 -1) samples
HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'. HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
This is the summary of my model without the first dense layer:
Layer (type) Output Shape Param #
=================================================================
conv2d_4 (Conv2D) (None, 254, 254, 32) 896
_________________________________________________________________
activation_4 (Activation) (None, 254, 254, 32) 0
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 127, 127, 32) 0
_________________________________________________________________
dropout_4 (Dropout) (None, 127, 127, 32) 0
_________________________________________________________________
conv2d_5 (Conv2D) (None, 125, 125, 64) 18496
_________________________________________________________________
activation_5 (Activation) (None, 125, 125, 64) 0
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 62, 62, 64) 0
_________________________________________________________________
dropout_5 (Dropout) (None, 62, 62, 64) 0
_________________________________________________________________
conv2d_6 (Conv2D) (None, 60, 60, 96) 55392
_________________________________________________________________
activation_6 (Activation) (None, 60, 60, 96) 0
_________________________________________________________________
max_pooling2d_6 (MaxPooling2 (None, 30, 30, 96) 0
_________________________________________________________________
dropout_6 (Dropout) (None, 30, 30, 96) 0
_________________________________________________________________
flatten_2 (Flatten) (None, 86400) 0
_________________________________________________________________
activation_7 (Activation) (None, 86400) 0
_________________________________________________________________
dropout_7 (Dropout) (None, 86400) 0
_________________________________________________________________
dense_2 (Dense) (None, 4) 345604
_________________________________________________________________
activation_8 (Activation) (None, 4) 0
Total params: 420,388
Trainable params: 420,388
Non-trainable params: 0
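As a sanity check on the summary above, the 345,604 parameters of dense_2 do match the flatten size times the number of output classes, plus biases (assuming Dense params = inputs * units + units):

```python
flat_size = 86400  # output size of flatten_2 in the summary
units = 4          # Dense(4), my number of classes
params = flat_size * units + units
print(params)  # 345604, matching dense_2 in model.summary()
```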
When I eliminate this layer altogether my code works, and it also works if I put in a smaller value, but I don't want to blindly set this parameter without understanding the reason.
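My guess (an assumption on my part) about the ValueError: with Dense(172800) after a flatten output of 86400, the weight matrix alone would hold 86400 * 172800 values, which is far beyond the 2**31 - 1 limit mentioned in the error message, so Theano's random initializer refuses to sample that many values:

```python
flat_size = 86400
units = 172800
weights = flat_size * units          # size of the weight matrix alone
print(weights)                       # 14929920000
print(weights > 2**31 - 1)           # True: exceeds the rng_mrg limit
```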