
I have a simple model trained on MNIST with 600 nodes in a hidden layer.

Some precursors...

from __future__ import print_function
import keras
from keras.datasets import mnist
from keras.models import Sequential, Model
from keras.layers import Dense, Dropout, InputLayer, Activation
from keras.optimizers import RMSprop, Adam
import numpy as np
import h5py
import matplotlib.pyplot as plt
from keras import backend as K
import tensorflow as tf

MNIST Loading

batch_size = 128
num_classes = 10
epochs = 50

(x_train, y_train), (x_test, y_test) = mnist.load_data()

x_train = x_train.reshape(60000, 784)
x_test = x_test.reshape(10000, 784)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# One hot conversion
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

Designing model

model = Sequential() 
###Model###
model.add(Dense(600, input_dim=784))
model.add(Activation('relu'))
model.add(Dense(10))
model.add(Activation('softmax'))
model.summary()

tfcall = keras.callbacks.TensorBoard(log_dir='./keras600logs', histogram_freq=1, batch_size=batch_size, write_graph=True)

model.compile(loss='categorical_crossentropy', optimizer=Adam(), metrics=['accuracy'])

history = model.fit(x_train, y_train,
    batch_size=batch_size,
    epochs=10, #EPOCHS
    verbose=1,
    validation_data=(x_test, y_test),
    callbacks=[tfcall])
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

Now comes the new part. I want to dynamically (i.e. with every new input image) be able to define a 'mask' that will turn off some of the 600 neurons in the hidden layer, preventing them from passing their activation on to the output layer.

mask_i = [0, 0, 1, 0, 1, .... 0, 1, 0, 0] (1x600)

such that, for an input image i, the mask indices with a 1 correspond to nodes that are shut off while processing image i.
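
In other words, the effect I want on the hidden activations is just an elementwise product (a plain NumPy sketch with made-up values, where a 1 in the mask means "turn off"):

import numpy as np

hidden = np.random.rand(1, 600).astype('float32')                   # hidden-layer activations for image i
mask_i = np.random.randint(0, 2, size=(1, 600)).astype('float32')   # 1 = shut this node off

masked_hidden = hidden * (1.0 - mask_i)   # masked nodes contribute nothing downstream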

What is the best way to go about doing this?

Do we add another input node with weights of -100000000 towards the hidden layer, so that it will overwhelm whatever activation is normally there (and the ReLU will do the rest)? This is kind of like hacking the bias dynamically.

Do we create another hidden layer where each of the 600 nodes is connected to exactly one node from the first hidden layer, with a dynamic weight of either 0 (off) or 1 (proceed as normal), and then fully connect that new hidden layer to the output?

Both of these seem a bit hackish; I wanted to know what others out there think.

This sounds like Dropout at inference time, is that what you want to do? – Dr. Snoopy
Not quite. We have a very specific way of choosing the subset of nodes to be turned off. We are interested in seeing what the prediction is when those nodes no longer contribute to the output layer. – Eruditio

1 Answer


I think the best way is to put a Lambda layer with a mask after that dense layer.

There is no way to do it without a little hacking, but this is quite a clean hack.

Create a variable for the mask:

import keras.backend as K

#create a var with length 600 and 2D shape
mask = K.variable([[0,1,0,0,0,1,1,0,....,1]])
    #careful: 0 means off
    #(same number of dimensions of the output of the dense layer)
    #make sure the shape is either
        #(1,600) - same mask for all samples; or
        #(batch_size,600) - one mask per sample

#important: whenever you want to change the mask, you must use:
K.set_value(mask,newValue)
    #otherwise you will not be changing the variable connected to the model

Add the Lambda layer to the model:

from keras.layers import Lambda   # Lambda needs to be imported as well

....
model.add(Dense(600, input_dim=784))
model.add(Lambda(lambda x: x * mask))
model.add(Activation('relu'))
....
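
As a rough usage sketch (untested; assuming the mask variable and model defined above, and remembering that here 0 means off, with the indices below just placeholders):

import numpy as np

# build a new mask for the next input image
new_mask = np.ones((1, 600), dtype='float32')
new_mask[0, [2, 4, 17]] = 0.0             # hypothetical indices of nodes to silence

K.set_value(mask, new_mask)               # push the new values into the model's variable
prediction = model.predict(x_test[0:1])   # predict with those nodes zeroed out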

If you want this to be more elegant, you can use a functional API model, making the mask an additional input with Input(tensor=mask). I don't know if there is any advantage in doing this, though.
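
Something like this sketch of the functional variant (untested, assuming the same 600-unit architecture as in the question; the mask is still updated with K.set_value before each predict):

import numpy as np
from keras.layers import Input, Dense, Activation, Lambda
from keras.models import Model
import keras.backend as K

mask = K.variable(np.ones((1, 600), dtype='float32'))    # 0 = off, 1 = keep

inp = Input(shape=(784,))
mask_inp = Input(tensor=mask)                             # mask enters the graph as a tensor input

hidden = Dense(600)(inp)
masked = Lambda(lambda t: t[0] * t[1])([hidden, mask_inp])
masked = Activation('relu')(masked)
out = Activation('softmax')(Dense(10)(masked))

masked_model = Model(inputs=[inp, mask_inp], outputs=out)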