0 votes

I have written my own Multi-Layer Perceptron in TensorFlow, in which I initialize the weights and biases like this:

# Store layers weight & bias
weights = {
    'h1': tf.Variable(tf.random_normal([n_input, hidden_layer_sizes[0]], 0, 0.1, seed=random_state)),  # 1 hidden layer is mandatory
}
biases = {
    'b1': tf.Variable(tf.random_normal([hidden_layer_sizes[0]], 0, 0.1, seed=random_state)),
}
for i in range(len(hidden_layer_sizes)-1):
    weights['h'+str(i+2)] = tf.Variable(tf.random_normal([hidden_layer_sizes[i], hidden_layer_sizes[i+1]], 0, 0.1, seed=random_state))
    biases['b'+str(i+2)] = tf.Variable(tf.random_normal([hidden_layer_sizes[i+1]], 0, 0.1, seed=random_state))
weights['out'] = tf.Variable(tf.random_normal([hidden_layer_sizes[-1], n_classes], 0, 0.1, seed=random_state))
biases['out'] = tf.Variable(tf.random_normal([n_classes], 0, 0.1, seed=random_state))

The number of hidden layers varies between 1 and 4, depending on the input. I have been reading on the Internet about alternative ways of initializing the weights, and I wonder whether they are applicable to an MLP or only to more complex models such as CNNs, for example Xavier, He, variance-scaling initialization, etc.

Are any of these alternative initializers applicable in my case, and which one is considered best for this type of network?


2 Answers

1 vote

It depends on the size of your MLP. Initializations are generally made for one of two reasons:

  • To prevent exploding or vanishing gradients
  • To initialize the weights at an appropriate scale, which helps convergence speed and final results

For networks with only a few layers and few neurons, the initialization usually does not matter much, but you can try it out and see for yourself. Xavier and He are indeed the better ones. In general there isn't really a "best one" for any type of network, and it may pay off to experiment a bit.
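For concreteness, here is a minimal sketch (assuming the TensorFlow 1.x API used in the question; the layer sizes 784 and 256 are just placeholder values) of how Xavier and He initializers can be created. Note that the generic variance-scaling initializer only becomes He initialization when scale=2.0 is passed:

import tensorflow as tf

# Xavier/Glorot: variance ~ 1 / fan_avg, a common default for tanh/sigmoid layers
xavier_init = tf.glorot_uniform_initializer()

# He: variance ~ 2 / fan_in, usually recommended for ReLU layers
he_init = tf.variance_scaling_initializer(scale=2.0, mode='fan_in')

# These initializers are used through tf.get_variable instead of tf.Variable(tf.random_normal(...))
w1 = tf.get_variable('w1', shape=[784, 256], initializer=he_init)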

0 votes

This is how I implemented it in my code. First I defined the following function:

def get_initial_weights(self, varname, shape, initializer="random_normal"):
    # Create a weight/bias variable with the requested initialization scheme
    if initializer == 'random_normal':
        return tf.Variable(tf.random_normal(shape=shape, mean=0, stddev=0.1, seed=self.random_state))
    elif initializer == "xavier":
        return tf.get_variable(varname, shape=shape, initializer=tf.contrib.layers.xavier_initializer())
    elif initializer == "he":
        # scale=2.0 gives variance 2/fan_in (He initialization); the default scale=1.0 would not
        return tf.get_variable(varname, shape=shape, initializer=tf.variance_scaling_initializer(scale=2.0))
    else:
        raise ValueError("Unknown initializer: " + initializer)

Then, inside the main body of my class, I replaced the code from my first post with the following:

# Store layers weight & bias
weights = {
    'h1': self.get_initial_weights('h1', [n_input, hidden_layer_sizes[0]], initializer=initializer)
}
biases = {
    'b1': self.get_initial_weights('b1', [hidden_layer_sizes[0]], initializer=initializer)
}
for i in range(len(hidden_layer_sizes)-1):
    weights['h' + str(i + 2)] = self.get_initial_weights('h' + str(i + 2), [hidden_layer_sizes[i], hidden_layer_sizes[i+1]], initializer=initializer)
    biases['b'+str(i+2)] = self.get_initial_weights('b'+str(i+2), [hidden_layer_sizes[i+1]], initializer=initializer)
weights['hout'] = self.get_initial_weights('hout', [hidden_layer_sizes[-1], n_classes], initializer=initializer)
biases['bout'] = self.get_initial_weights('bout', [n_classes], initializer=initializer)
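As a usage sketch (the class name and constructor signature below are just illustrative placeholders, not the actual ones from my code), the initializer is then simply selected when the model is constructed:

# Hypothetical constructor call; pass "random_normal", "xavier" or "he"
mlp = MLP(n_input=784, n_classes=10,
          hidden_layer_sizes=[256, 128],
          initializer="he",
          random_state=42)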