I want to use a 1D convolutional neural network for regression.
I have about 1500 training samples, each having 40 features. I am training in batches of around 200-300 samples.
I am not sure if I have the code set up correctly. Each input sample is essentially a 1D vector with 40 elements, so in the first convolution layer I want each filter to pass along the length of each vector (independently) within the training batch. Have I set up the width, height, channels, etc. correctly to achieve this?
My code is:
import tensorflow as tf

width = 40
channels = 1
n_outputs = 1

X = tf.placeholder(tf.float32, shape=[None, width], name="X")
# conv1d expects [batch, length, channels], so add a channels dimension
X_reshaped = tf.reshape(X, shape=[-1, width, channels])
y = tf.placeholder(tf.float32, shape=[None, n_outputs], name="y")
y_reshaped = tf.reshape(y, shape=[-1, n_outputs])
training = tf.placeholder_with_default(False, shape=(), name='training')

with tf.name_scope("cnn"):
    conv1 = tf.layers.conv1d(X_reshaped, filters=24, kernel_size=4,
                             strides=2, padding='same',
                             activation=tf.nn.relu, name="conv1")
    pool1 = tf.layers.average_pooling1d(conv1, pool_size=2, strides=2,
                                        padding='same')
    conv2 = tf.layers.conv1d(pool1, filters=32, kernel_size=2,
                             strides=2, padding='same',
                             activation=tf.nn.relu, name="conv2")
    pool2 = tf.layers.average_pooling1d(conv2, pool_size=2, strides=2,
                                        padding='same')
    flat = tf.layers.flatten(pool2, name='flatten')
    drop = tf.layers.dropout(flat, rate=0.3, training=training)
    output = tf.layers.dense(drop, n_outputs, activation=tf.nn.tanh,
                             name="fully_connected")

with tf.name_scope("loss"):
    loss = tf.reduce_mean(tf.square(y_reshaped - output))

initial_learning_rate = 0.01
decay_steps = 1000
decay_rate = 0.1

with tf.name_scope("train"):
    global_step = tf.Variable(0, trainable=False, name="global_step")
    learning_rate = tf.train.exponential_decay(initial_learning_rate,
                                               global_step, decay_steps,
                                               decay_rate)
    optimizer = tf.train.RMSPropOptimizer(learning_rate, momentum=0.9)
    training_op = optimizer.minimize(loss, global_step=global_step)
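For reference, these are the layer output lengths I expect, assuming the 'same'-padding rule output_length = ceil(input_length / stride) for both the convolution and pooling layers (a quick sanity check, not part of the graph itself):

```python
import math

# 'same' padding with stride s gives output length ceil(L / s)
def same_len(length, stride):
    return math.ceil(length / stride)

length = 40                    # input width (features per sample)
length = same_len(length, 2)   # conv1, stride 2 -> 20
length = same_len(length, 2)   # pool1, stride 2 -> 10
length = same_len(length, 2)   # conv2, stride 2 -> 5
length = same_len(length, 2)   # pool2, stride 2 -> 3
flat_units = length * 32       # 32 filters in conv2 -> 96 units after flatten
print(length, flat_units)      # 3 96
```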
I scale the 40 input features to the range [0.0, 1.0]. In other words, my 'X' tensor contains the samples along the rows and the features along the columns, and I scale each column to the range [0.0, 1.0].
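The column-wise scaling I do looks like this (a minimal NumPy sketch with random stand-in data, not my actual dataset):

```python
import numpy as np

# Stand-in for the real training data: 1500 samples x 40 features
rng = np.random.default_rng(0)
X_train = rng.normal(size=(1500, 40))

# Min-max scale each feature column independently to [0.0, 1.0]
col_min = X_train.min(axis=0)
col_max = X_train.max(axis=0)
X_scaled = (X_train - col_min) / (col_max - col_min)

print(X_scaled.min(), X_scaled.max())  # 0.0 1.0
```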
Because I am using an output layer with one neuron with tanh activation (which has output in the range [-1.0, 1.0]):
- during training I scale the predictand (y) to the range [-1.0, 1.0]
- when using the trained network to generate predictions, I have to reverse the scaling to get "real" values (because the predicted values have range [-1.0, 1.0])
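Concretely, the target scaling and its inverse look like this (a small sketch with made-up y values):

```python
import numpy as np

# Made-up targets standing in for the real predictand
y_train = np.array([10.0, 25.0, 40.0])

# Scale y to [-1.0, 1.0] to match the tanh output range
y_min, y_max = y_train.min(), y_train.max()
y_scaled = 2.0 * (y_train - y_min) / (y_max - y_min) - 1.0

# Reverse the scaling on (here, simulated) predictions to get real units
y_real = (y_scaled + 1.0) / 2.0 * (y_max - y_min) + y_min

print(y_scaled)                        # [-1.  0.  1.]
print(np.allclose(y_real, y_train))    # True
```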
Is this approach correct?
The output from the network is almost identical for all test samples. Does this indicate there is a problem with the weights? I have tried setting "kernel_initializer='he_normal'" in the convolution layers but it didn't help.
When using a multilayer perceptron on this same dataset, I needed to use batch normalisation, otherwise the training would fail. Is there something similar for convolutional networks?