
I want to use a 1D convolutional neural network for regression.

I have about 1500 training samples, each having 40 features. I am training in batches of around 200-300 samples.

I am not sure if I have the code set up correctly. Each input sample is essentially a 1D vector with 40 elements, so in the first convolution layer I want each filter to pass along the length of each vector (independently) within the training batch. Have I set up the width, height, channels, etc. correctly to achieve this?

My code is:

import tensorflow as tf

width = 40       # number of features per sample (length of the 1D input)
channels = 1     # single input channel
n_outputs = 1    # single regression target

X = tf.placeholder(tf.float32, shape=[None, width], name="X")
# conv1d expects [batch, length, channels]; each filter slides along the 40-element axis
X_reshaped = tf.reshape(X, shape=[-1, width, channels])
y = tf.placeholder(tf.float32, shape=[None, n_outputs], name="y")
y_reshaped = tf.reshape(y, shape=[-1, n_outputs])
training = tf.placeholder_with_default(False, shape=(), name='training')

with tf.name_scope("cnn"):
    conv1 = tf.layers.conv1d(X_reshaped, filters=24, kernel_size=4,
                             strides=2, padding='same', activation=tf.nn.relu, name="conv1")

    pool1 = tf.layers.average_pooling1d(conv1, pool_size=2, strides=2, padding='same')

    conv2 = tf.layers.conv1d(pool1, filters=32, kernel_size=2,
                             strides=2, padding='same', activation=tf.nn.relu, name="conv2")

    pool2 = tf.layers.average_pooling1d(conv2, pool_size=2, strides=2, padding='same')

    flat = tf.layers.flatten(pool2, name='flatten')

    drop = tf.layers.dropout(flat, rate=0.3, training=training)

    # single regression output; tanh bounds the prediction to [-1.0, 1.0]
    output = tf.layers.dense(drop, n_outputs, activation=tf.nn.tanh, name="fully_connected")

with tf.name_scope("loss"):
    loss = tf.reduce_mean(tf.square(y_reshaped - output))

initial_learning_rate = 0.01
decay_steps = 1000
decay_rate = 0.1 

with tf.name_scope("train"):
    global_step = tf.Variable(0, trainable=False, name="global_step")
    learning_rate = tf.train.exponential_decay(initial_learning_rate, global_step, decay_steps, decay_rate)
    optimizer = tf.train.RMSPropOptimizer(learning_rate, momentum=0.9)
    training_op = optimizer.minimize(loss, global_step=global_step)
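
One way to sanity-check the layout is to print the layer output shapes; with the settings above I expect them to come out as follows:

print(X_reshaped.shape)  # (?, 40, 1)
print(conv1.shape)       # (?, 20, 24) - 24 filters sliding along the length axis
print(pool1.shape)       # (?, 10, 24)
print(conv2.shape)       # (?, 5, 32)
print(pool2.shape)       # (?, 3, 32)
print(flat.shape)        # (?, 96)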

I scale the 40 input features to the range [0.0, 1.0]: my 'X' tensor contains the samples along the rows and the features along the columns, and I scale each column to [0.0, 1.0] independently.
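
Concretely, the column-wise scaling looks something like this (a minimal sketch with stand-in data in place of my real feature matrix):

import numpy as np

X_train = np.random.rand(1500, 40) * 100.0   # stand-in for my real 1500 x 40 feature matrix
col_min = X_train.min(axis=0)
col_max = X_train.max(axis=0)
X_train_scaled = (X_train - col_min) / (col_max - col_min)   # each column now in [0.0, 1.0]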

Because I am using an output layer with one neuron with tanh activation (which has output in the range [-1.0, 1.0]):

  • during training I scale the predictand (y) to the range [-1.0, 1.0]
  • when using the trained network to generate predictions, I have to reverse the scaling to get "real" values (because the predicted values are in the range [-1.0, 1.0]); a sketch of this is below
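
In code, the target scaling and un-scaling look roughly like this (using scikit-learn's MinMaxScaler purely for illustration, with stand-in arrays):

import numpy as np
from sklearn.preprocessing import MinMaxScaler

y_train = np.random.rand(1500, 1) * 50.0                    # stand-in for my real predictand

y_scaler = MinMaxScaler(feature_range=(-1.0, 1.0))
y_train_scaled = y_scaler.fit_transform(y_train)            # used as the training target

# After training, outputs in [-1.0, 1.0] are mapped back to real units:
predictions_scaled = np.random.uniform(-1.0, 1.0, size=(10, 1))   # stand-in for network output
predictions = y_scaler.inverse_transform(predictions_scaled)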

Is this approach correct?

The output from the network is almost identical for all test samples. Does this indicate a problem with the weights? I have tried setting kernel_initializer='he_normal' in the convolution layers (snippet below), but it didn't help.
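
For reference, this is the kind of thing I tried, written with an explicit initializer object (which I believe is equivalent to the string form):

conv1 = tf.layers.conv1d(X_reshaped, filters=24, kernel_size=4, strides=2,
                         padding='same', activation=tf.nn.relu,
                         kernel_initializer=tf.keras.initializers.he_normal(),
                         name="conv1")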

When using a multilayer perceptron on this same dataset I needed to use batch normalisation, otherwise training would fail. Is there something similar for convolutional networks?
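
For reference, what I have in mind is something like the following, sketched as a modification of the conv1 definition and training op above (I am not sure this is the right pattern for conv layers):

conv1 = tf.layers.conv1d(X_reshaped, filters=24, kernel_size=4, strides=2,
                         padding='same', activation=None, name="conv1")
bn1 = tf.layers.batch_normalization(conv1, training=training, name="bn1")
act1 = tf.nn.relu(bn1)

# batch_normalization creates moving-average update ops that must run during training:
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    training_op = optimizer.minimize(loss, global_step=global_step)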


1 Answer


Using an activation function on the output layer of a regression problem is not common practice. Output activations are used in classification problems, and even then sigmoid (for binary classification) or softmax (for multi-class classification) is generally preferred; for regression the output layer is usually left linear.
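
For the network in the question, that would mean leaving the final dense layer linear, for example:

output = tf.layers.dense(drop, n_outputs, activation=None, name="fully_connected")  # no output activation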

I see that you are using MSE loss, which is the right choice for regression, but your scaling of the inputs and outputs does not match the usual working assumption that they come from roughly a standard normal distribution (mean 0, standard deviation 1). People therefore generally apply z-normalization (subtract the mean and divide by the standard deviation) to the inputs and outputs before training.
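
Something along these lines (a sketch with stand-in arrays; the statistics are computed on the training set and reused to un-scale predictions):

import numpy as np

# Stand-in data with the shapes described in the question
X_train = np.random.rand(1500, 40)
y_train = np.random.rand(1500, 1)

x_mean, x_std = X_train.mean(axis=0), X_train.std(axis=0)
y_mean, y_std = y_train.mean(), y_train.std()

X_train_z = (X_train - x_mean) / x_std   # z-normalized inputs
y_train_z = (y_train - y_mean) / y_std   # z-normalized targets

# Network predictions live in the normalized space; undo the scaling afterwards:
preds_z = np.zeros((10, 1))              # stand-in for the network's output
preds = preds_z * y_std + y_mean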