3
votes

I have a fully connected neural network with the following number of neurons in each layer [4, 20, 20, 20, ..., 1]. I am using TensorFlow and the 4 real-valued inputs correspond to a particular point in space and time, i.e. (x, y, z, t), and the 1 real-valued output corresponds to the temperature at that point. The loss function is just the mean square error between my predicted temperature and the actual temperature at that point in (x, y, z, t). I have a set of training data points with the following structure for their inputs:


(x,y,z,t):

(0.11,0.12,1.00,0.41)

(0.34,0.43,1.00,0.92)

(0.01,0.25,1.00,0.65)

...

(0.71,0.32,1.00,0.49)

(0.31,0.22,1.00,0.01)

(0.21,0.13,1.00,0.71)


Namely, what you will notice is that the training data all have the same redundant value in z, but x, y, and t are generally not redundant. Yet what I find is my neural network cannot train on this data due to the redundancy. In particular, every time I start training the neural network, it appears to fail and the loss function becomes nan. But, if I change the structure of the neural network such that the number of neurons in each layer is [3, 20, 20, 20, ..., 1], i.e. now data points only correspond to an input of (x, y, t), everything works perfectly and training is all right. But is there any way to overcome this problem? (Note: it occurs whether any of the variables are identical, e.g. either x, y, or t could be redundant and cause this error.) I have also attempted different activation functions (e.g. ReLU) and varying the number of layers and neurons in the network, but these changes do not resolve the problem.

My question: is there any way to still train the neural network while keeping the redundant z as an input? It just so happens the particular training data set I am considering at the moment has all z redundant, but in general, I will have data coming from different z in the future. Therefore, a way to ensure the neural network can robustly handle inputs at the present moment is sought.

A minimal working example is encoded below. When running this example, the loss output is nan, but if you simply uncomment the x_z in line 12 to ensure there is now variation in x_z, then there is no longer any problem. But this is not a solution since the goal is to use the original x_z with all constant values.

import numpy as np 
import tensorflow as tf

end_it = 10000 #number of iterations
frac_train = 1.0 #randomly sampled fraction of data to create training set
frac_sample_train = 0.1 #randomly sampled fraction of data from training set to train in batches
layers = [4, 20, 20, 20, 20, 20, 20, 20, 20, 1]
len_data = 10000
x_x = np.array([np.linspace(0.,1.,len_data)])
x_y = np.array([np.linspace(0.,1.,len_data)])
x_z = np.array([np.ones(len_data)*1.0])
#x_z = np.array([np.linspace(0.,1.,len_data)])
x_t = np.array([np.linspace(0.,1.,len_data)])
y_true = np.array([np.linspace(-1.,1.,len_data)])

N_train = int(frac_train*len_data)
idx = np.random.choice(len_data, N_train, replace=False)

x_train = x_x.T[idx,:]
y_train = x_y.T[idx,:]
z_train = x_z.T[idx,:]
t_train = x_t.T[idx,:]
v1_train = y_true.T[idx,:] 

sample_batch_size = int(frac_sample_train*N_train)

np.random.seed(1234)
tf.set_random_seed(1234)
import logging
logging.getLogger('tensorflow').setLevel(logging.ERROR)
tf.logging.set_verbosity(tf.logging.ERROR)

class NeuralNet:
    def __init__(self, x, y, z, t, v1, layers):
        X = np.concatenate([x, y, z, t], 1)  
        self.lb = X.min(0)
        self.ub = X.max(0)
        self.X = X
        self.x = X[:,0:1]
        self.y = X[:,1:2]
        self.z = X[:,2:3]
        self.t = X[:,3:4]
        self.v1 = v1 
        self.layers = layers 
        self.weights, self.biases = self.initialize_NN(layers) 
        self.sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=False,
                                                     log_device_placement=False)) 
        self.x_tf = tf.placeholder(tf.float32, shape=[None, self.x.shape[1]])
        self.y_tf = tf.placeholder(tf.float32, shape=[None, self.y.shape[1]])
        self.z_tf = tf.placeholder(tf.float32, shape=[None, self.z.shape[1]])
        self.t_tf = tf.placeholder(tf.float32, shape=[None, self.t.shape[1]])
        self.v1_tf = tf.placeholder(tf.float32, shape=[None, self.v1.shape[1]])  
        self.v1_pred = self.net(self.x_tf, self.y_tf, self.z_tf, self.t_tf) 
        self.loss = tf.reduce_mean(tf.square(self.v1_tf - self.v1_pred)) 
        self.optimizer = tf.contrib.opt.ScipyOptimizerInterface(self.loss,
                                                                method = 'L-BFGS-B',
                                                                options = {'maxiter': 50,
                                                                           'maxfun': 50000,
                                                                           'maxcor': 50,
                                                                           'maxls': 50,
                                                                           'ftol' : 1.0 * np.finfo(float).eps})
        init = tf.global_variables_initializer()  
        self.sess.run(init)
    def initialize_NN(self, layers):
        weights = []
        biases = []
        num_layers = len(layers)
        for l in range(0,num_layers-1):
            W = self.xavier_init(size=[layers[l], layers[l+1]])
            b = tf.Variable(tf.zeros([1,layers[l+1]], dtype=tf.float32), dtype=tf.float32)
            weights.append(W)
            biases.append(b) 
        return weights, biases
    def xavier_init(self, size):
        in_dim = size[0]
        out_dim = size[1]
        xavier_stddev = np.sqrt(2/(in_dim + out_dim)) 
        return tf.Variable(tf.truncated_normal([in_dim, out_dim], stddev=xavier_stddev), dtype=tf.float32)
    def neural_net(self, X, weights, biases):
        num_layers = len(weights) + 1
        H = 2.0*(X - self.lb)/(self.ub - self.lb) - 1.0
        for l in range(0,num_layers-2):
            W = weights[l]
            b = biases[l]
            H = tf.tanh(tf.add(tf.matmul(H, W), b))
        W = weights[-1]
        b = biases[-1]
        Y = tf.add(tf.matmul(H, W), b) 
        return Y
    def net(self, x, y, z, t): 
        v1_out = self.neural_net(tf.concat([x,y,z,t], 1), self.weights, self.biases)
        v1 = v1_out[:,0:1]
        return v1
    def callback(self, loss):
        global Nfeval
        print(str(Nfeval)+' - Loss in loop: %.3e' % (loss))
        Nfeval += 1
    def fetch_minibatch(self, x_in, y_in, z_in, t_in, den_in, N_train_sample):  
        idx_batch = np.random.choice(len(x_in), N_train_sample, replace=False)
        x_batch = x_in[idx_batch,:]
        y_batch = y_in[idx_batch,:]
        z_batch = z_in[idx_batch,:]
        t_batch = t_in[idx_batch,:]
        v1_batch = den_in[idx_batch,:] 
        return x_batch, y_batch, z_batch, t_batch, v1_batch
    def train(self, end_it):  
        it = 0
        while it < end_it: 
            x_res_batch, y_res_batch, z_res_batch, t_res_batch, v1_res_batch = self.fetch_minibatch(self.x, self.y, self.z, self.t, self.v1, sample_batch_size) # Fetch residual mini-batch
            tf_dict = {self.x_tf: x_res_batch, self.y_tf: y_res_batch, self.z_tf: z_res_batch, self.t_tf: t_res_batch,
                       self.v1_tf: v1_res_batch}
            self.optimizer.minimize(self.sess,
                                    feed_dict = tf_dict,
                                    fetches = [self.loss],
                                    loss_callback = self.callback) 
    def predict(self, x_star, y_star, z_star, t_star): 
        tf_dict = {self.x_tf: x_star, self.y_tf: y_star, self.z_tf: z_star, self.t_tf: t_star}
        v1_star = self.sess.run(self.v1_pred, tf_dict)  
        return v1_star

model = NeuralNet(x_train, y_train, z_train, t_train, v1_train, layers)

Nfeval = 1
model.train(end_it)
2
Having a variable that always takes the same value shouldn't be a big issue, I have trained multiple models on datasets like that. Are you normalizing your input? It seems very possible that those come from that, as the mean for z would be 1 and the standard deviation would be 0, so the normalized values would all be (1 - 1) / 0 = nan. To avoid that (values with null standard deviation), when I normalize the data I put a safeguard like x_std = np.std(x, axis=0); x_std[x_std < 1e-6] = 1.jdehesa
@jdehesa The values of z were all normalized to be of order unity in a previous preprocessing step, but this problem persists even if z != 1 (but they are nevertheless all still a single constant, i.e. z[i] = z[j]). The standard deviation of the input variable z is therefore necessarily 0 since they are all the same. Further clarity and expanding upon what exactly you are suggesting and how it will help would be great.Mathews24
The problem I mention happens in any case where the variable always take the same value, doesn't matter which value exactly. In those cases, the result of the normalisation is always 0/0, which is nan. You can avoid the issue by replacing standard deviations of 0 (or very close to zero) by some constant, e.g. 1, then the normalisation will be 0/1 which is zero, and the feature will have no effect.jdehesa
@jdehesa Yes, but where exactly are you proposing to replace standard deviations? In the updating of weights in the neural network itself? Or elsewhere? A code example implementing your solution (e.g. in TensorFlow) would be helpful.Mathews24
@jdehesa I have now included a minimal working example. If you can indicate (e.g. with an example) how to train against this data and how you've overcome the problem of training against data with a constant input variable in the past, glad to work through it.Mathews24

2 Answers

1
votes

I think your problem is in this line:

H = 2.0*(X - self.lb)/(self.ub - self.lb) - 1.0

In the third column fo X, corresponding to the z variable, both self.lb and self.ub are the same value, and equal to the value in the example, in this case 1, so it is acutally computing:

2.0*(1.0 - 1.0)/(1.0 - 1.0) - 1.0 = 2.0*0.0/0.0 - 1.0

Which is nan. You can work around the issue in a few different ways, a simple option is to simply do:

# Avoids dividing by zero
X_d = tf.math.maximum(self.ub - self.lb, 1e-6)
H = 2.0*(X - self.lb)/X_d - 1.0
0
votes

This is an interesting situation. A quick check on an online tool for regression shows that even simple regression suffers from the problem of unable to fit data points when one of the inputs is constant through the dataset. Taking a look at the algebraic solution for a two-variable linear regression problem shows the solution involving division by standard deviation which, being zero in a constant set, is a problem.

As far as solving through backprop is concerned (as is the case in your neural network), I strongly suspect that the derivative of the loss with respect to the input (these expressions) is the culprit, and that the algorithm is not able to update the weights W using W := W - α.dZ, and ends up remaining constant.