
I'm trying to train a CNN on a set of images using a DataGenerator class, and the model normally works fine. The problem is that the training dataset is heavily skewed toward a few classes, so I want to add class_weights. However, every time I do this I get an IndexError in the part of the code that converts my labelled classes into one-hot arrays.

This is for Keras running on top of TensorFlow. The function that raises the error is keras.utils.to_categorical().

Here's the part of my data generator that calls to_categorical:

for i, pdb_id in enumerate(list_enzymes_temp):
    # Load the precomputed distance matrix for this enzyme
    mat = precomputed_distance_matrix(pdb_id, self.dim)

    X[i,] = mat.distance_matrix.reshape(*self.dim)

    # Convert the 1-based class label to a 0-based index
    y[i] = int(self.labels[pdb_id.upper()][1]) - 1

return X, keras.utils.to_categorical(y, num_classes=self.n_classes)

Here's the function I use to generate the weights:

def get_class_weights(dictionary, training_enzymes, mode):
    'Gets class weights for Keras'
    # Initialization
    counter = [0 for i in range(6)]

    # Count classes
    for enzyme in training_enzymes:
        counter[int(dictionary[enzyme.upper()][1])-1] += 1
    majority = max(counter)

    # Make dictionary
    class_weights = {i: float(majority/count) for i, count in enumerate(counter)}

    # Value according to mode
    if mode == 'unbalanced':
        for key in class_weights:
            class_weights[key] = 1
    elif mode == 'balanced':
        pass
    elif mode == 'mean_1_balanced':
        for key in class_weights:
            class_weights[key] = (1+class_weights[key])/2

    return class_weights
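To make the 'balanced' formula concrete (each class gets majority_count / class_count), here is a small self-contained sketch on toy 0-based labels. The helper name balanced_class_weights is hypothetical, and returning 1.0 for a class with zero samples is an assumption; the function above would raise ZeroDivisionError in that case.

```python
from collections import Counter


def balanced_class_weights(labels, n_classes):
    """Weight each class by majority_count / class_count ('balanced' mode).

    Hypothetical helper. Classes with zero samples get weight 1.0 instead
    of raising ZeroDivisionError (an assumption, not the original behaviour).
    """
    counts = Counter(labels)          # missing classes count as 0
    majority = max(counts.values())
    return {c: (majority / counts[c] if counts[c] else 1.0)
            for c in range(n_classes)}


# Toy labels: class 0 is the majority with 6 samples.
labels = [0] * 6 + [1] * 3 + [2] * 2
weights = balanced_class_weights(labels, n_classes=3)
print(weights)  # → {0: 1.0, 1: 2.0, 2: 3.0}

# 'mean_1_balanced' mode halves the distance to 1.
print({k: (1 + w) / 2 for k, w in weights.items()})  # → {0: 1.0, 1: 1.5, 2: 2.0}
```

The keys run over every class index, which matters because Keras looks each sample's class up in this dictionary.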

And here's my fit_generator call:

model.fit_generator(generator=training_generator,
                validation_data=validation_generator,
                epochs=max_epochs,
                max_queue_size=16,
                class_weight=class_weights,
                callbacks=[tensorboard])
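Conceptually, Keras turns class_weight into a per-sample weight by taking each sample's class (the argmax of its one-hot row) and looking it up in the dictionary. The numpy sketch below shows only that idea; class_weight_to_sample_weight is a hypothetical name, not a Keras function, and Keras's internal code differs between versions.

```python
import numpy as np


def class_weight_to_sample_weight(y_onehot, class_weight):
    """Map each sample's class (argmax of its one-hot row) to its weight.

    Hypothetical sketch of what Keras does internally with class_weight.
    """
    y_classes = np.argmax(y_onehot, axis=1)
    return np.array([class_weight[int(c)] for c in y_classes], dtype=np.float64)


y = np.eye(3)[[0, 1, 2, 2]]  # one-hot labels for classes 0, 1, 2, 2
weights = {0: 1.0, 1: 2.0, 2: 3.0}
print(class_weight_to_sample_weight(y, weights))  # → [1. 2. 3. 3.]
```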

Here's the IndexError traceback. The error does not appear, and the model trains perfectly, when I leave out class_weights:

  File "C:\Users\Python\DMCNN\data_generator.py", line 73, in __getitem__
    X, y = self.__data_generation(list_enzymes_temp)
  File "C:\Users\Python\DMCNN\data_generator.py", line 59, in __data_generation
    return X, keras.utils.to_categorical(y, num_classes=self.n_classes)
  File "C:\Users\Python\Anaconda3\lib\site-packages\keras\utils\np_utils.py", line 34, in to_categorical
    categorical[np.arange(n), y] = 1
IndexError: index 1065353216 is out of bounds for axis 1 with size 6
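The failing line, `categorical[np.arange(n), y] = 1`, uses every label in y as a column index, so a single label outside 0..num_classes-1 raises exactly this kind of IndexError. The sketch below is a simplified stand-in for that indexing (one_hot is a hypothetical name, not Keras's full to_categorical). It also shows that 1065353216 is the bit pattern of the float32 value 1.0 read back as a 32-bit integer.

```python
import numpy as np


def one_hot(y, num_classes):
    """Simplified version of the indexing used by to_categorical."""
    y = np.asarray(y, dtype=np.int64).ravel()
    n = y.shape[0]
    categorical = np.zeros((n, num_classes), dtype=np.float32)
    categorical[np.arange(n), y] = 1  # each label selects a column
    return categorical


print(one_hot([0, 2, 5], num_classes=6).argmax(axis=1))  # → [0 2 5]

# The bogus label from the traceback is float32 1.0 reinterpreted as int32.
bits = np.array([1.0], dtype=np.float32).view(np.int32)[0]
print(bits)  # → 1065353216

try:
    one_hot([0, bits], num_classes=6)
except IndexError as err:
    print("IndexError:", err)
```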

1 Answer


I had the same error while using keras.utils.to_categorical. The error I got was "IndexError: index 1065353216 is out of bounds for axis 1 with size 2" (size 2 because I had 2 classes).

I believe it comes from a 32-bit float 1.0 being read as an integer: 1065353216 (0x3F800000) is the unsigned 32-bit integer representation of the 32-bit floating-point value 1.0 (see: Why is 1.0f in C code represented as 1065353216 in the generated assembly?).

In my case not all batches had the same length, which left some elements of X and y unfilled, and those unfilled elements caused the issue. You can check in advance whether any elements of your W (or even of X and y) are left unfilled.

Also note that keras.utils.to_categorical has the default dtype='float32'. In your case you can try specifying the dtype, e.g. "return X, keras.utils.to_categorical(y, num_classes=self.n_classes, dtype='uint8')", to see if it works.
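One way to rule out unfilled entries is to size the batch arrays from the actual number of IDs and allocate them with np.zeros rather than np.empty, so a short final batch has no leftover slots and nothing can hold stale memory such as the bits of 1.0. The sketch below is a hypothetical, standalone stand-in for the generator's __data_generation: build_batch and load_matrix are illustrative names, load_matrix is assumed to return a flat array (unlike precomputed_distance_matrix, which returns an object), and np.eye(n_classes)[y] is swapped in for keras.utils.to_categorical so it runs without Keras.

```python
import numpy as np


def build_batch(batch_ids, labels, dim, n_classes, load_matrix):
    """Build one (X, y_onehot) batch without uninitialized slots.

    Hypothetical helper: arrays are sized from len(batch_ids) and zeroed,
    and labels are range-checked before one-hot encoding.
    """
    n = len(batch_ids)
    X = np.zeros((n, *dim), dtype=np.float32)  # zeros, not np.empty
    y = np.zeros(n, dtype=np.int64)
    for i, pdb_id in enumerate(batch_ids):
        X[i] = load_matrix(pdb_id).reshape(*dim)
        y[i] = int(labels[pdb_id.upper()][1]) - 1  # 1-based -> 0-based
    if n and (y.min() < 0 or y.max() >= n_classes):
        raise ValueError(f"labels out of range [0, {n_classes}): {y}")
    # Same one-hot rows that keras.utils.to_categorical would produce
    y_onehot = np.eye(n_classes, dtype=np.float32)[y]
    return X, y_onehot


# Hypothetical labels and loader, for illustration only.
dim = (4, 4)
labels = {"1ABC": ["kinase", "2"], "2XYZ": ["lyase", "4"]}
fake_loader = lambda pdb_id: np.ones(16)
X, y = build_batch(["1abc", "2xyz"], labels, dim, n_classes=6, load_matrix=fake_loader)
print(X.shape, y.argmax(axis=1))  # → (2, 4, 4) [1 3]
```

With the range check in place, a bad label fails with a message showing the offending values instead of a bare IndexError deep inside to_categorical.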