
I am working on an inter-class and intra-class classification problem with one CNN: first there are two classes, Cat and Dog; then within Cat there are three different breeds of cats, and within Dog there are five different breeds of dogs.

I haven't started coding yet; I am still working out whether this is feasible. My question is: what would be a feasible design for this kind of problem? For training, I am thinking of first designing a CNN-1 network that differentiates cat from dog across all the training images. After the separation of cat and dog, CNN-2 and CNN-3 would then be trained further on those images, one for the dog breeds and one for the cat breeds. I am just not sure how the testing would work in this situation.
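To make the testing concern concrete, here is a hypothetical sketch of the proposed two-stage inference flow; `cascade_predict`, `cnn1`, `cnn2_dog`, and `cnn3_cat` are stand-in names, not a real API:

```python
# Hypothetical sketch of the proposed two-stage testing (inference) flow.
# cnn1, cnn2_dog, cnn3_cat are stand-in callables, not a real API.
def cascade_predict(image, cnn1, cnn2_dog, cnn3_cat):
    """Route the image to a breed classifier based on CNN-1's prediction."""
    species = cnn1(image)                 # "dog" or "cat"
    if species == "dog":
        return species, cnn2_dog(image)   # one of the 5 dog breeds
    return species, cnn3_cat(image)       # one of the 3 cat breeds

# Note the compounding risk: a breed prediction can only be right when
# CNN-1 is right, so overall accuracy is at most
# acc(CNN-1) * acc(breed model), e.g. 0.95 * 0.90 = 0.855.
```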

If the prediction of CNN-1 is incorrect, the following CNN-2 and CNN-3 will also be incorrect. I am not sure that a prediction of a prediction gives any good results. – oro777
Any possible solution for such a scenario? Any published work? – Aadnan Farooq A

1 Answer


I have approached a similar problem previously in Python. Hopefully this is helpful, and you can come up with an alternative implementation in MATLAB if that is what you are using.

After all was said and done, I landed on a single model for all predictions. For your purpose, you could have one binary output for dog vs. cat, another multi-class output for the dog breeds, and a third multi-class output for the cat breeds.
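As a concrete sketch, such a three-headed model can be built with the Keras functional API; the input size, layer widths, and head names below are illustrative, not the ones I actually used:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_model(num_cat_breeds=3, num_dog_breeds=5):
    # One shared convolutional trunk feeding three output heads
    inputs = layers.Input(shape=(128, 128, 3))
    x = layers.Conv2D(32, 3, activation="relu")(inputs)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(64, 3, activation="relu")(x)
    x = layers.GlobalAveragePooling2D()(x)
    # Binary head (dog vs. cat) plus one multi-class head per species
    species = layers.Dense(1, activation="sigmoid", name="species")(x)
    cat_breed = layers.Dense(num_cat_breeds, activation="softmax", name="cat_breed")(x)
    dog_breed = layers.Dense(num_dog_breeds, activation="softmax", name="dog_breed")(x)
    return Model(inputs, [species, cat_breed, dog_breed])
```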

Using Tensorflow, I created a mask for the irrelevant classes. For example, if the image was of a cat, then all of the dog breeds are irrelevant and they should not impact model training for that example. This required a customized TF Dataset (that converted 0's to -1 for the mask) and a customized loss function that returned 0 error when the mask was present for that example.
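For intuition, the per-example label layout ends up looking like this (made-up vectors; the head names are just for illustration):

```python
import numpy as np

# Labels for one *cat* image (breed index 1 of 3): the dog-breed head is
# irrelevant for this example, so its targets are masked with -1 and will
# contribute no loss.
cat_example = {
    "species":   np.array([0.0]),              # 0 = cat, 1 = dog
    "cat_breed": np.array([0.0, 1.0, 0.0]),    # one-hot over 3 cat breeds
    "dog_breed": np.array([-1.0] * 5),         # masked out
}
```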

Finally, the training process. Specific to your question, you will have to create custom accuracy functions that handle the mask values however you want, but otherwise this part of the process should be standard. It is best practice to spread the classes evenly across the training data, but they can all be trained together.

If you google "Multi-Task Training" you can find additional resources for this problem.

Here are some code snips if you are interested:

For the customized TF dataset that masked irrelevant labels...

import tensorflow as tf
from multiprocessing import cpu_count

# Replace 0's with -1 for mask when there aren't any labels
def produce_mask(features):
    for filt, tensor in features.items():
        if "target" in filt:
            condition = tf.equal(tf.math.reduce_sum(tensor), 0)
            features[filt] = tf.where(condition, tf.ones_like(tensor) * -1, tensor)
    return features


def create_dataset(filepath, batch_size=10):
    ...

    # **** This is where the mask was applied to the dataset
    dataset = dataset.map(produce_mask, num_parallel_calls=cpu_count())

    ...

    return dataset

Custom loss function. I was using binary cross-entropy because my problem was multi-label; you will likely want to adapt this to categorical cross-entropy.

from tensorflow.keras import backend

# Custom loss function
def masked_binary_crossentropy(y_true, y_pred):
    mask = backend.cast(backend.not_equal(y_true, -1), backend.floatx())
    return backend.binary_crossentropy(y_true * mask, y_pred * mask)
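As a quick sanity check of this loss (repeating the function so the snippet runs standalone): a fully masked label vector yields near-zero loss no matter what the model predicted, since both `y_true` and `y_pred` are multiplied by a zero mask.

```python
import tensorflow as tf
from tensorflow.keras import backend

def masked_binary_crossentropy(y_true, y_pred):
    mask = backend.cast(backend.not_equal(y_true, -1), backend.floatx())
    return backend.binary_crossentropy(y_true * mask, y_pred * mask)

y_true = tf.constant([[-1.0, -1.0, -1.0]])   # fully masked example
y_pred = tf.constant([[0.9, 0.2, 0.7]])      # arbitrary predictions
loss = tf.reduce_mean(masked_binary_crossentropy(y_true, y_pred))
# loss is ~0 (up to the epsilon Keras adds inside binary_crossentropy)
```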

Then the custom accuracy metrics. I was using top-k accuracy; you may need to modify this for your purposes, but it should give you the general idea. Compared to the loss function, which converts masked values to 0 (which would over-inflate the accuracy), this function filters those examples out entirely. That works because the outputs are measured individually, so each output (binary, cat breed, dog breed) gets its own accuracy measure, filtered to only the relevant examples.

backend is the Keras backend.

import tensorflow as tf
from tensorflow.keras import backend
from tensorflow.keras.metrics import top_k_categorical_accuracy

def top_5_acc(y_true, y_pred, k=5):
    mask = backend.cast(backend.not_equal(y_true, -1), tf.bool)
    mask = tf.math.reduce_any(mask, axis=1)
    masked_true = tf.boolean_mask(y_true, mask)
    masked_pred = tf.boolean_mask(y_pred, mask)
    return top_k_categorical_accuracy(masked_true, masked_pred, k)
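To see the filtering in action (again repeating the function so it runs standalone; k=1 here just to keep the toy example small):

```python
import tensorflow as tf
from tensorflow.keras import backend
from tensorflow.keras.metrics import top_k_categorical_accuracy

def top_5_acc(y_true, y_pred, k=5):
    mask = backend.cast(backend.not_equal(y_true, -1), tf.bool)
    mask = tf.math.reduce_any(mask, axis=1)
    masked_true = tf.boolean_mask(y_true, mask)
    masked_pred = tf.boolean_mask(y_pred, mask)
    return top_k_categorical_accuracy(masked_true, masked_pred, k)

y_true = tf.constant([[0.0, 1.0, 0.0],      # real example: class 1
                      [-1.0, -1.0, -1.0]])  # masked example: dropped entirely
y_pred = tf.constant([[0.1, 0.8, 0.1],
                      [0.3, 0.3, 0.4]])
acc = tf.reduce_mean(top_5_acc(y_true, y_pred, k=1))
# only the unmasked row counts, and its top-1 prediction is correct
```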

Edit

No, in the scenario I described above there is only one model, and it is trained on all of the data together. There are three outputs from the single model. The mask is a major part of this, as it allows the network to adjust only the weights that are relevant to the example. If the image is a cat, then the dog breed prediction does not contribute to the loss.