
I'm currently working with transfer learning on a sensor-based activity dataset, and I tried out two methods to transfer a model that was previously trained on another dataset.

The first way of transferring was to load the trained model, cut off the last dense and softmax classification layers, add a new dense layer and softmax layer (corresponding to the number of new classes), freeze every layer except the newly added ones, and fit the model on the new dataset. This resulted in an F1-score of 30%.
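In Keras, this first approach might be sketched as follows. The architecture and shapes here (128 timesteps, 3 channels, 5 source classes, 6 target classes) are hypothetical stand-ins, since the actual model is built elsewhere:

```python
from tensorflow.keras import layers, models

# Stand-in for the previously trained network (hypothetical shapes:
# 128 timesteps x 3 channels, 5 source classes).
base = models.Sequential([
    layers.Input(shape=(128, 3)),
    layers.Conv1D(16, 5, activation="relu"),
    layers.LSTM(32),
    layers.Dense(5, activation="softmax"),
])

# Approach 1: cut off the old softmax head, freeze everything else,
# and train only a new head sized for the new classes.
feature_extractor = models.Model(inputs=base.input,
                                 outputs=base.layers[-2].output)
feature_extractor.trainable = False  # freeze all transferred layers

new_model = models.Sequential([
    feature_extractor,
    layers.Dense(6, activation="softmax"),  # 6 = new number of classes
])
new_model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
```

After this, only the new dense layer's kernel and bias are trainable; the convolutional and LSTM weights stay fixed.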

The second way to transfer the model was to initialize a new model based on the new dataset, freeze every layer except the last ones, transfer only the weights from the loaded model to the newly initialized one, and train the model. This resulted in an F1-score of more or less 90%.
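A sketch of this second approach, assuming source and target models share the same architecture up to the final dense layer (the `build_model` helper and all shapes are hypothetical):

```python
import numpy as np
from tensorflow.keras import layers, models

def build_model(n_classes):
    # Hypothetical stand-in for the network-building code.
    return models.Sequential([
        layers.Input(shape=(128, 3)),
        layers.Conv1D(16, 5, activation="relu"),
        layers.LSTM(32),
        layers.Dense(n_classes, activation="softmax"),
    ])

source = build_model(5)   # the previously trained model (weights assumed loaded)
target = build_model(6)   # fresh model for the new dataset, initializer weights

# Copy weights into every layer except the final classifier, replacing the
# initializer (e.g. glorot_uniform) weights, then freeze the copied layers.
for src, dst in zip(source.layers[:-1], target.layers[:-1]):
    dst.set_weights(src.get_weights())
    dst.trainable = False
```

The point is that `target` starts out as an ordinary new model; only the `set_weights` calls replace its randomly initialized weights with the trained ones.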

So right now I'm trying to figure out what exactly the difference is between these two approaches to transferring a model. In the end, the second approach is just a new model whose weights have been initialized with already trained ones instead of weights coming from an initializer function (glorot_uniform, lecun_uniform, ...), right? To my understanding of transfer learning, this is also the correct approach. As far as I understood the concept, in transfer learning you only reuse the weights and not the whole model.

Still, I'm wondering what else influenced the training of the first approach so badly that it resulted in an F1-score of only 30%.

Thanks and best regards.

Pre-trained architectures are trained on the ImageNet dataset. If your data distribution is different and you freeze all the layers, your architecture will surely overfit. Load the model with ImageNet weights and set it to trainable, so that the weights will be modified according to your dataset. Otherwise, set only the last couple of layers, which capture complex features, to trainable. In transfer learning, weights are reused but modified according to your dataset. Is the second model you mentioned an architecture like ResNet, InceptionResNet, Xception, etc.? – Akash Kumar
Hi, thanks for your comment. I don't work with images; I work with sensor-based activity data. The architecture I use is a DeepConvLSTM based on mdpi.com/1424-8220/16/1/115/htm. It consists of 3 conv layers, followed by 2 LSTM layers and a dense layer with softmax activation for classification. For the transferred model I followed the advice of Andrew Ng presented in this video: youtube.com/watch?v=FQM13HkEfBk. He freezes all transferred layers and only sets the last dense layer to be trainable. – A.h.
Sensor-based activity data means that I have a time-discrete signal with x-, y- and z-axes recorded by an accelerometer, gyroscope and/or magnetometer. Depending on the dataset used, you have 3, 6 or 9 channels. – A.h.

1 Answer


I just realized a few mistakes, and I will post my code here in order to help others not make the same mistakes.

The F1-score of 30% is actually the correct value after transferring my model. The reason why I got an F1-score of more or less 90% was that I had trained my network again from scratch. What I did wrong during the transfer was the following:

  1. I did not freeze the layers; instead, I set the whole layer to False (which surprisingly did not throw an exception) (line 5).
  2. loaded_net stays undefined until the network initialization in line 2 gets executed. That means that my prior trained weights are directly overwritten.
1            loaded_net = tensorflow.keras.models.clone_model(self.neural_network)
2            self.init_network()
3            for i in range(1, len(loaded_net.layers[:-1])):
4                self.neural_network.layers[i].set_weights(loaded_net.layers[i].get_weights())
5                self.neural_network.layers[i] = False
6            self.neural_network.compile(loss='mse', optimizer=self.optimizer,
7                                        metrics=(['accuracy', f1, precision, recall]))
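A corrected, self-contained sketch of this logic is below. The original `self.init_network()` / `self.neural_network` context is replaced by a hypothetical `build_network()` helper, and the shapes are made up for illustration:

```python
import numpy as np
from tensorflow.keras import layers, models

def build_network(n_classes):
    # Hypothetical stand-in for self.init_network().
    return models.Sequential([
        layers.Input(shape=(128, 3)),
        layers.Conv1D(16, 5, activation="relu"),
        layers.LSTM(32),
        layers.Dense(n_classes, activation="softmax"),
    ])

trained_net = build_network(5)  # stands in for the previously trained model
new_net = build_network(6)      # mistake 2 fixed: build the new net BEFORE copying

for i in range(len(new_net.layers) - 1):
    # Copy trained weights into the new network, all layers except the head.
    new_net.layers[i].set_weights(trained_net.layers[i].get_weights())
    # Mistake 1 fixed: freeze via the `trainable` attribute instead of
    # assigning False to the layer object itself.
    new_net.layers[i].trainable = False

new_net.compile(loss="mse", optimizer="adam",
                metrics=["accuracy"])
```

With the assignment `layers[i] = False` from the original snippet, the layer object in the list is silently replaced rather than frozen; setting `layer.trainable = False` is what actually excludes its weights from training.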

Thanks and best regards,