
Hello everybody,

My goal is to detect people and cars (day and night) in 1920x1080 images. For this I use the TensorFlow API with an SSD MobileNet model. I annotated 1000 images (900 for training, 100 for evaluation) from 7 different cameras, and I launch the training with an image size of 960x540. My model does not converge. I do not know what to do: should I make different classes for day and night objects?

In a tutorial for face detection with the TensorFlow API, the authors train on a dataset of images containing only faces, then use the model on complex scenes. Is this a good idea, knowing that a model like SSD also learns from negative examples?

Thank you

(source: https://blog.usejournal.com/face-detection-for-cctv-surveillance-6b8851ca3751)


1 Answer


What do you mean by "not converge"? Are you referring to the training/validation loss?
If so, the first thing that comes to mind is to reduce the learning rate (I had a similar problem). You can do this by modifying your configuration file: in the "train_config" section you'll find the value "initial_learning_rate".
Try setting it to a lower value (say, an order of magnitude lower) and see if it helps.
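
If you would rather script the learning-rate change than edit the file by hand, here is a rough sketch in plain Python. It treats the configuration as text and divides every "initial_learning_rate" value by a factor. The function name lower_initial_learning_rate and the sample config fragment are my own illustration, not part of the TensorFlow API, and your optimizer block may look different, so check the result before training.

```python
import re


def lower_initial_learning_rate(config_text, factor=10.0):
    """Return config_text with every initial_learning_rate divided by factor.

    Hypothetical helper: it does a plain text substitution and does not
    parse or validate the configuration format.
    """
    pattern = re.compile(r"(initial_learning_rate\s*:\s*)([0-9.eE+-]+)")

    def _scale(match):
        new_value = float(match.group(2)) / factor
        # Keep the original key and spacing, replace only the number.
        return f"{match.group(1)}{new_value:g}"

    return pattern.sub(_scale, config_text)


# Illustrative fragment only; the values in your own file may differ.
sample = """
train_config {
  optimizer {
    rms_prop_optimizer {
      learning_rate {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.004
          decay_steps: 800720
          decay_factor: 0.95
        }
      }
    }
  }
}
"""

updated = lower_initial_learning_rate(sample)
print(updated)
```

To apply it to a real file, read the file into a string, pass it through the function, and write the result to a new file so the original configuration is kept for comparison.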