So I've been messing around with TensorFlow's Object Detection API, specifically the re-training of models, essentially doing this. I got it to detect my object fairly well with a small number of images. I wanted to increase the number of images I train with, but the labeling process is long and boring, so I found a dataset of cropped images where only my object is in the frame.
If there's a way to feed whole images to the TensorFlow API for training without labeling them, I didn't find it, but I figured that writing a program which labels the whole image would not be that hard.
The labels go in a CSV file with these columns: filename, width, height, class, xmin, ymin, xmax, ymax.
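For example, a row for a hypothetical 640x480 image with one labeled person would look like this (values made up for illustration):

filename,width,height,class,xmin,ymin,xmax,ymax
image1.jpg,640,480,person,120,80,400,360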
This is my code:
import os
import cv2

path = "D:/path/to/image/folder"
directory = os.fsencode(path)
text = open("D:/result/train.txt", "w")

for file in os.listdir(directory):
    filename = os.fsdecode(file)
    if filename.endswith(".jpg"):
        impath = path + "/" + filename
        img = cv2.imread(impath)
        # Label the whole image: class "person", box from (1, 1) to (width-1, height-1)
        res = (filename + "," + str(img.shape[1]) + "," + str(img.shape[0])
               + ",person,1,1," + str(img.shape[1] - 1) + "," + str(img.shape[0] - 1) + "\n")
        text.write(res)
        print(res)

text.close()
This seems to be working fine.
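(For completeness, the .txt-to-.csv conversion mentioned below is trivial since the lines are already comma-separated; roughly something like this, with the paths as placeholders:)

import csv

# Rough sketch of the .txt -> .csv step: just prepend a header row
# and re-save the already comma-separated lines.
with open("D:/result/train.txt") as txt, open("D:/result/train.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["filename", "width", "height", "class", "xmin", "ymin", "xmax", "ymax"])
    for line in txt:
        writer.writerow(line.strip().split(","))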
Now here's the problem. After converting the .txt to .csv and running the training until the loss stops decreasing, my detections on the test set are awful. The model puts a huge bounding box around the entire image, as if it has been trained to detect only the edges of the image.
I figure it's somehow learning to detect the edges of the images, since every label box covers the whole image. But how do I make it learn to "see" what's actually in the picture? Any help would be appreciated.
You set xmin, ymin, xmax, ymax to 1, 1, img.shape[1]-1, img.shape[0]-1? – keineahnung2345
(img.shape[1]-1, img.shape[0]-1) is the top right. – John Slaine