I implemented a CBIR system using SIFT combined with other feature-based algorithms (with OpenCV and Python 3). Now I have to evaluate how the different combinations (e.g. SIFT/SURF, ORB/BRISK...) perform.
I found that I can use precision, |TP| / (|TP| + |FP|), and recall, |TP| / (|TP| + |FN|). As I understand it, TP are the relevant documents that are returned, FN are the relevant documents that are not returned, and FP are the documents that are returned but are not relevant.
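For reference, here is a minimal sketch of those two formulas at the image level. It assumes that each query has a known set of relevant image IDs (ground truth); the function name and the example IDs are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: image IDs returned by the system (assumed input)
    relevant:  image IDs that are truly relevant (assumed ground truth)
    """
    retrieved, relevant = set(retrieved), set(relevant)
    tp = len(retrieved & relevant)   # returned and relevant
    fp = len(retrieved - relevant)   # returned but not relevant
    fn = len(relevant - retrieved)   # relevant but not returned
    precision = tp / (tp + fp) if retrieved else 0.0
    recall = tp / (tp + fn) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 images returned, 3 images truly relevant.
p, r = precision_recall(["a", "b", "c", "d"], ["a", "b", "e"])
```

In this formulation FN can only be counted if the set of relevant images is known in advance, for example from class labels in the dataset.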
I compute my matches with a brute-force matcher (BF), and I presume that the matches returned by:
matches = bf.knnMatch(descriptor1, descriptor2, k=2)
are my TP + FP, and that the matches that pass the ratio test are my TP.
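To make the filtering step concrete, here is a rough sketch of the ratio test applied to the output of knnMatch with k=2. It only relies on each match having a `distance` attribute, as `cv2.DMatch` does. The threshold 0.75 is an assumed, commonly used value, and `FakeMatch` is a hypothetical stand-in so the snippet runs without OpenCV.

```python
from collections import namedtuple


def ratio_test(knn_matches, ratio=0.75):
    """Keep the best match of each pair if it is clearly better than the second."""
    good = []
    for pair in knn_matches:
        # knnMatch can return fewer than k neighbours for a descriptor.
        if len(pair) < 2:
            continue
        best, second = pair[0], pair[1]
        if best.distance < ratio * second.distance:
            good.append(best)
    return good


# Hypothetical stand-in for cv2.DMatch, only for illustration.
FakeMatch = namedtuple("FakeMatch", "distance")
knn = [
    (FakeMatch(10.0), FakeMatch(50.0)),  # distinctive, kept
    (FakeMatch(40.0), FakeMatch(45.0)),  # ambiguous, dropped
    (FakeMatch(5.0),),                   # only one neighbour, skipped
]
good = ratio_test(knn)
```

Note that this filters descriptor matches between two images. Whether the surviving matches are really correct is not known without ground truth, so counting them as TP is itself an assumption.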
How can I calculate my FN, i.e. the matches that are relevant but not returned?
Note that I'm just formulating a hypothesis, so please correct me if I'm wrong.
I would like some help with the concrete implementation, i.e. where these quantities come from in a concrete image-matching case.
Alternatively, can you suggest how to evaluate a CBIR system based on feature detection and description?
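One common option for evaluating ranked retrieval is average precision per query, averaged over all queries (mAP). The sketch below assumes that each query yields a ranked list of image IDs without duplicates and that ground-truth relevant IDs are available; all names are hypothetical.

```python
def average_precision(ranked, relevant):
    """Average precision for one query's ranked result list."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, image_id in enumerate(ranked, start=1):
        if image_id in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(relevant)


def mean_average_precision(results, ground_truth):
    """results and ground_truth map each query ID to a list of image IDs."""
    aps = [average_precision(results[q], ground_truth[q]) for q in results]
    return sum(aps) / len(aps) if aps else 0.0


# Hypothetical query: relevant images found at ranks 1 and 3.
ap = average_precision(["a", "x", "b", "y"], ["a", "b"])
```

The ranking could come, for instance, from sorting database images by their number of ratio-test matches with the query, so each detector/descriptor combination gets one mAP score to compare.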