I am trying to build a program that can correctly and confidently identify an object with Google Cloud Vision (denoted as GCV henceforth). The results returned are correct most of the time, with an accuracy score for each label, like so:
{
"banana": "0.92345",
"yellow": "0.91002",
"minion": "0.89921",
}
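For context, this is roughly how I query GCV for those labels. This is only a minimal sketch using the Python client library (the file path is a placeholder, and the exact `Image` constructor may differ by client version):

```python
from google.cloud import vision

client = vision.ImageAnnotatorClient()

# Read one image of the object (path is hypothetical).
with open("object_photo.jpg", "rb") as f:
    content = f.read()

image = vision.Image(content=content)
response = client.label_detection(image=image)

# Collect the label -> score pairs, which is all my program sees downstream.
labels = {label.description.lower(): label.score
          for label in response.label_annotations}
print(labels)
```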
The environment I am working with has a diverse set of lighting conditions, and objects are expected to be placed in random positions. When an object is placed in a different position, the results returned from GCV will be slightly different because a different image is queried. For example:
{
"banana": "0.82345",
"lemon": "0.82211",
"yellow": "0.81102",
"minion": "0.79921",
}
My program is designed so that when the object banana is detected with an accuracy greater than a certain value, the next action is dispatched. There are 3 clusters of object types: for instance, banana goes to container A, apple goes to container B, and orange goes to container C.
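A minimal sketch of that dispatch logic, with placeholder thresholds and container names (the threshold values are exactly what I do not know how to justify):

```python
# Cluster membership: which container each object type belongs to.
CLUSTERS = {
    "banana": "container_A",
    "apple": "container_B",
    "orange": "container_C",
}

# Placeholder per-item thresholds; choosing these values is the open question.
THRESHOLDS = {
    "banana": 0.85,
    "apple": 0.85,
    "orange": 0.85,
}

def dispatch(labels):
    """labels: {description: score} dict from one GCV response."""
    for name, container in CLUSTERS.items():
        if labels.get(name, 0.0) >= THRESHOLDS[name]:
            return container  # confident match: dispatch the next action
    return None               # no confident match: do nothing
```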
When I presented my work to my professor, he asked how I can confidently define and validate the threshold value for each item with respect to its cluster.
I tried to obtain a mean score for banana by running hundreds of banana images through GCV, but I eventually concluded that this is probably not the correct way to define a threshold. My professor suggested using k-Nearest Neighbours to measure the similarity of those images, but isn't that already part of what GCV does internally? Even if his suggestion is correct, what is the right approach to train a post-GCV classifier with the limited data (label/score pairs) returned by GCV?
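I am not sure this is what he means, but here is a minimal sketch of what I imagine a post-GCV k-NN classifier over the label/score dictionaries would look like. The training data, vocabulary, and value of k below are made up purely for illustration:

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical training set: GCV label scores plus the known cluster of each image.
train_responses = [
    {"banana": 0.92, "yellow": 0.91, "minion": 0.89},
    {"banana": 0.82, "lemon": 0.82, "yellow": 0.81},
    {"apple": 0.95, "red": 0.88, "fruit": 0.85},
    {"orange": 0.90, "citrus": 0.87, "fruit": 0.84},
]
train_clusters = ["A", "A", "B", "C"]

# Turn each variable-length label dict into a fixed-length vector
# (labels missing from a response become 0.0).
vec = DictVectorizer(sparse=False)
X = vec.fit_transform(train_responses)

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, train_clusters)

# Classify a new GCV response instead of comparing one score to a threshold.
new_response = {"banana": 0.80, "lemon": 0.82, "yellow": 0.81}
print(knn.predict(vec.transform([new_response])))
```

Is something along these lines the intended approach, or is there a better way to calibrate per-item thresholds from the scores themselves?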