I'm using a QnA service created in February this year. There are discrepancies between the test version (QnA portal) and the published version (API): a correct answer's confidence score can drop by 10% while a bad answer's rises by 10%, which ultimately turns good matches in test into bad ones in the bot application. Try explaining that to your customer.
It appears that you can run into this trouble when you host multiple KBs (knowledge bases) on a single search service. The test index is a single shared index covering all KBs on that search service, while published (production) KBs each get their own index. The QnA Maker help bot on the QnA portal confirms this:
"The top answer can sometimes vary because of small score variations between the test and production indexes. The test chat in portal hits the test index, and the generateAnswer API hits the production index. This typically happens when you have multiple knowledge bases in the same QnA Maker service. Learn more about confidence score differences.
This happens because all test knowledge bases are combined into a single index, while prod knowledge bases are on separate indexes. We can help you by separating all test and prod into separate indexes for your service."
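For what it's worth, you can reproduce the discrepancy yourself: the generateAnswer request body accepts an isTest flag that routes the same question to the test index instead of the production index, so you can compare the two scores side by side. A minimal sketch in Python, assuming the hostname, KB ID, and endpoint key placeholders are swapped for your own:

```python
import requests

# Placeholders -- substitute your own resource name, KB ID, and endpoint key.
RUNTIME_HOST = "https://<your-resource>.azurewebsites.net"
KB_ID = "<your-kb-id>"
ENDPOINT_KEY = "<your-endpoint-key>"

def get_answer(question: str, is_test: bool) -> dict:
    """Query either the test or the production index of a QnA Maker KB."""
    url = f"{RUNTIME_HOST}/qnamaker/knowledgebases/{KB_ID}/generateAnswer"
    headers = {
        "Authorization": f"EndpointKey {ENDPOINT_KEY}",
        "Content-Type": "application/json",
    }
    # isTest=True hits the shared test index (what the portal's Test pane uses);
    # isTest=False (the default) hits the per-KB production index.
    body = {"question": question, "top": 1, "isTest": is_test}
    resp = requests.post(url, headers=headers, json=body)
    resp.raise_for_status()
    return resp.json()["answers"][0]

question = "How do I reset my password?"
test_answer = get_answer(question, is_test=True)
prod_answer = get_answer(question, is_test=False)
print(f"test: {test_answer['score']:.1f}% -> {test_answer['answer'][:60]}")
print(f"prod: {prod_answer['score']:.1f}% -> {prod_answer['answer'][:60]}")
```

Running this against a service that hosts several KBs is exactly where I see the ~10% score gaps.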
So do we need to contact Microsoft to split up the test index per KB as well? Would that rectify the discrepancies between the test & published versions? I haven't tried this yet; has anyone else?
Or do we limit ourselves to a single KB per search service (= multiple search services = expensive)?
Or do we put everything in a single KB, use metadata to logically separate the answers, and pray that this single massive KB produces good enough results? (A sketch of the metadata approach follows below.)
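If we went the single-KB route, the logical separation would use strictFilters in the generateAnswer body, which only returns QnA pairs tagged with all of the listed metadata name/value pairs. A sketch, assuming a hypothetical "product" metadata tag attached to every QnA pair at authoring time (same placeholder host/ID/key as above):

```python
import requests

RUNTIME_HOST = "https://<your-resource>.azurewebsites.net"
KB_ID = "<your-kb-id>"
ENDPOINT_KEY = "<your-endpoint-key>"

def get_filtered_answer(question: str, product: str) -> dict:
    """Query one logical sub-KB inside a single physical KB via metadata."""
    url = f"{RUNTIME_HOST}/qnamaker/knowledgebases/{KB_ID}/generateAnswer"
    headers = {
        "Authorization": f"EndpointKey {ENDPOINT_KEY}",
        "Content-Type": "application/json",
    }
    body = {
        "question": question,
        "top": 1,
        # strictFilters restricts matching to QnA pairs carrying ALL of the
        # listed metadata pairs. "product" is a hypothetical tag name; you
        # would add it to each QnA pair when authoring the KB.
        "strictFilters": [{"name": "product", "value": product}],
    }
    resp = requests.post(url, headers=headers, json=body)
    resp.raise_for_status()
    return resp.json()["answers"][0]

print(get_filtered_answer("How do I reset my password?", "billing-portal"))
```

That keeps everything on one index (so test and prod should at least agree with each other), but whether ranking quality holds up in one massive KB is exactly the part I can't vouch for.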