2 votes

I'm using Azure QnA Maker version 4 and posting via the REST API. If I POST against the live database with the parameter isTest=true, I get an answer score of around 80%, which is very reasonable since my question almost exactly matches the database. I get exactly the same result using the web interface on qnamaker.ai.

Using the same POST against the published version (without isTest=true), I get a score of only around 13%, which is very odd for a question that almost exactly matches the database. I've found some hints in the FAQs that slight differences are normal, but I don't think a 67-point difference is normal. Is there anything I can do so that the published version gets scores closer to the test version?
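For reference, this is roughly how I'm comparing the two scores (a minimal sketch, not my production code; the runtime host, knowledge base ID, and endpoint key below are placeholders):

import requests

# Placeholders, not my real values
RUNTIME_HOST = "https://<my-qnamaker-resource>.azurewebsites.net"
KB_ID = "<knowledge-base-id>"
ENDPOINT_KEY = "<endpoint-key>"

url = f"{RUNTIME_HOST}/qnamaker/knowledgebases/{KB_ID}/generateAnswer"
headers = {
    "Authorization": f"EndpointKey {ENDPOINT_KEY}",
    "Content-Type": "application/json",
}

for is_test in (True, False):
    body = {"question": "my almost-matching question", "top": 3, "isTest": is_test}
    answers = requests.post(url, headers=headers, json=body).json()["answers"]
    label = "test (isTest=true)" if is_test else "published"
    print(label, answers[0]["score"])  # ~80 vs ~13 in my case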

If you are satisfied with the results of your "Live-Database" version (that is to say, the test version), the next step is to publish this version: docs.microsoft.com/en-us/azure/cognitive-services/qnamaker/… – Nicolas R
This is exactly the point. My database is published. – M.A.
So if I understand correctly: you prepared your QnA KB, published it, and then without further changes to your KB you get different answer scores when using testing vs. production? – Nicolas R
Yes. My QnA KB is published and without any changes to my KB I get different scores. – M.A.
So that we might reproduce your problem, can you give us the specifics of your knowledge base and the question that gives 80% and 13%? If you don't want to reveal your project, can you create a test knowledge base where you can also reproduce the issue? – Kyle Delaney

4 Answers

1 vote

Pursang has a good point in his answer. A good way to work around this problem is to add "isTest": true to the QnA Maker POST request body; it has worked for me. It's a QnA Maker bug that shows up when you have multiple knowledge bases...

{"question":"your question here", "top":3, "isTest": true }

Good Luck!

0 votes

The test version and the published version are two different knowledge bases. This allows you to make changes and test them without affecting the live knowledge base that your customers are using. If you're getting worse results with your published knowledge base than your test version, that seems to indicate that you've trained your test knowledge base after you've published. Publishing again may fix the issue.

If you publish again and your published version still doesn't seem to behave the same as the test version, consider this entry in the FAQ:

The updates that I made to my knowledge base are not reflected on publish. Why not?

Every edit operation, whether in a table update, test, or settings, needs to be saved before it can be published. Be sure to click the Save and train button after every edit operation.
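If you prefer to re-publish from a script rather than from the portal, a rough sketch against the QnA Maker v4.0 authoring API could look like the following (the resource endpoint, knowledge base ID, and authoring key are placeholders):

import requests

# Placeholders: authoring endpoint, knowledge base ID, and authoring key
AUTHORING_ENDPOINT = "https://<my-qnamaker-resource>.cognitiveservices.azure.com"
KB_ID = "<knowledge-base-id>"
AUTHORING_KEY = "<authoring-key>"

# Publishing replaces the production index with the current saved-and-trained test index
resp = requests.post(
    f"{AUTHORING_ENDPOINT}/qnamaker/v4.0/knowledgebases/{KB_ID}",
    headers={"Ocp-Apim-Subscription-Key": AUTHORING_KEY},
)
resp.raise_for_status()  # 204 No Content indicates a successful publish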

0 votes

I had the exact same problem. It was related to something going wrong when I created the QnA service in Azure. The language of your QnA knowledge base is detected automatically. You can see your language in your Azure Search resource => testkb => Fields => question/answer (see MSDN).

Mine was set to Standard-Lucene instead of German-Microsoft. I did not find any way to change it, so I had to recreate the QnA service and move all knowledge bases there. (Example pictures: wrong language vs. correct language.)
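If you want to check the analyzer programmatically rather than in the portal, something like the following against the Azure Search REST API should list the analyzer per field (the service URL, admin key, and API version are placeholders; "testkb" is the QnA Maker test index):

import requests

# Placeholders: search service URL, index name, and admin key
SEARCH_SERVICE = "https://<my-search-service>.search.windows.net"
INDEX_NAME = "testkb"  # the QnA Maker test index
ADMIN_KEY = "<search-admin-key>"

index = requests.get(
    f"{SEARCH_SERVICE}/indexes/{INDEX_NAME}",
    headers={"api-key": ADMIN_KEY},
    params={"api-version": "2019-05-06"},
).json()

for field in index["fields"]:
    # A German KB should show something like "de.microsoft" here, not "standard.lucene"
    print(field["name"], field.get("analyzer"))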

0 votes

I'm using a QnA service created in February this year. There are discrepancies between the test version (QnA portal) and the published version (API). A correct answer can drop 10%, while a bad answer rises 10%, which ultimately turns good matches in test into bad ones in the bot application. Try explaining that to your customer.

It appears that you can run into this trouble if you use multiple KBs (= knowledge bases) on a single search service. The test index is a single index that covers all your KBs for that search service, while production KBs, when published, are indexed separately per KB. The QnA Maker help bot on the QnA portal mentions this:

"The top answer can sometimes vary because of small score variations between the test and production indexes. The test chat in portal hits the test index, and the generateAnswer API hits the production index. This typically happens when you have multiple knowledge bases in the same QnA Maker service. Learn more about confidence score differences.

This happens because all test knowledge bases are combined into a single index, while prod knowledge bases are on separate indexes. We can help you by separating all test and prod into separate indexes for your service."

So do we need to contact Microsoft to also split up the test index per KB? Will that rectify any discrepancies between the test and published versions? I haven't tried this yet; has anyone else?

Or do we limit ourselves to a single KB per search service (= multiple search services = expensive)?

Or do we put everything in a single KB, use metadata to logically separate the answers, and pray that this single massive KB produces good enough results?
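For that last option, the generateAnswer body can at least carry metadata filters so that a single large KB is queried per logical area. A rough sketch of such a body (the "product"/"billing" metadata pair is made up for illustration):

# Sketch of a generateAnswer body that scopes one large KB by metadata
body = {
    "question": "your question here",
    "top": 3,
    "strictFilters": [
        {"name": "product", "value": "billing"},
    ],
}

Whether the scores from one big metadata-filtered KB end up better than per-KB production indexes is exactly the open question here.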