I am using the System.Speech.Recognition namespace to recognize a spoken sentence. I am interested in the alternative sentences the recognizer provides, along with their confidence scores. From the documentation for the [RecognitionResult.Alternates][1] property:

Recognition Alternates are ordered by the values of their Confidence properties. The confidence value of a given phrase indicates the probability that the phrase matches the input. The phrase with the highest confidence value is the phrase that most likely matches the input.

Each Confidence value should be evaluated individually and without reference to the confidence values of other Alternates.

However, when I print the recognized text with its confidence, followed by the alternates with their confidences, I see two things I do not understand. First, the alternates are not ordered by confidence (although the first one does match the recognized text). Second, and this is the bigger problem for me, the recognized text is not the alternate with the highest score, which seems to contradict the documentation quoted above.

My (incomplete) code sample from within the SpeechRecognized event handler:

Console.WriteLine("Recognized text =  {0}, score = {1}", e.Result.Text, e.Result.Confidence); 
// Display the recognition alternates for the result.
foreach (RecognizedPhrase phrase in e.Result.Alternates)
{
    Console.WriteLine(" alt({0}) {1}", phrase.Confidence, phrase.Text);
}

and the corresponding output:

Recognized text =  She had said that fit and Gracie Wachtel are all year, score = 0.287724
alt(0.287724) She had said that fit and Gracie Wachtel are all year
alt(0.287724) she had said that fit and gracie wachtel are all year
alt(0.2955212) she had said that faith and gracie wachtel are all year
alt(0.287133) she had said that fit and gracie Wachtell are all year
alt(0.1644379) she had said that fit and gracie wachtel earlier
alt(0.3254312) jihad said that fit and gracie wachtel are all year
alt(0.2726361) she had said that fit and gracie wachtel are only are
alt(0.2867217) she had said that fail and gracie wachtel are all year
alt(0.2565451) she had said that fit and gracie watchful are all year
alt(0.2854537) she had said that fate and gracie wachtel are all year

EDIT: To clarify the meaning of the confidence score, and to show why my results contradict the documentation, here is the relevant text from the documentation of the RecognizedPhrase.Confidence property. The bold parts are my addition:

Confidence scores do not indicate the absolute likelihood that a phrase was recognized correctly. Instead, confidence scores provide a mechanism for comparing the relative accuracy of multiple recognition alternates for a given input. This facilitates returning the most accurate recognition result. For example, if a recognized phrase has a confidence score of 0.8, this does not mean that the phrase has an 80% chance of being the correct match for the input. It means that the phrase is more likely to be the correct match for the input than other results that have confidence scores less than 0.8.

A confidence score on its own is not meaningful unless you have alternative results to compare against, either from the same recognition operation or from previous recognitions of the same input. The values are used to rank alternative candidate phrases returned by the Alternates property on RecognitionResult objects.

Confidence values are relative and unique to each recognition engine. Confidence values returned by two different recognition engines cannot be meaningfully compared.

A speech recognition engine may assign a low confidence score to spoken input for various reasons, including background interference, inarticulate speech, or unanticipated words or word sequences. If your application is using a SpeechRecognitionEngine instance, you can modify the confidence level at which speech input is accepted or rejected with one of the UpdateRecognizerSetting methods. Confidence thresholds for the shared recognizer, managed by SpeechRecognizer, are associated with a user profile and stored in the Windows registry. Applications should not write changes to the registry for the properties of the shared recognizer.

The Alternates property of the RecognitionResult object contains an ordered collection of RecognizedPhrase objects, each of which is a possible match for the input to the recognizer. The alternates are ordered from highest to lowest confidence.

2 Answers

2
votes

I can only give you a generic answer, since I do not know the code of the Microsoft speech recognizer. Recognition uses many algorithms to approach the best solution. In a perfect world, each algorithm would be able to weight the confidence score of the converted sentence. In practice this is almost never the case:

Each algorithm is flawed, and quantifying its exact impact on the confidence of the conversion can be a real headache.

The overall sentence confidence is an arithmetic combination of the confidences of its parts, and is generally far simpler than the internal confidence scheme.

Some of the algorithms used, such as proper-noun recognition, do not necessarily change the confidence in any clear way (particularly for a single isolated sentence).

Confidence is measured at many levels (voice, words, sentence structure, ...). What should the confidence be for a perfectly recognized voice with an inconsistent sentence structure?

The sorting algorithms that move the better recognition to the top of the list generally do not change the confidence; they only sort or exclude alternates.

So the documentation is right: confidences cannot be compared between alternates.

What, then, is confidence actually useful for (apart from letting the authors tell us that they can offer easy use of a very complex and approximate technology)? Almost nothing. You can possibly discard alternates whose confidence falls below a certain threshold, except when no alternate reaches that threshold.
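A minimal sketch of that threshold idea, including the fallback for the case where no alternate reaches the threshold. The `ConfidenceFilter` class and its `Filter` method are hypothetical names, not part of System.Speech, and plain `(Text, Confidence)` tuples stand in for `RecognizedPhrase` objects so that the snippet is self-contained:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of System.Speech.
static class ConfidenceFilter
{
    // Keeps the alternates whose confidence is at or above minConfidence.
    // If none qualify, falls back to the single highest-scoring alternate
    // so that the caller is not left with an empty list.
    public static List<(string Text, float Confidence)> Filter(
        IEnumerable<(string Text, float Confidence)> alternates,
        float minConfidence)
    {
        var all = alternates.ToList();
        var kept = all.Where(a => a.Confidence >= minConfidence).ToList();

        if (kept.Count == 0 && all.Count > 0)
        {
            kept.Add(all.OrderByDescending(a => a.Confidence).First());
        }

        return kept;
    }
}
```

Inside a SpeechRecognized handler, the input could presumably be built with `e.Result.Alternates.Select(p => (p.Text, p.Confidence))`. The threshold value itself is something you would have to tune for your own recognizer and input.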

2
votes

The Confidence property here is the "likelihood" output value of the internal model (usually the modeling is done by composing statistical models such as Hidden Markov Models, typically over acoustic features such as MFCCs).

However, the Speech Recognition SDK presents you with a list ordered by a different measure, obtained by taking the model's output and checking additional parameters. For this SDK, that means compliance with a well-formed rule in its grammar.
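If you want the alternates ordered purely by their Confidence value rather than in the order the engine returns them, you can re-sort them yourself. The following is only a sketch: `AlternateRanking` and `ByConfidence` are hypothetical names, and `(Text, Confidence)` tuples stand in for `RecognizedPhrase` objects so that the snippet runs on its own:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of System.Speech.
static class AlternateRanking
{
    // Returns the alternates sorted from highest to lowest confidence.
    // LINQ's OrderByDescending is a stable sort, so alternates with equal
    // confidence keep the relative order in which they were supplied.
    public static List<(string Text, float Confidence)> ByConfidence(
        IEnumerable<(string Text, float Confidence)> alternates)
    {
        return alternates.OrderByDescending(a => a.Confidence).ToList();
    }
}
```

With the output from the question, this would put "jihad said that fit and gracie wachtel are all year" (0.3254312) first, which illustrates that the highest raw confidence is not necessarily the transcription you would pick by hand.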