
I am using the Node SDK to call the IBM Watson Speech to Text service. After sending an audio sample and receiving a response, the confidence values look strange.

{
  "results": [
    {
      "word_alternatives": [
        {
          "start_time": 3.31,
          "alternatives": [
            {
              "confidence": 0.7563,
              "word": "you"
            },
            {
              "confidence": 0.0254,
              "word": "look"
            },
            {
              "confidence": 0.0142,
              "word": "Lou"
            },
            {
              "confidence": 0.0118,
              "word": "we"
            }
          ],
          "end_time": 3.43
        },
...

and

...
      ],
      "alternatives": [
        {
          "word_confidence": [
            [
              "you",
              0.36485132893469713
            ],
...

and I am asking for recognition with this config:

var params = {
  audio: fs.createReadStream(req.file.path),
  content_type: 'audio/wav',
  interim_results: false,
  word_confidence: true,
  timestamps: true,
  max_alternatives: 3,
  continuous: true,
  word_alternatives_threshold: 0.01,
  smart_formatting: true
};

Notice how the confidence factor for the word "you" is different in the two places. Is one of these numbers measuring something different? What is going on here?

What is the start_time of the second "you"? The one with confidence 0.36485132893469713?

1 Answer


John, the confidence values in "word_alternatives" are derived from confusion networks and are computed at the word level, while the confidence values in the list of "alternatives" are computed over lattices, at the sentence level.

Confusion networks are derived from lattices, but they contain a different representation of the hypothesis space, which explains why confidence values from one or the other can differ.

In this case the sentence contains only one word, which is why the difference is so visible.
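
If it helps, here is a small sketch of how the two numbers can be pulled out of the same response object (assuming the response shape shown in the question):

// Sketch: compare the two confidence sources in one response object.
// Assumes "res" has the structure shown in the question.
function compareConfidences(res) {
  var result = res.results[0];

  // Word-level confidences from the confusion network.
  result.word_alternatives.forEach(function(wa) {
    wa.alternatives.forEach(function(alt) {
      console.log('confusion network:', alt.word, alt.confidence,
                  '(' + wa.start_time + '-' + wa.end_time + 's)');
    });
  });

  // Sentence-level hypotheses with per-word confidences from the lattice.
  result.alternatives.forEach(function(hyp) {
    (hyp.word_confidence || []).forEach(function(pair) {
      console.log('lattice:', pair[0], pair[1]);
    });
  });
}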