How can I map each word in the `display` field to its timestamp in Azure speech-to-text output?


Azure speech-to-text output includes a `display` field in `combinedRecognizedPhrases`. How can I map each word in that `display` text to its timestamp?

The output does contain word-level timestamps, but only for the words of the `lexical` field in `combinedRecognizedPhrases`, not for the `display` text.
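
To make the problem concrete, here is a rough sketch of the kind of alignment I have in mind, not a confirmed solution. It assumes the timed lexical words are available as a list of dicts with `word`, `offsetInTicks` and `durationInTicks` keys (that shape and the sample values are assumptions for illustration), and it uses Python's `difflib.SequenceMatcher` to line up normalized display tokens with lexical words. The function name `align_display_to_lexical` is hypothetical.

```python
import re
from difflib import SequenceMatcher


def normalize(token):
    # Lowercase and strip punctuation so "apples." compares equal to "apples".
    return re.sub(r"[^\w']", "", token.lower())


def align_display_to_lexical(display, words):
    """Heuristically copy timestamps from lexical words onto display tokens.

    `words` is assumed to be a list of dicts with the keys
    "word", "offsetInTicks" and "durationInTicks" (assumed shape).
    """
    display_tokens = display.split()
    a = [normalize(t) for t in display_tokens]
    b = [normalize(w["word"]) for w in words]
    result = [
        {"display": t, "offsetInTicks": None, "durationInTicks": None}
        for t in display_tokens
    ]

    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Identical tokens: copy the timestamp one-to-one.
            for i, j in zip(range(i1, i2), range(j1, j2)):
                result[i]["offsetInTicks"] = words[j]["offsetInTicks"]
                result[i]["durationInTicks"] = words[j]["durationInTicks"]
        elif tag == "replace":
            # Differing spans, e.g. display "25" vs lexical "twenty five":
            # give every display token in the span the combined time range.
            start = words[j1]["offsetInTicks"]
            last = words[j2 - 1]
            end = last["offsetInTicks"] + last["durationInTicks"]
            for i in range(i1, i2):
                result[i]["offsetInTicks"] = start
                result[i]["durationInTicks"] = end - start
        # "insert"/"delete" spans have no counterpart and keep None.
    return result


# Made-up sample data, for illustration only.
words = [
    {"word": "i", "offsetInTicks": 0, "durationInTicks": 1000000},
    {"word": "have", "offsetInTicks": 1000000, "durationInTicks": 2000000},
    {"word": "twenty", "offsetInTicks": 3000000, "durationInTicks": 3000000},
    {"word": "five", "offsetInTicks": 6000000, "durationInTicks": 2000000},
    {"word": "apples", "offsetInTicks": 8000000, "durationInTicks": 4000000},
]
aligned = align_display_to_lexical("I have 25 apples.", words)
for entry in aligned:
    print(entry)
```

This is only a heuristic: tokens that match exactly get exact timestamps, while rewritten spans (numbers, abbreviations and similar) share one approximate range, and display tokens with no lexical counterpart are left without a timestamp.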

azure speech-recognition