
The Azure speech-to-text output has a `display` field in `combinedRecognizedPhrases`. How can I map each word of the display text to its timestamp?

The output contains word-level timestamps, but only for the words of the lexical form (the `lexical` field), not for the `display` form.
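One possible workaround is a heuristic alignment. It is not an Azure API feature: for each phrase, align the whitespace-separated tokens of the `display` text with the timed lexical words, and copy the timings across. The sketch below assumes the batch-transcription JSON layout `recognizedPhrases[].nBest[]`, where each `nBest` entry has a `display` string and a `words` list whose items carry `word`, `offsetInTicks` and `durationInTicks` (100 ns ticks), as produced when word-level timestamps are enabled. Check this against your own output. The function names `map_display_words` and `map_transcription` are hypothetical. Tokens that inverse text normalization rewrites (for example "twenty five" becoming "25") cannot be matched one-to-one, so each such display token simply receives the time span of the whole group of lexical words it replaced.

```python
import difflib
import re
from typing import Any

# Offsets and durations are assumed to be in 100-nanosecond ticks.
TICKS_PER_SECOND = 10_000_000


def _norm(token: str) -> str:
    """Lowercase and drop punctuation so display "world," can match lexical "world"."""
    return re.sub(r"[^\w']", "", token.lower())


def _entry(token: str, start_ticks: int, end_ticks: int) -> dict[str, Any]:
    return {
        "display": token,
        "start": start_ticks / TICKS_PER_SECOND,
        "end": end_ticks / TICKS_PER_SECOND,
    }


def map_display_words(best: dict[str, Any]) -> list[dict[str, Any]]:
    """Attach start/end times (seconds) to each whitespace token of best["display"].

    Assumes best["words"] lists lexical words with offsetInTicks/durationInTicks.
    """
    words = best.get("words", [])
    tokens = best["display"].split()
    starts = [w["offsetInTicks"] for w in words]
    ends = [w["offsetInTicks"] + w["durationInTicks"] for w in words]
    matcher = difflib.SequenceMatcher(
        a=[_norm(w["word"]) for w in words],
        b=[_norm(t) for t in tokens],
        autojunk=False,
    )
    mapped: list[dict[str, Any]] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # One-to-one match: copy each lexical word's timing.
            for k in range(j2 - j1):
                mapped.append(_entry(tokens[j1 + k], starts[i1 + k], ends[i1 + k]))
        elif tag == "replace":
            # e.g. lexical "twenty five" -> display "25": give every display
            # token in this group the span of the whole lexical group.
            for token in tokens[j1:j2]:
                mapped.append(_entry(token, starts[i1], ends[i2 - 1]))
        elif tag == "insert":
            # Display token with no lexical counterpart: zero-length at the
            # end of the previous lexical word (or the first word's start).
            t = ends[i1 - 1] if i1 > 0 else (starts[0] if starts else 0)
            for token in tokens[j1:j2]:
                mapped.append(_entry(token, t, t))
        # tag == "delete": lexical words with no display token are skipped.
    return mapped


def map_transcription(result: dict[str, Any]) -> list[dict[str, Any]]:
    """Apply map_display_words to the top hypothesis of every recognized phrase."""
    out: list[dict[str, Any]] = []
    for phrase in result.get("recognizedPhrases", []):
        nbest = phrase.get("nBest", [])
        if nbest:
            out.extend(map_display_words(nbest[0]))
    return out
```

For a phrase with lexical words "hello world twenty five" and display text "Hello world, 25.", this yields one entry per display token, with "25." spanning from the start of "twenty" to the end of "five". The alignment uses `difflib.SequenceMatcher` only because it ships with Python. Any sequence-alignment method would work, and more elaborate rules (for example splitting a replaced span evenly) are possible if finer granularity is needed.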