Azure speech-to-text output has a display field in combinedRecognizedPhrases. How can I map each word in the display field to its timestamp?
The Azure speech-to-text output contains word-level timestamps, but they correspond only to the words of the lexical field, not to the display field in combinedRecognizedPhrases.
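One way to bridge the gap is to align the display tokens against the timestamped lexical words yourself. The following is a minimal sketch, not an Azure feature: it uses Python's standard difflib.SequenceMatcher to line up the two token sequences after stripping case and punctuation. The function names (normalize, align_display_to_words) are made up for illustration, and the shape of each word entry (keys word, offsetInTicks, durationInTicks, with ticks in 100-nanosecond units) is an assumption that you should check against your own transcription JSON.

```python
import re
from difflib import SequenceMatcher


def normalize(token):
    # Lowercase and drop punctuation so "Hello," compares equal to "hello".
    return re.sub(r"[^\w']", "", token.lower())


def align_display_to_words(display, words):
    """Return (display_token, start_ticks, end_ticks) tuples.

    `words` is assumed to be a list of dicts with the keys "word",
    "offsetInTicks" and "durationInTicks" (an assumed shape; adjust the
    keys to match your actual output).
    """
    display_tokens = display.split()
    a = [normalize(t) for t in display_tokens]
    b = [normalize(w["word"]) for w in words]
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)

    aligned = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Same token on both sides: copy the timestamp one-to-one.
            for i, j in zip(range(i1, i2), range(j1, j2)):
                start = words[j]["offsetInTicks"]
                end = start + words[j]["durationInTicks"]
                aligned.append((display_tokens[i], start, end))
        elif tag == "replace":
            # Tokens differ (e.g. "25" vs "twenty five"): give every display
            # token in the block the time span covered by the whole block.
            start = words[j1]["offsetInTicks"]
            last = words[j2 - 1]
            end = last["offsetInTicks"] + last["durationInTicks"]
            for i in range(i1, i2):
                aligned.append((display_tokens[i], start, end))
        elif tag == "delete":
            # Display tokens with no lexical counterpart: no timestamp known.
            for i in range(i1, i2):
                aligned.append((display_tokens[i], None, None))
        # "insert" means lexical words with no display token; nothing to emit.
    return aligned


# Hand-made sample data for illustration only (not real service output).
sample_display = "Hello, I have 25 apples."
sample_words = [
    {"word": "hello", "offsetInTicks": 0, "durationInTicks": 5},
    {"word": "i", "offsetInTicks": 6, "durationInTicks": 1},
    {"word": "have", "offsetInTicks": 8, "durationInTicks": 3},
    {"word": "twenty", "offsetInTicks": 12, "durationInTicks": 4},
    {"word": "five", "offsetInTicks": 17, "durationInTicks": 3},
    {"word": "apples", "offsetInTicks": 21, "durationInTicks": 5},
]

for row in align_display_to_words(sample_display, sample_words):
    print(row)
```

In this sample, "25" has no identical lexical token, so it receives the combined span of "twenty" and "five". Where the display form rewrites several words at once, the sketch can only give a block-level span rather than a per-token timestamp, so treat those entries as approximate.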