The Azure speech-to-text output has a display field in combinedRecognizedPhrases. How can I map each word in the display field to its timestamp in the Azure speech-to-text output?
The Azure speech-to-text output contains word-level timestamps, but only for the lexical field in combinedRecognizedPhrases.
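To make the problem concrete, here is a rough sketch of the kind of alignment I have in mind. It is not an official Azure approach. It assumes each phrase gives me a display string plus a list of lexical word entries with the keys word, offsetInTicks and durationInTicks (ticks being 100-nanosecond units); those key names are my assumption about the output shape and may differ by API version. The function name align_display_to_lexical is hypothetical. It uses difflib from the Python standard library to line up the two token sequences, and when several lexical words collapse into one display token (for example "twenty five" becoming "25") it gives that token the combined time span.

```python
import difflib
import string


def normalize(token):
    """Lowercase and strip surrounding punctuation so 'Hello,' matches 'hello'."""
    return token.lower().strip(string.punctuation)


def align_display_to_lexical(display, words):
    """Return one (display_token, start_ticks, end_ticks) tuple per display token.

    `words` is a list of dicts with the ASSUMED keys
    'word', 'offsetInTicks' and 'durationInTicks'.
    """
    display_tokens = display.split()
    a = [normalize(t) for t in display_tokens]
    b = [normalize(w["word"]) for w in words]

    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    result = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Tokens match one-to-one: copy each word's own timing.
            for i, j in zip(range(i1, i2), range(j1, j2)):
                w = words[j]
                start = w["offsetInTicks"]
                result.append((display_tokens[i], start, start + w["durationInTicks"]))
        elif tag == "replace":
            # Differing runs (e.g. "25" vs "twenty five"): every display token
            # in the run gets the span covered by the whole lexical run.
            start = words[j1]["offsetInTicks"]
            last = words[j2 - 1]
            end = last["offsetInTicks"] + last["durationInTicks"]
            for i in range(i1, i2):
                result.append((display_tokens[i], start, end))
        elif tag == "delete":
            # Display tokens with no lexical counterpart: timing unknown.
            for i in range(i1, i2):
                result.append((display_tokens[i], None, None))
        # tag == "insert": lexical words with no display token are skipped.
    return result
```

This only gives approximate spans for the rewritten parts (numbers, dates, abbreviations), and it would have to be run per phrase rather than on the combined text, since the timings are attached to individual words.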