
I'm able to run this piece of code and get transcriptions of audio files. But it does not distinguish between speakers (results always show "speaker 1", "speaker 2" is never recognised).

https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/batch/python/python-client/main.py

Example of files I've been using:

English: https://transcripttests.blob.core.windows.net/testfiles/pulpfiction.mp3

French: https://transcripttests.blob.core.windows.net/testfiles/dialogue50smono44100.wav
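For reference, diarization has to be requested explicitly in the batch transcription job. The linked sample's older client passes it through a properties dictionary; in the newer Speech-to-Text v3.0 REST API the flags are `diarizationEnabled` and `wordLevelTimestampsEnabled`. The sketch below only builds the request body (the endpoint URL and field names are my assumptions based on the v3.0 API, so double-check against the current docs):

```python
import json

def build_transcription_request(content_url, locale="en-US"):
    """Build the JSON body for POST .../speechtotext/v3.0/transcriptions.

    Assumption: the v3.0 REST API is used, where diarization requires
    word-level timestamps to be enabled as well.
    """
    return {
        "contentUrls": [content_url],
        "locale": locale,
        "displayName": "diarization test",
        "properties": {
            # Both flags are needed for speaker labels in the result:
            # diarization assigns words to speakers via their timestamps.
            "diarizationEnabled": True,
            "wordLevelTimestampsEnabled": True,
        },
    }

body = build_transcription_request(
    "https://transcripttests.blob.core.windows.net/testfiles/pulpfiction.mp3"
)
print(json.dumps(body, indent=2))
```

If the result still shows only "speaker 1" with these flags set, the limitation is on the service side, as the answers below describe.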


2 Answers


Thanks for reporting the issue. I can confirm it: only one speaker is recognized for both audio files. We are checking with our diarization science team to root-cause this and will get back to you soon. Sorry for the inconvenience!


Update:

We had a new release recently, and the first audio file (English) should now produce two speakers. Please check and let us know if you still see any issues.

As for the second audio file (French), that one is more complicated, and our scientists are still working on it. We will keep you updated. Thanks!


Original:

Thank you for your patience. We are aware of the issue and can reproduce it.

For the first audio file (English), the two speakers are both male with very similar voices, which is one reason our diarization service did not differentiate them.

For the second audio file (French), there are three utterances from the female speaker: #2 and #3 are very short, and #1 happened to be split into two short utterances by our system, so none of them were treated as speaker 2.

Our scientists are actively working on this, but there is no exact ETA so far. I will let you know once we have updates. Thanks!