I am trying to combine speech recognition and speaker diarization to identify how many speakers are present in a conversation and which speaker said what.
For this I am using CMU Sphinx and LIUM Speaker Diarization.
I am able to run these two tools separately, i.e. I can run Sphinx 4 and get a text transcript from the audio, and run the LIUM toolkit and get speaker segments.
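For reference, this is roughly how I run each tool on its own right now (the LIUM jar name, the model paths, and the file names are just placeholders from my local setup):

```java
// LIUM runs from the command line and writes speaker segments to a .seg file, e.g.:
//   java -Xmx2048m -jar LIUM_SpkDiarization.jar --fInputMask=conversation.wav \
//        --sOutputMask=conversation.seg --doCEClustering conversation

// Sphinx 4 is used through its Java API to transcribe the whole file:
import edu.cmu.sphinx.api.Configuration;
import edu.cmu.sphinx.api.SpeechResult;
import edu.cmu.sphinx.api.StreamSpeechRecognizer;
import java.io.FileInputStream;

public class PlainTranscriber {
    public static void main(String[] args) throws Exception {
        Configuration config = new Configuration();
        config.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
        config.setDictionaryPath("resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict");
        config.setLanguageModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us.lm.bin");

        StreamSpeechRecognizer recognizer = new StreamSpeechRecognizer(config);
        // conversation.wav must be 16 kHz, 16-bit, mono PCM
        recognizer.startRecognition(new FileInputStream("conversation.wav"));
        SpeechResult result;
        while ((result = recognizer.getResult()) != null) {
            System.out.println(result.getHypothesis()); // plain text, no speaker labels
        }
        recognizer.stopRecognition();
    }
}
```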
Now I want to combine the two and get output like the example below:
s0 : this is my first sentence.
s1 : this is my reply.
s2 : i do not know what you are talking about
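The only approach I have come up with so far is a rough sketch like the one below (not verified): parse the .seg file that LIUM writes, where start and length appear to be counted in 10 ms frames and the speaker label is the last column, cut the matching chunk out of the wav, and decode each chunk separately with Sphinx 4. The class name and file names are placeholders.

```java
import edu.cmu.sphinx.api.Configuration;
import edu.cmu.sphinx.api.SpeechResult;
import edu.cmu.sphinx.api.StreamSpeechRecognizer;

import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Paths;

public class DiarizedTranscriber {
    public static void main(String[] args) throws Exception {
        Configuration config = new Configuration();
        config.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
        config.setDictionaryPath("resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict");
        config.setLanguageModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us.lm.bin");
        StreamSpeechRecognizer recognizer = new StreamSpeechRecognizer(config);

        File wav = new File("conversation.wav"); // 16 kHz, 16-bit, mono PCM

        // Each non-comment line of the LIUM .seg file looks like:
        //   showName 1 <start> <length> <gender> <band> <env> <speakerLabel>
        // where <start> and <length> are counted in 10 ms frames.
        for (String line : Files.readAllLines(Paths.get("conversation.seg"))) {
            if (line.startsWith(";;") || line.trim().isEmpty()) continue;
            String[] f = line.trim().split("\\s+");
            double startSec  = Integer.parseInt(f[2]) / 100.0;
            double lengthSec = Integer.parseInt(f[3]) / 100.0;
            String speaker   = f[7];

            // Cut the segment out of the wav as a shorter AudioInputStream.
            AudioInputStream full = AudioSystem.getAudioInputStream(wav);
            AudioFormat fmt = full.getFormat();
            long toSkip = (long) (startSec * fmt.getFrameRate()) * fmt.getFrameSize();
            while (toSkip > 0) {
                long skipped = full.skip(toSkip);
                if (skipped <= 0) break;
                toSkip -= skipped;
            }
            long numFrames = (long) (lengthSec * fmt.getFrameRate());
            AudioInputStream chunk = new AudioInputStream(full, fmt, numFrames);

            // Decode just this chunk with Sphinx 4 and print it under the speaker label.
            recognizer.startRecognition(chunk);
            StringBuilder text = new StringBuilder();
            SpeechResult result;
            while ((result = recognizer.getResult()) != null) {
                text.append(result.getHypothesis()).append(' ');
            }
            recognizer.stopRecognition();

            System.out.println(speaker + " : " + text.toString().trim());
        }
    }
}
```

One thing I am unsure about is that the .seg lines seem to be grouped by speaker cluster rather than sorted by time, so they would probably need to be sorted by the start frame first to get the lines out in conversation order. Is this the right way to go, or is there a cleaner way to connect the two toolkits?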
Does anyone know how to combine these two toolkits?