2
votes

I need a speech-to-text system so that I can transcribe audio files to text. While researching this, I found systems created by big companies, e.g. Amazon Transcribe, Google Speech-to-Text, IBM Watson, etc., and found that the Python libraries all internally make use of those APIs.

What would be the steps if I wanted to create such a system myself? I could not find any detailed article on how to build your own speech recognition system.

The main reason I want to create my own system is that I cannot send the audio files to external APIs for security reasons.

The main goal: I have recordings of people talking, mostly in English, and I want to transcribe that audio to text.

Please let me know if you have any other ideas for doing this instead of sending the audio files to external systems.

1
Behind those services are extensively trained speech recognition systems. You would have to acquire labeled speech samples and train your own system, or somehow get a "trained" and canned model that does it. Asking for libraries/outside resources here is off-topic, and your task is far too broad to be answered here. - Patrick Artner
Yes, I agree with your comment. It's a very broad topic. I was wondering whether there is some way to get an already trained model (open-sourced, etc.) onto my local machine and try to transcribe audio using that model, if that's possible or if organizations have released their models publicly. This question was just to get some ideas, in case anyone has tried working on this before and can guide me in the right direction. - Kuldeep Singh

1 Answer

0
votes

One place to start would be to review the offerings of www.voxforge.org; see the tutorial and forums sections for an overview of the use of open-source projects such as Julius and CMU Sphinx. It's quite an extensive subject, and you will find that many people have trodden this path before you, so you can learn from their experience.