I need a speech-to-text system so that I can transcribe audio files to text. While researching this, I found systems built by large companies, e.g. Amazon Transcribe, Google Speech-to-Text, and IBM Watson. All the Python libraries I found use those APIs internally.
What would the steps be if I wanted to build such a system myself? I could not find a detailed article on how to build your own speech recognition system.
The main reason I want to build my own system is that I cannot send the audio files to external APIs for security reasons.
My goal is to transcribe recordings of people talking, mostly in English, to text.
Please let me know if you have any other ideas for doing this without sending the audio files to external systems.