Phonetic Speech Recognition

Question

I'm trying to build Latin speech recognition, for which I need phonetic (vowel-and-consonant) recognition rather than word recognition, since Latin has only about 40 sounds but over 40,000 words x 60 endings on average = roughly 2.5 million word-forms. The problem is that both the Web Speech API and Google Cloud Speech only hand you complete words that supposedly sound similar (and drawn from an English grammar, too, since there are no 2.5-million-word Latin grammars out there), so there's no way for me to get down to the actual phonetic sounds. In particular, I want just the word-stem (the first half of the word), which distinguishes each word, rather than the word-ending, which only tells how the word functions in the sentence and is useless to me. Ideally, I'd have a grammar of word-stems such as:

"am-" (short for amo,amare,amavi,amatus, etc.),
"vid-" (short for video,videre,vidi,visus, etc.),
"laet-" (short for laetus, laeta, laetum, etc.)
etc.

But speech-recognition technology can't search for that.
So where can I get phonetic speech recognition?

I'd prefer JavaScript, PHP, or Node, ideally running client-side rather than streaming.

Here's my code so far, for the Web Speech API. The key part is the console.log() calls, which show me trying to dig into the properties of each returned candidate word:

speech.onresult = function(event) {
    var interim_transcript = '';
    var final_transcript = '';

    for (var i = event.resultIndex; i < event.results.length; ++i) {
        var result = event.results[i];
        if (result.isFinal) {
            final_transcript += result[0].transcript;

            // This console.log shows all the word-guess possibilities at once.
            console.log(result);

            // Walk the possibilities by index; for...in on the result would also
            // visit non-index properties such as length, item and isFinal.
            for (var a = 0; a < result.length; ++a) {
                for (var b in result[a]) {
                    /* This black-&-yellow console.log shows me trying to dig into
                       each returned possibility's PROPERTIES, but alas, the only
                       returned properties are
                       (1) the transcript (i.e. the guessed word),
                       (2) the confidence (i.e. the 0-to-1 likelihood of it being that word),
                       (3) the prototype.
                    */
                    console.log("%c Poss-" + a + " %c " + b + ": " + result[a][b], 'background-color: black; color: yellow; font-size: 14px;', 'background-color: black; color: red; font-size: 14px;');
                }
            }
        } else {
            // Non-final results feed the interim display, which was never filled before.
            interim_transcript += result[0].transcript;
        }
    }
    if (action == "start") {
        transcription.value += final_transcript;
        interim_span.innerHTML = interim_transcript;
    }
};

You can build the dictionary yourself. What is the expected transcript of "phonetic-vowel-and-consonant-recognition"? — guest271314
I can't build a dictionary of 2.5 million possible words. A word-tree-structure might work, but the available technologies aren't designed to recognize HALF a word (just the root, not the ending). — rudminda
"I can't build a dictionary of 2.5 million possible words." ? Why not? "but the available technologies aren't designed to recognize HALF a word (just the root, not the ending)." What do you mean by "recognize"? Again, you can create a grammar list yourself — guest271314
Isn't 2.5 million too many for the speech-recognizer to search thru within a quarter-second? By "recognize" I mean 'consider-as-a-possibility.' Speech-recognizers use the Levenshtein algorithm to rank word-candidates based on percentage-likelihood of being the sound they heard. But to rank word-ROOTS (again the 1st half of the word), to a dictionary of possible word-roots, they wouldn't know where to break the sound they heard. — rudminda
"within a quarter-second" How is time relevant to the inquiry at original Question? — guest271314

guest271314 · Accepted Answer · 2017-11-05T00:36:06

You can create a SpeechGrammarList. See also JSpeech Grammar Format.

Example description and code from MDN:

The SpeechGrammarList interface of the Web Speech API represents a list of SpeechGrammar objects containing words or patterns of words that we want the recognition service to recognize.

Grammar is defined using JSpeech Grammar Format (JSGF.) Other formats may also be supported in the future.

var grammar = '#JSGF V1.0; grammar colors; public <color> = aqua | azure | beige | bisque | black | blue | brown | chocolate | coral | crimson | cyan | fuchsia | ghostwhite | gold | goldenrod | gray | green | indigo | ivory | khaki | lavender | lime | linen | magenta | maroon | moccasin | navy | olive | orange | orchid | peru | pink | plum | purple | red | salmon | sienna | silver | snow | tan | teal | thistle | tomato | turquoise | violet | white | yellow ;'
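// Chrome exposes these constructors with a webkit prefix.
var SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
var SpeechGrammarList = window.SpeechGrammarList || window.webkitSpeechGrammarList;
var recognition = new SpeechRecognition();
var speechRecognitionList = new SpeechGrammarList();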
speechRecognitionList.addFromString(grammar, 1);
recognition.grammars = speechRecognitionList;
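
Applied to the word-stems from the question, the same approach might look roughly like the sketch below. It is only an illustration under several assumptions: the stem list is a three-entry placeholder, the helper names buildStemGrammar and matchStem are hypothetical, and the 'la' language tag is a guess. Neither the question nor MDN's example confirms that any recognition service supports Latin or actually restricts its results to the supplied grammar, so the sketch also maps each returned transcript back to a stem by simple prefix matching.

```javascript
// Placeholder stem list for illustration; a real list would come from a Latin lexicon.
const stems = ['am', 'vid', 'laet'];

// Build a JSGF grammar string with one alternative per stem (hypothetical helper).
function buildStemGrammar(stemList) {
  return '#JSGF V1.0; grammar stems; public <stem> = ' + stemList.join(' | ') + ' ;';
}

// Return the longest stem that the transcript starts with, or null if none does
// (hypothetical helper; plain prefix matching, not phonetic matching).
function matchStem(transcript, stemList) {
  const word = transcript.trim().toLowerCase();
  let best = null;
  for (const stem of stemList) {
    if (word.startsWith(stem) && (best === null || stem.length > best.length)) {
      best = stem;
    }
  }
  return best;
}

// Browser-only wiring, guarded so the helpers above can also be loaded elsewhere.
if (typeof window !== 'undefined') {
  const Recognition = window.SpeechRecognition || window.webkitSpeechRecognition;
  const GrammarList = window.SpeechGrammarList || window.webkitSpeechGrammarList;
  if (Recognition && GrammarList) {
    const recognition = new Recognition();
    const grammarList = new GrammarList();
    grammarList.addFromString(buildStemGrammar(stems), 1);
    recognition.grammars = grammarList;
    recognition.lang = 'la'; // Assumption: the service may not accept this tag.
    recognition.onresult = function (event) {
      const heard = event.results[event.resultIndex][0].transcript;
      console.log(heard, '->', matchStem(heard, stems));
    };
    // Call recognition.start() from a user gesture to begin listening.
  }
}
```

For example, matchStem('amavi', stems) returns 'am', while a word that begins with none of the listed stems returns null. Choosing the longest matching prefix keeps a short stem from shadowing a longer one that also matches.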
