I'm trying to use CMU Sphinx for speech recognition in Java, but the result I get is incorrect and I don't know why.

I have a .wav file in which I recorded myself saying a sentence in English.

Here is my Java code:

    try {
        Configuration configuration = new Configuration();

        // Set path to acoustic model.
        configuration.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
        // Set path to dictionary.
        configuration.setDictionaryPath("resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict");
        // Set language model.
        configuration.setLanguageModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us.lm.dmp");
        StreamSpeechRecognizer recognizer = new StreamSpeechRecognizer(configuration);

        // Feed the recorded file to the recognizer.
        recognizer.startRecognition(new FileInputStream("assets/voice/some_wav_file.wav"));
        SpeechResult result = null;

        // getResult() returns null once the end of the stream is reached.
        while ((result = recognizer.getResult()) != null) {
            System.out.println("~~ RESULTS: " + result.getHypothesis());
        }

        recognizer.stopRecognition();
    } catch (Exception e) {
        System.out.println("ERROR: " + e.getMessage());
    }

I also have Android code that doesn't work either:

    Assets assets = new Assets(context);
    File assetDir = assets.syncAssets();
    String prefix = assetDir.getPath();

    Config c = Decoder.defaultConfig();
    c.setString("-hmm", prefix + "/en-us-ptm");
    c.setString("-lm", prefix + "/en-us.lm");
    c.setString("-dict", prefix + "/cmudict-en-us.dict");
    Decoder d = new Decoder(c);
    InputStream stream = context.getResources().openRawResource(R.raw.some_wav_file);

    d.startUtt();
    byte[] b = new byte[4096];
    try {
        int nbytes;
        while ((nbytes = stream.read(b)) >= 0) {
            ByteBuffer bb = ByteBuffer.wrap(b, 0, nbytes);
            short[] s = new short[nbytes/2];
            bb.asShortBuffer().get(s);
            d.processRaw(s, nbytes/2, false, false);
        }
    } catch (IOException e) {
        Log.d("ERROR: ", "Error when reading file" + e.getMessage());
    }
    d.endUtt();
    Log.d("TOTAL RESULT: ", d.hyp().getHypstr());
    for (Segment seg : d.seg()) {
        Log.d("RESULT: ", seg.getWord());
    }

I used this website to convert the wav file to 16-bit, 16 kHz, mono, little-endian (I tried all of its options).
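One way to confirm that a converted file really has that format is to read its header on desktop Java with `javax.sound.sampled`. The sketch below is only an illustration: `WavFormatCheck` and `isSphinxFriendly` are hypothetical names, not part of Sphinx, and `javax.sound.sampled` is assumed to be available (it is not on Android).

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.io.File;
import java.io.IOException;

class WavFormatCheck {

    /** Returns true for 16 kHz, 16-bit, mono, signed little-endian PCM. */
    static boolean isSphinxFriendly(AudioFormat f) {
        return AudioFormat.Encoding.PCM_SIGNED.equals(f.getEncoding())
                && f.getSampleRate() == 16000f
                && f.getSampleSizeInBits() == 16
                && f.getChannels() == 1
                && !f.isBigEndian();
    }

    /** Reads the header of the given audio file and checks its format. */
    static boolean isSphinxFriendly(File wav)
            throws IOException, UnsupportedAudioFileException {
        return isSphinxFriendly(AudioSystem.getAudioFileFormat(wav).getFormat());
    }

    public static void main(String[] args) throws Exception {
        // Pass the path of the file to inspect as the first argument.
        File wav = new File(args[0]);
        System.out.println(AudioSystem.getAudioFileFormat(wav).getFormat());
        System.out.println(isSphinxFriendly(wav) ? "format OK" : "format mismatch");
    }
}
```

Running it on the converted file prints the format found in the header. A matching header does not show that the audio survived the conversion undamaged, only that the container describes the expected format.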

Any ideas why it doesn't work? I use the built-in dictionary and acoustic models, and my English accent is not perfect (I don't know if that matters).

EDIT:

This is my file. I recorded myself saying "My baby is cute", and that is the output I expect. With the pure Java code I get "i've amy's youth", and with the Android code I get " it".

Here is the file containing the logs.

You need to explain what exactly does not work. Is the result not what you expected, does it crash, or something else? Share the file you are trying to recognize, the expected result, the result you get, and the application log. - Nikolay Shmyrev

1 Answer

Your audio is somewhat corrupted by the conversion. You should record directly to WAV or to some other lossless format. Your pronunciation is also far from US English. To convert between formats, you can use sox instead of an external website. Your Android sample seems correct, but it looks like you are decoding a different file on Android. Check that the file in your resources is really the one you expect.
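
One detail in the Android loop may still be worth ruling out. A Java `ByteBuffer` is big-endian by default, while 16-bit PCM in a WAV file is little-endian, so reading shorts without setting the byte order swaps the two bytes of every sample. The following is a minimal sketch of the conversion with the order set explicitly. `PcmBytes` and `toSamples` are hypothetical names used only for illustration, and whether this accounts for the result above is an assumption, not something the logs confirm.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class PcmBytes {

    /**
     * Converts the first nbytes of raw 16-bit little-endian PCM into samples.
     * A trailing odd byte, if any, is ignored.
     */
    static short[] toSamples(byte[] raw, int nbytes) {
        short[] samples = new short[nbytes / 2];
        ByteBuffer.wrap(raw, 0, nbytes)
                .order(ByteOrder.LITTLE_ENDIAN) // the default order is BIG_ENDIAN
                .asShortBuffer()
                .get(samples);
        return samples;
    }

    public static void main(String[] args) {
        // The samples 1 and -2, stored least-significant byte first.
        byte[] raw = {0x01, 0x00, (byte) 0xFE, (byte) 0xFF};
        short[] s = toSamples(raw, raw.length);
        System.out.println(s[0] + " " + s[1]); // prints "1 -2"
    }
}
```

In the loop from the question, the equivalent change would be a call to `bb.order(ByteOrder.LITTLE_ENDIAN)` before `bb.asShortBuffer()`.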