
I'm currently developing a speech recognition application using the Microsoft Kinect SDK. The goal of the application is to load any (valid) XML file containing a grammar and use it to process speech.

I'm testing the application with some non-trivial grammars, and I haven't figured out how to return specific semantic values, in particular names. For example, in the following grammar:

<grammar version="1.0" xml:lang="en-US" root="rootRule" tag-format="semantics/1.0-literals" xmlns="http://www.w3.org/2001/06/grammar">

<!-- Ask for person's related information (age, location, name, etc.) -->

<rule id="rootRule">
    <one-of>
        <!-- Ask person name -->
        <item>
            <tag>AnswerToNameQuestion</tag>
            <one-of>
                <item> my name is </item>
                <item> people call me </item>
            </one-of>
            <ruleref uri="#names"/>
        </item>

        <!-- Ask person location -->
        <item>
            <tag>QuestionOfLocation</tag>
            <one-of>
                <item> do you know where is </item>
                <item> can you tell me where  </item>
                <item> where did </item>
            </one-of>
            <ruleref uri="#names"/>
        </item>

    </one-of>
</rule>

<!-- Answer person name -->
<rule id="names">
    <item>
        <one-of>
            <item> peter </item>
            <item> john  </item>
            <item> danny </item>
        </one-of>
    </item>
</rule>

 </grammar>

When a person says their name, I want the semantic result to be that name. For example, given the question "What is your name?" (not included in this grammar) and the reply "My name is -insert name-", I want the semantic result to be the person's name, not just the "AnswerToNameQuestion" value that is returned right now. Any help would be greatly appreciated!

I can't give you an exact answer, but note that speech recognition isn't actually part of the Kinect SDK; it belongs to the Microsoft Speech Platform. Have a look at msdn.microsoft.com/en-us/library/hh378354.aspx. Maybe you can figure out how to configure the speech event handlers to trigger some kind of "speech to text" step after a speech-recognized event. – jmm

1 Answer


In the SpeechRecognized event handler, you can get the full recognized sentence by querying the event.Result.Text property instead of event.Result.Semantics.Value. You can then use the second property to decide which part of the text string is not relevant and remove it. For example, if somebody says "my name is Peter", inside the SpeechRecognized event handler you will have:

event.Result.Text = "my name is Peter"

event.Result.Semantics.Value = "AnswerToNameQuestion"
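
If you would rather have the grammar return the name directly instead of trimming the text, one possible approach is to switch from the literals tag format to script tags. The fragment below is only a sketch: it assumes your recognizer accepts tag-format="semantics/1.0", and the property names "intent" and "name" are made up for illustration. Each name item sets its own value, and the tag placed after the ruleref copies that value into the root result with rules.latest().

```xml
<grammar version="1.0" xml:lang="en-US" root="rootRule" tag-format="semantics/1.0" xmlns="http://www.w3.org/2001/06/grammar">

<rule id="rootRule">
    <one-of>
        <item>
            <one-of>
                <item> my name is </item>
                <item> people call me </item>
            </one-of>
            <ruleref uri="#names"/>
            <!-- Must come after the ruleref so that rules.latest() refers to the "names" rule -->
            <tag> out.intent = "AnswerToNameQuestion"; out.name = rules.latest(); </tag>
        </item>
    </one-of>
</rule>

<rule id="names">
    <one-of>
        <!-- Each alternative sets the value that this rule returns -->
        <item> peter <tag> out = "peter"; </tag> </item>
        <item> john <tag> out = "john"; </tag> </item>
        <item> danny <tag> out = "danny"; </tag> </item>
    </one-of>
</rule>

</grammar>
```

With a grammar like this, event.Result.Semantics would no longer be a single string; the name should be available as event.Result.Semantics["name"].Value and the question type as event.Result.Semantics["intent"].Value. I have not tested this against your setup, so treat it as a starting point.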