6
votes

I would like to build a custom skill, but it would need direct access to the user's voice (or a recording of the audio). Can or will Alexa relay the audio stream rather than only sending the request invocations (launch/intent/session-end)?

I understand custom skills can send back MP3s as responses, but being able to access the actual voice request, either as a stream or as an MP3, would be awesome.

Edit:

It seems that no MP3 is provided in the request object: https://developer.amazon.com/public/solutions/alexa/alexa-skills-kit/docs/alexa-skills-kit-interface-reference#LaunchRequest

2 Answers

16
votes

Alexa does not provide this service.

An always-on device in a domestic setting that can hear everything said, including background noise and side conversations, is a huge security concern. Amazon mitigates this concern by filtering the input, performing the difficult speech-to-text work itself, and passing your skill only the resulting text, after further processing by your interaction model.
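
To illustrate, here is a rough Python sketch of what a skill actually receives for an intent request. The intent name `OrderDrinkIntent`, the slot `Drink`, the IDs, and the helper `summarize_request` are all made up for the example, and the payload is trimmed down; the point is only that the body carries an intent name and slot values, with no audio field.

```python
import json

# Illustrative, trimmed-down IntentRequest body. The intent name, slot name,
# IDs, and timestamp are hypothetical values chosen for this example.
SAMPLE_REQUEST = json.loads("""
{
  "version": "1.0",
  "session": {"new": false, "sessionId": "example-session-id"},
  "request": {
    "type": "IntentRequest",
    "requestId": "example-request-id",
    "timestamp": "2016-01-01T00:00:00Z",
    "intent": {
      "name": "OrderDrinkIntent",
      "slots": {"Drink": {"name": "Drink", "value": "coffee"}}
    }
  }
}
""")


def summarize_request(event):
    """Return (request type, intent name, {slot name: value}) for a request body.

    Hypothetical helper: it reads only the text-level fields, because those
    are the only user input present in the request.
    """
    request = event["request"]
    intent = request.get("intent", {})
    slots = {
        name: slot.get("value")
        for name, slot in intent.get("slots", {}).items()
    }
    return request["type"], intent.get("name"), slots


if __name__ == "__main__":
    print(summarize_request(SAMPLE_REQUEST))
```

A LaunchRequest has no `intent` object at all, so the same helper returns `None` and an empty slot dictionary for it.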

-1
votes

In short, no. I can't find this stated anywhere specifically in the documentation, but I just created a Python library that encapsulates all the JSON structures, so I know you can't do this yet.

The only control you have over audio is on the output side, by embedding audio links in SSML:

https://developer.amazon.com/public/solutions/alexa/alexa-skills-kit/docs/handling-requests-sent-by-alexa#Including%20Pre-Recorded%20Audio%20in%20your%20Response
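
For the output side, a minimal sketch of a response that plays a pre-recorded clip might look like the following. The function name `build_audio_response` and the example URL are assumptions for illustration, not part of any Amazon library; the response shape follows the `outputSpeech` structure with type `SSML`, and the clip itself still has to meet Amazon's hosting and format requirements described at the link above.

```python
from xml.sax.saxutils import escape, quoteattr


def build_audio_response(text, audio_url, end_session=True):
    """Build a skill response body that speaks `text`, then plays `audio_url`.

    Hypothetical helper: the URL is assumed to point at a suitably encoded
    MP3 on an HTTPS host; nothing here validates that.
    """
    # escape() protects the spoken text; quoteattr() escapes the URL and
    # wraps it in quotes so it is safe as an XML attribute value.
    ssml = "<speak>{} <audio src={}/></speak>".format(
        escape(text), quoteattr(audio_url)
    )
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "SSML", "ssml": ssml},
            "shouldEndSession": end_session,
        },
    }


if __name__ == "__main__":
    response = build_audio_response(
        "Here is your clip.", "https://example.com/clip.mp3"
    )
    print(response["response"]["outputSpeech"]["ssml"])
```

Note that this only covers audio going out to the user; there is no matching field for audio coming in.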