AWS Transcribe: Unsupported audio format: matroska,webm

Question

Hi, I am new to AWS. My application records audio and converts the speech to text using AWS Transcribe.

I record audio in a web browser and save it to AWS S3. When I later run AWS Transcribe on that audio file, I get the error "Unsupported audio format: matroska,webm". Can anyone help me solve this? I am using JavaScript. My code is:


let blob = new Blob(chunks, {type: "audio/mp3" })
var s3 = new AWS.S3();

var params = {Bucket: 'xxx', Key: 'audio', Body: blob};
s3.upload(params, function(err, data) {
  console.log(err, data);
});

In the S3 bucket the file shows up as mp3, but when I try to transcribe it I get the error "Unsupported audio format: matroska,webm". Please help me solve this issue.

Hi! Could you download the file and check the actual format using a music player? What is the file extension? — Jonny5
Hi Jonny, thanks for your reply. I downloaded the audio and its properties still show mp3. — kakara vinay
@karakara vinay Can you try using the command line approach? That should help you quickly identify what is wrong. This example should be very easy to follow. — Jonny5
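
To check the actual format independently of the file extension or the MIME label, one option is to look at the first bytes of the file. The sketch below is illustrative only: detectAudioContainer is a hypothetical helper (not part of the AWS SDK), and it recognizes just a handful of well-known signatures.

```javascript
// Identify an audio container from its leading "magic" bytes.
// `bytes` is a Uint8Array holding at least the first 12 bytes of the file.
// Hypothetical helper for illustration; it only knows a few signatures.
function detectAudioContainer(bytes) {
  const startsWith = (sig, offset = 0) =>
    bytes.length >= offset + sig.length &&
    sig.every((b, i) => bytes[offset + i] === b);
  const ascii = (s) => Array.from(s, (c) => c.charCodeAt(0));

  // EBML header, shared by Matroska and WebM
  if (startsWith([0x1a, 0x45, 0xdf, 0xa3])) return "matroska/webm";
  if (startsWith(ascii("RIFF")) && startsWith(ascii("WAVE"), 8)) return "wav";
  if (startsWith(ascii("fLaC"))) return "flac";
  if (startsWith(ascii("OggS"))) return "ogg";
  if (startsWith(ascii("ftyp"), 4)) return "mp4";
  // MP3 files usually start with an ID3v2 tag ...
  if (startsWith(ascii("ID3"))) return "mp3";
  // ... or directly with an MPEG audio frame sync (11 set bits).
  // Other MPEG audio streams can match this pattern too.
  if (bytes.length >= 2 && bytes[0] === 0xff && (bytes[1] & 0xe0) === 0xe0) {
    return "mp3";
  }
  return "unknown";
}
```

In the browser, the first bytes of the recorded blob can be read with new Uint8Array(await blob.slice(0, 12).arrayBuffer()) and passed to this function. If it reports "matroska/webm" for a blob labeled "audio/mp3", the label and the data disagree, which matches the error Transcribe returns.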

Juned Ahsan · Accepted Answer · 2019-12-28T23:03:55

As the error says, your input audio is in an unsupported format. AWS Transcribe supports the input formats listed in the official AWS Transcribe FAQ:

Amazon Transcribe supports both 16 kHz and 8kHz audio streams, and multiple audio encodings, including WAV, MP3, MP4 and FLAC.

You need to convert your audio file to one of the supported audio formats before sending it to Transcribe. You can do this with an online tool or with an SDK.
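
Setting type: "audio/mp3" on a Blob only labels the data; it does not transcode it. Assuming the chunks in the question come from the browser's MediaRecorder (the question does not say), the recorded bytes are commonly WebM. A minimal sketch of one way to convert in the browser is to decode the recording with the Web Audio API and re-encode it as 16-bit PCM WAV. The function names encodeWav and recordingToWavBlob are hypothetical, and the sketch keeps only the first channel.

```javascript
// Encode mono Float32 samples (range -1..1) as a 16-bit PCM WAV file.
// Returns an ArrayBuffer containing a 44-byte header followed by the samples.
function encodeWav(samples, sampleRate) {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);
  const writeAscii = (offset, text) => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true);   // file size minus 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true);             // fmt chunk size
  view.setUint16(20, 1, true);              // format 1 = PCM
  view.setUint16(22, 1, true);              // channels: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true);             // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  for (let i = 0; i < samples.length; i++) {
    // Clamp to [-1, 1], then scale to the signed 16-bit range.
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}

// Browser-only: decode the recorded blob and re-encode channel 0 as WAV.
async function recordingToWavBlob(recordedBlob) {
  const audioContext = new AudioContext();
  const decoded = await audioContext.decodeAudioData(
    await recordedBlob.arrayBuffer()
  );
  await audioContext.close();
  const wav = encodeWav(decoded.getChannelData(0), decoded.sampleRate);
  return new Blob([wav], { type: "audio/wav" });
}
```

The resulting blob could then be passed as Body to the s3.upload call from the question, for example with Key: 'audio.wav' and ContentType: 'audio/wav'. WAV is uncompressed, so uploads are larger than the original recording; converting on a server instead is an equally valid choice.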
