I have developed an application for streaming speech recognition in C++ using another API and the IBM Watson Speech to Text service API.
In both programs, I am using the same file, which contains this audio:
several tornadoes touch down as a line of severe thunderstorms swept through Colorado on Sunday
This file is 641,680 bytes in size, and I am sending it to the speech-to-text servers in chunks of at most 100,000 bytes.
With the other API, the whole utterance is recognized as one result. With the IBM Watson API, it is not. Here is what I have done:
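To make the chunking concrete, here is a minimal sketch of how a 641,680-byte file breaks down into chunks of at most 100,000 bytes. The helper name `chunk_sizes` is hypothetical and is not part of my application; it only does the arithmetic.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, for illustration only: split total_bytes into
// consecutive chunk sizes of at most max_chunk bytes each.
std::vector<std::size_t> chunk_sizes(std::size_t total_bytes, std::size_t max_chunk) {
    assert(max_chunk > 0 && "chunk size must be positive");
    std::vector<std::size_t> sizes;
    while (total_bytes > 0) {
        // Take a full chunk, or whatever is left for the final one.
        const std::size_t n = total_bytes < max_chunk ? total_bytes : max_chunk;
        sizes.push_back(n);
        total_bytes -= n;
    }
    return sizes;
}
```

Calling `chunk_sizes(641680, 100000)` yields seven chunks: six of 100,000 bytes and a final one of 41,680 bytes.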
- Connect to IBM Watson web server (Speech to text API)
- Send start frame
{"action":"start","content-type":"audio/mulaw;rate=8000"}
- Send binary 100,000 bytes
- Send stop frame
{"action":"stop"}
- Repeat the binary and stop frames until the last byte.
The IBM Watson Speech API could only recognize the chunks individually, e.g.
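The steps above can be sketched as a plain list of outgoing frames. This does not talk to the service; the `Frame` struct and `frames_as_described` are hypothetical names used only to show the order in which I send things.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One outgoing websocket frame: a JSON text frame or a binary audio frame.
struct Frame {
    bool binary;        // true for audio bytes, false for JSON text
    std::string text;   // JSON payload for text frames, empty for binary frames
    std::size_t bytes;  // audio payload size for binary frames, 0 for text frames
};

// Hypothetical helper: list the frames in the order described in the steps above
// (one start frame, then a binary frame followed by a stop frame for every chunk).
std::vector<Frame> frames_as_described(std::size_t total_bytes, std::size_t max_chunk) {
    assert(max_chunk > 0 && "chunk size must be positive");
    std::vector<Frame> out;
    out.push_back({false, R"({"action":"start","content-type":"audio/mulaw;rate=8000"})", 0});
    while (total_bytes > 0) {
        const std::size_t n = total_bytes < max_chunk ? total_bytes : max_chunk;
        out.push_back({true, "", n});                       // audio chunk
        out.push_back({false, R"({"action":"stop"})", 0});  // stop frame after each chunk
        total_bytes -= n;
    }
    return out;
}
```

For the 641,680-byte file with 100,000-byte chunks, this gives 15 frames: one start frame, seven binary frames, and seven stop frames, that is, one stop frame per chunk.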
several tornadoes touch down
a line of severe thunder
swept through Colorado
Sunday
This seems to be the output of the individual chunks. Words that fall across a chunk boundary (for example, "thunderstorms" is partly at the end of one chunk and partly at the start of the next) are therefore recognized incorrectly or dropped.
What am I doing wrong?
EDIT: I am using C++ with the Boost library for the websocket interface.
//Do the websocket handshake
void IbmWebsocketSession::on_ssl_handshake(beast::error_code ec) {
auto mToken = mSttServiceObject->GetToken(); // Get the authentication token
//Complete the websocket handshake and call back the "send_start" function
mWebSocket.async_handshake_ex(mHost, mUrlEndpoint, [mToken](request_type& reqHead) {reqHead.insert(http::field::authorization,mToken);},
bind(&IbmWebsocketSession::send_start, shared_from_this(), placeholders::_1));
}
//Send the start frame
void IbmWebsocketSession::send_start(beast::error_code ec) {
//Send the START_FRAME and call back the "read_resp" function to receive the "state: listening" message
mWebSocket.async_write(net::buffer(START_FRAME),
bind(&IbmWebsocketSession::read_resp, shared_from_this(), placeholders::_1, placeholders::_2));
}
//Send the binary data
void IbmWebsocketSession::send_binary(beast::error_code ec) {
streamsize bytes_read = mFilestream.rdbuf()->sgetn(&chunk[0], chunk.size()); //Read the next binary chunk from a file (which is still being written at run time)
// Send binary data
if (bytes_read > mcMinsize) { //Minimum size defined by IBM is 100 bytes.
// If chunk size is greater than 100 bytes, then send the data and then callback "send_stop" function
mWebSocket.binary(true);
/**********************************************************************
* Wait a second before writing the next chunk.
**********************************************************************/
this_thread::sleep_for(chrono::seconds(1));
mWebSocket.async_write(net::buffer(&chunk[0], bytes_read),
bind(&IbmWebsocketSession::send_stop, shared_from_this(), placeholders::_1));
} else { //If chunk size is less than 100 bytes, then DO NOT send the data only call "send_stop" function
shared_from_this()->send_stop(ec);
}
}
void IbmWebsocketSession::send_stop(beast::error_code ec) {
mWebSocket.binary(false);
/*****************************************************************
* Send the Stop message
*****************************************************************/
mWebSocket.async_write(net::buffer(mTextStop),
bind(&IbmWebsocketSession::read_resp, shared_from_this(), placeholders::_1, placeholders::_2));
}
void IbmWebsocketSession::read_resp(beast::error_code ec, size_t bytes_transferred) {
boost::ignore_unused(bytes_transferred);
if(mWebSocket.is_open())
{
// Read the websocket response and call back the "display_buffer" function
mWebSocket.async_read(mBuffer, bind(&IbmWebsocketSession::display_buffer, shared_from_this(),placeholders::_1));
}
else
cerr << "Error: " << ec.message() << endl; // Report the error code passed to this handler
}
void IbmWebsocketSession::display_buffer(beast::error_code ec) {
/*****************************************************************
* Get the buffer into stringstream
*****************************************************************/
msWebsocketResponse << beast::buffers(mBuffer.data());
mResponseTranscriptIBM = ParseTranscript(); //Parse the response transcript
mBuffer.consume(mBuffer.size()); //Clear the websocket buffer
if ("Listening" == mResponseTranscriptIBM && true != mSttServiceObject->IsGstFileWriteDone()) { // IsGstFileWriteDone -> checks if the user has stopped speaking
shared_from_this()->send_binary(ec);
} else {
shared_from_this()->close_websocket(ec, 0);
}
}