I am using the Google Cloud Speech API to get speech-to-text for streaming audio. I have already made REST API calls with curl POST requests for a short audio file on GCP.

I have seen the documentation of the Google Streaming Recognize, which says "Streaming speech recognition is available via gRPC only."

I have gRPC (and protobuf) installed on my openSUSE Leap 15.0 system. Here is a screenshot of the directory.

Directory

Next, I am trying to run the streaming_transcribe example from this link. The sample program uses a local file as its input but treats it as simulated microphone input: it reads the file sequentially in 64K chunks and sends each chunk to the Google server.
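To make the chunked reading concrete, here is a minimal C++ sketch of that idea. It is not the sample's actual code: the function name StreamFileInChunks, the callback signature, and the optional delay between chunks are all assumptions chosen for illustration. The callback stands in for whatever writes a chunk to the gRPC stream.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <fstream>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper (not part of the Google sample): reads `path` in
// fixed-size chunks and hands each chunk to `on_chunk`, pausing `delay`
// between chunks to imitate a live source such as a microphone.
// Returns the total number of bytes delivered (0 if the file cannot be opened).
std::size_t StreamFileInChunks(
    const std::string& path, std::size_t chunk_size,
    std::chrono::milliseconds delay,
    const std::function<void(const char*, std::size_t)>& on_chunk) {
  assert(chunk_size > 0);
  std::ifstream file(path, std::ios::binary);
  if (!file) return 0;

  std::vector<char> buffer(chunk_size);
  std::size_t total = 0;
  while (file) {
    file.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
    // gcount() reports how many bytes the last read actually produced,
    // which is smaller than chunk_size for the final partial chunk.
    const std::size_t got = static_cast<std::size_t>(file.gcount());
    if (got == 0) break;
    on_chunk(buffer.data(), got);  // a real client would send this chunk
    total += got;
    if (delay.count() > 0) std::this_thread::sleep_for(delay);
  }
  return total;
}
```

Calling it with a chunk size of 64 * 1024 and a callback that forwards the bytes to the streaming request would reproduce the behaviour described above. Replacing the file with a live source only changes where the bytes come from, not the per-chunk sending logic.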

As an initial test to check that gRPC is set up correctly on my system, I ran make run_tests. I have changed the Makefile to:

...
...Some text as original Makefile
...
.PHONY: all
all: streaming_transcribe

googleapis.ar: $(GOOGLEAPIS_CCS:.cc=.o)
	ar r $@ $?

streaming_transcribe: streaming_transcribe.o parse_arguments.o googleapis.ar
	$(CXX) $^ $(LDFLAGS) -o $@

run_tests:
	./streaming_transcribe -b 16000 resources/audio.raw
	./streaming_transcribe --bitrate 16000 resources/audio2.raw
	./streaming_transcribe resources/audio.flac
	./streaming_transcribe resources/quit.raw

clean:
	rm -f *.o streaming_transcribe \
	  googleapis.ar \
	  $(GOOGLEAPIS_CCS:.cc=.o)

This does not work (neither does the original Makefile), but the streaming_transcribe.o file is created after running make. So I ran that file manually and got the following output:

Screenshot2

Any suggestions on how to run the test, and how to use GStreamer instead of the function that simulates the microphone audio?

You can't execute plain object files. They need to be linked (which is what the rule starting with streaming_transcribe: is for), but make is picky: the $(CXX) $^ $(LDFLAGS) -o $@ line that links streaming_transcribe must start with a tab character. - Ted Lyngmo
Hi @TedLyngmo, the suggestions above are already taken care of, but I am still getting the error. The LDFLAGS line is: LDFLAGS += -L/usr/local/lib `pkg-config --libs grpc++ grpc` -Wl,--no-as-needed -lgrpc++_reflection -Wl,--no-as-needed -lprotobuf -lpthread -ldl - RC0993
Ok, then update your question and include the exact error you get (as text, not as an image). - Ted Lyngmo
I am trying to re-install everything. I will let you know the output - RC0993
Reinstalling from scratch solved the problem. Apparently the shared objects were not set up correctly, which stopped make run_tests from going through. Running the *.o file was wrong anyway: the Makefile produces the executable, and that is what needs to be run. How do I mark this issue as closed? - RC0993

2 Answers

1
votes

how to run the test

Follow the instructions in cpp-docs-samples. Prerequisite: install grpc, protobuf, and googleapis, and set up the environment as described in the links above.

GStreamer instead of the function used for simulating the microphone audio

For this program I created two GStreamer pipelines. The first reads a WAV file and sends it as an RTP stream over UDP:

gst-launch-1.0 filesrc location=/path/to/file/FOO.wav ! wavparse ! audioconvert ! audio/x-raw,channels=1,depth=16,width=16,rate=44100 ! rtpL16pay  ! udpsink host=xxx.xxx.xxx.xxx port=yyyy

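The audio file can be changed to FLAC or MP3 by changing the appropriate elements in the pipeline. The second pipeline receives the RTP stream, depayloads it, and writes the raw audio to a file: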

gst-launch-1.0 udpsrc port=yyyy ! "application/x-rtp,media=(string)audio, clock-rate=(int)44100, width=16, height=16, encoding-name=(string)L16, encoding-params=(string)1, channels=(int)1, channel-positions=(int)1, payload=(int)96" ! rtpL16depay ! audioconvert ! audio/x-raw,format=S16LE ! filesink location=/path/to/where/you/want/to/dump/the/rtp/payloads/ABC.raw

Taking the payloads from the RTP stream and writing them to the file is done in a separate thread from the one that sends the data to Google and reads the responses.
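The two-thread split can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from this answer: it hands chunks over through an in-memory queue instead of the intermediate file used above, and the names ChunkQueue and RunPipeline are hypothetical. The producer stands in for the RTP receiver and the consumer stands in for the code that writes to the gRPC stream.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical thread-safe queue of audio chunks shared by the two threads.
class ChunkQueue {
 public:
  void Push(std::string chunk) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      chunks_.push_back(std::move(chunk));
    }
    cv_.notify_one();
  }

  // Signals that no more chunks will arrive.
  void Close() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      closed_ = true;
    }
    cv_.notify_all();
  }

  // Blocks until a chunk is available. Returns false once the queue has
  // been closed and every queued chunk has been handed out.
  bool Pop(std::string* chunk) {
    assert(chunk != nullptr);
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return !chunks_.empty() || closed_; });
    if (chunks_.empty()) return false;
    *chunk = std::move(chunks_.front());
    chunks_.pop_front();
    return true;
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<std::string> chunks_;
  bool closed_ = false;
};

// Runs a producer thread (stand-in for the RTP receiver) while the calling
// thread consumes chunks (stand-in for sending them to the Speech API).
// Returns the number of bytes consumed.
std::size_t RunPipeline(const std::vector<std::string>& payloads) {
  ChunkQueue queue;
  std::thread producer([&] {
    for (const auto& payload : payloads) queue.Push(payload);
    queue.Close();
  });

  std::size_t consumed = 0;
  std::string chunk;
  while (queue.Pop(&chunk)) {
    consumed += chunk.size();  // a real client would send the chunk here
  }
  producer.join();
  return consumed;
}
```

On Linux this would typically be built with -pthread. The Close() call matters: without it the consumer would block forever after the last chunk instead of finishing the request.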

0
votes

Maybe a dedicated sound card can listen to the RTSP stream? With a configuration such as:

try (SpeechClient speechClient = SpeechClient.create()) {
    // Two-channel LINEAR16 audio at 44.1 kHz, recognized per channel.
    RecognitionConfig config =
        RecognitionConfig.newBuilder()
            .setEncoding(AudioEncoding.LINEAR16)
            .setLanguageCode("en-US")
            .setSampleRateHertz(44100)
            .setAudioChannelCount(2)
            .setEnableSeparateRecognitionPerChannel(true)
            .build();
    // Pass config to the recognition request here.
}