I'm trying to use my Python app to transcribe multiple files in a folder and speed up the process. At present I can only do it one file at a time:
####RUN THIS PART FIRST#########
import json
from os.path import join, dirname
from ibm_watson import SpeechToTextV1
from ibm_watson.websocket import RecognizeCallback, AudioSource
import threading
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
import pandas as pd
authenticator = IAMAuthenticator('xxyyzz')
service = SpeechToTextV1(authenticator=authenticator)
service.set_service_url('https://api.us-east.speech-to-text.watson.cloud.ibm.com')
models = service.list_models().get_result()
#print(json.dumps(models, indent=2))
model = service.get_model('en-US_BroadbandModel').get_result()
#print(json.dumps(model, indent=2))
# This is the name of the file you need to change below
with open(join(dirname('__file__'), 'Call 8.wav'),
          'rb') as audio_file:
    # The result is wrapped in a one-element list because the
    # DataFrame code below iterates over `output`.
    output = [service.recognize(
        audio=audio_file,
        speaker_labels=True,
        content_type='audio/wav',
        #timestamps=True,
        #word_confidence=True,
        inactivity_timeout=-1,
        model='en-US_NarrowbandModel',
        continuous=True).get_result()]
############END################################
# get data to a csv
########################RUN THIS PART SECOND#####################################
df0 = pd.DataFrame([i for elts in output for alts in elts['results'] for i in alts['alternatives']])
df1 = pd.DataFrame([i for elts in output for i in elts['speaker_labels']])
list(df0.columns)
list(df1.columns)
df0 = df0.drop(["timestamps"], axis=1)
df1 = df1.drop(["final"], axis=1)
df1 = df1.drop(['confidence'],axis=1)
test3 = pd.concat([df0, df1], axis=1)
#sentiment
transcript = test3['transcript']
transcript = transcript.dropna()
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()
text = transcript
scores = []
for txt in text:
    vs = analyzer.polarity_scores(txt)
    scores.append(vs)
data = pd.DataFrame(text, columns= ['Text'])
data2 = pd.DataFrame(scores)
final_dataset= pd.concat([data,data2], axis=1)
test4 = pd.concat([test3,final_dataset], axis=1)
test4 = test4.drop(['Text'],axis=1)
test4.rename(columns={'neg': 'Negative', 'pos': 'Positive', 'neu': 'Neutral'},
             inplace=True)
# This is the name of the output csv file
test4.to_csv("Call 8.csv")
How can I transcribe multiple files in a folder instead of one file at a time? I can run this script multiple times, but I want to automate it so that it picks up the WAV files from a folder and processes each one. Let's say I have 15 WAV files in my folder C:\Python. I want an automated process that runs the script and produces 15 CSVs, one per file, each with its respective output. Right now the script works, but I have to run it manually for each WAV file to get that file's CSV.
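To make the goal concrete, this is roughly the shape I have in mind. It is only a sketch: `process_folder` and `process_one` are names I made up, and `process_one(wav_path, csv_path)` stands for my two script parts above wrapped into a single function that transcribes one file and writes one CSV. I have not run it against Watson.

```python
from pathlib import Path


def process_folder(folder, process_one):
    """Call process_one(wav_path, csv_path) for every .wav file in folder.

    process_one is a placeholder for the transcribe-and-save steps;
    it is passed in so this loop does not depend on Watson itself.
    """
    written = []
    for wav_path in sorted(Path(folder).glob("*.wav")):
        # "Call 8.wav" -> "Call 8.csv", saved next to the audio file
        csv_path = wav_path.with_suffix(".csv")
        process_one(wav_path, csv_path)
        written.append(csv_path)
    return written


# Hypothetical usage, once process_one exists:
# process_folder(r"C:\Python", process_one)
```

With 15 WAV files in the folder, this would call `process_one` 15 times, one file after another, and return the 15 CSV paths.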
Also, as a second question (sorry!), is there a way to speed up the transcription? I tried breaking the WAV files up into smaller segments and sending those to Watson, but it didn't work. My reference was https://github.com/freelanceastro/interview-transcriber
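One alternative I am considering, instead of splitting a single file, is sending several files at the same time from a thread pool. The sketch below is untested against Watson. It assumes the service accepts concurrent requests on my plan, and `process_folder_parallel`, `process_one`, and the default `max_workers=4` are my own placeholders, with `process_one(wav_path, csv_path)` again standing for the transcribe-and-save steps for one file.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def process_folder_parallel(folder, process_one, max_workers=4):
    """Run process_one(wav_path, csv_path) for each .wav file, several at once.

    Threads are used because the work is mostly waiting on network
    responses. max_workers=4 is an arbitrary guess, not a Watson limit.
    """
    wav_paths = sorted(Path(folder).glob("*.wav"))
    csv_paths = [p.with_suffix(".csv") for p in wav_paths]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() waits for every job and re-raises any worker exception
        list(pool.map(process_one, wav_paths, csv_paths))
    return csv_paths


# Hypothetical usage, once process_one exists:
# process_folder_parallel(r"C:\Python", process_one, max_workers=4)
```

This would not make any single transcription faster. It would only overlap the waiting time across files, so I don't know whether it is the right approach.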