Calling this function row by row in a loop works. Calling the same function through pandas.DataFrame.apply raises "ValueError: operands could not be broadcast together with shapes". Should DataFrame.apply work here? If the reason is not easily explainable, are there any other ways to speed up processing (besides multiprocessing)?
#python 3.6
import pandas as pd # version 0.19.2
import numpy as np
#gensim version 1.0.1
from gensim import models #https://radimrehurek.com/gensim/models/word2vec.html
df=pd.DataFrame({"q1":[['how', 'I', 'from', 'iPhone', 'keep', 'them', 'my', 'but', 'delete', 'iCloud', 'photos', 'in', 'can'],
['use', 'are', 'radio', 'What', 'commercial', 'cognitive', 'technology', 'in'],
['how', 'I', 'razor', 'prevent', 'burns', 'the', 'stomach', 'on', 'can']],
"q2":[['Can', 'remove', 'from', 'I', 'iPhone', 'removing', 'them', 'my', 'storage', 'photos', 'iCloud', 'without'],
['radio', 'from', 'Where', 'do', 'come', 'cognitive', 'distinction'],
['how', 'I', 'razor', 'prevent', 'can', 'burn']]})
#using pretrained model https://code.google.com/archive/p/word2vec/
w2v = models.KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)
#This works
df['w2v_sim']=np.nan
for i in range(len(df)):
    # .loc avoids chained assignment (df['col'].ix[i] = ...), which may not write back to df
    df.loc[i, 'w2v_sim'] = w2v.n_similarity(df.loc[i, 'q1'], df.loc[i, 'q2'])
    print(df.loc[i, 'w2v_sim'])
#this doesn't work
df['w2v_sim']=np.nan
df['w2v_sim']=df.apply(w2v.n_similarity(df['q1'],df['q2']),axis=1)
ValueError: operands could not be broadcast together with shapes (13,300) (8,300)
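A likely explanation, as far as I can tell: in the apply line, the expression `w2v.n_similarity(df['q1'], df['q2'])` is evaluated once, before `apply` is even called, on the entire columns rather than on one row. gensim then looks up each row's whole word list at once, producing arrays of shape (13, 300), (8, 300) and (9, 300), and averaging those across rows cannot broadcast. That matches the shapes in the error. `DataFrame.apply` expects a callable, which it invokes once per row when `axis=1`. The sketch below shows that pattern. So that it runs without the GoogleNews model, it uses a hypothetical `ToyKeyedVectors` class that only imitates the idea behind `n_similarity` (cosine similarity of the two mean word vectors). It is a stand-in, not gensim's class.

```python
import numpy as np
import pandas as pd


class ToyKeyedVectors:
    """Hypothetical stand-in for gensim's KeyedVectors (NOT the real class).

    n_similarity here computes the cosine similarity between the mean
    vectors of two word lists, imitating the idea of gensim's method.
    """

    def __init__(self, vectors):
        self.vectors = vectors  # dict: word -> 1-D numpy array

    def n_similarity(self, ws1, ws2):
        v1 = np.mean([self.vectors[w] for w in ws1], axis=0)
        v2 = np.mean([self.vectors[w] for w in ws2], axis=0)
        return float(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))


# Tiny hypothetical vocabulary with 2-dimensional vectors.
toy = ToyKeyedVectors({
    "a": np.array([1.0, 0.0]),
    "b": np.array([0.0, 1.0]),
    "c": np.array([1.0, 1.0]),
})

df = pd.DataFrame({"q1": [["a"], ["a", "b"]],
                   "q2": [["a", "c"], ["b"]]})

# Pass a callable rather than the result of a call: with axis=1, apply
# calls the lambda once per row and hands it that row as a Series.
df["w2v_sim"] = df.apply(lambda row: toy.n_similarity(row["q1"], row["q2"]), axis=1)
print(df["w2v_sim"].round(4).tolist())
```

With the real model, the equivalent line would be `df.apply(lambda row: w2v.n_similarity(row['q1'], row['q2']), axis=1)`. This version still calls the function once per row, so it should not be expected to run noticeably faster than the loop.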
Thank you
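On the speed question, one option other than multiprocessing is to vectorize the cosine step. First, build one mean vector per row for each column. Then compute all row-wise cosine similarities in a single NumPy operation. This sketch assumes that `n_similarity` amounts to cosine similarity of mean word vectors. `vectors` is a hypothetical toy dict. With the real model, passing `w2v` as the lookup should work, since KeyedVectors supports `w2v[word]`, but that is untested here. Words missing from the vocabulary would raise `KeyError` in this sketch.

```python
import numpy as np
import pandas as pd

# Hypothetical toy vocabulary standing in for the pretrained word2vec vectors.
vectors = {
    "a": np.array([1.0, 0.0]),
    "b": np.array([0.0, 1.0]),
    "c": np.array([1.0, 1.0]),
}

df = pd.DataFrame({"q1": [["a"], ["a", "b"]],
                   "q2": [["a", "c"], ["b"]]})


def mean_vectors(token_lists, lookup):
    """Stack the mean word vector of each token list into an (n_rows, dim) matrix."""
    return np.vstack([np.mean([lookup[w] for w in tokens], axis=0)
                      for tokens in token_lists])


m1 = mean_vectors(df["q1"], vectors)
m2 = mean_vectors(df["q2"], vectors)

# Row-wise cosine similarity in one vectorized step: the dot product of
# matching rows divided by the product of their Euclidean norms.
dots = np.einsum("ij,ij->i", m1, m2)
norms = np.linalg.norm(m1, axis=1) * np.linalg.norm(m2, axis=1)
df["w2v_sim"] = dots / norms
print(df["w2v_sim"].round(4).tolist())
```

The per-row averaging still runs in Python, but it skips the Series object that `apply` builds for every row. The similarity arithmetic then happens in a single NumPy call.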