pandas.DataFrame.apply ValueError: operands could not be broadcast together with shapes

Question

Calling this function row by row in a loop works. Calling the same function through pandas.DataFrame.apply raises ValueError: operands could not be broadcast together with shapes. Should pandas.DataFrame.apply work here? If this is not easily explained, are there any ideas for speeding up the processing (other than multiprocessing)?

#python 3.6
import pandas as pd # version 0.19.2  
import numpy as np
#gensim version 1.0.1
from gensim import models #https://radimrehurek.com/gensim/models/word2vec.html

df=pd.DataFrame({"q1":[['how', 'I', 'from', 'iPhone', 'keep', 'them', 'my', 'but', 'delete', 'iCloud', 'photos', 'in', 'can'],
                   ['use', 'are', 'radio', 'What', 'commercial', 'cognitive', 'technology', 'in'],
                   ['how', 'I', 'razor', 'prevent', 'burns', 'the', 'stomach', 'on', 'can']],
             "q2":[['Can', 'remove', 'from', 'I', 'iPhone', 'removing', 'them', 'my', 'storage', 'photos', 'iCloud', 'without'],
                  ['radio', 'from', 'Where', 'do', 'come', 'cognitive', 'distinction'],
                   ['how', 'I', 'razor', 'prevent', 'can', 'burn']]})

#using pretrained model https://code.google.com/archive/p/word2vec/
w2v = models.KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True) 

#This works
df['w2v_sim']=np.nan
for i in range(len(df)):
    # .loc writes into df directly and avoids chained indexing on a column
    df.loc[i, 'w2v_sim'] = w2v.n_similarity(df.loc[i, 'q1'], df.loc[i, 'q2'])
    print(str(df.loc[i, 'w2v_sim']))

#this doesn't work
df['w2v_sim']=np.nan
df['w2v_sim']=df.apply(w2v.n_similarity(df['q1'],df['q2']),axis=1)

ValueError: operands could not be broadcast together with shapes (13,300) (8,300)

Thank you

nbraun · Accepted Answer · 2017-04-13T18:40:18

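This is hard to reproduce because the pretrained model is 1.5 GB, but I think the problem is your apply call. With axis=1, apply calls the function you pass once per row, so that function should take a single argument (the row, which is a Series). In your code, w2v.n_similarity(df['q1'],df['q2']) is evaluated first, on the two whole columns, before apply ever runs, and that call is what raises the broadcast error. Pass a function instead: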

df['w2v_sim']=df.apply(lambda x: w2v.n_similarity(x.q1, x.q2), axis=1)
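To try the pattern without loading the 1.5 GB model, here is a minimal sketch that swaps in a hypothetical stand-in, jaccard_similarity, for w2v.n_similarity (it is not part of gensim and measures token overlap, not embedding similarity). It also shows a list comprehension over zip, which gives the same result and is often faster than apply with axis=1 because no Series is built per row; that speed claim is an assumption worth timing on your own data.

```python
import pandas as pd


def jaccard_similarity(words_a, words_b):
    """Hypothetical stand-in for w2v.n_similarity: token-set overlap in [0, 1]."""
    set_a, set_b = set(words_a), set(words_b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


df = pd.DataFrame({
    "q1": [["how", "I", "razor", "prevent", "burns"], ["radio", "cognitive"]],
    "q2": [["how", "I", "razor", "prevent", "burn"], ["radio", "from", "Where"]],
})

# With axis=1, apply passes each row (a Series) to the callable,
# so the lambda unpacks the two columns from that single argument.
df["sim_apply"] = df.apply(
    lambda row: jaccard_similarity(row["q1"], row["q2"]), axis=1
)

# Same computation, iterating the two columns in parallel
# without constructing a Series for every row.
df["sim_zip"] = [jaccard_similarity(a, b) for a, b in zip(df["q1"], df["q2"])]

print(df[["sim_apply", "sim_zip"]])
```

Replacing jaccard_similarity with w2v.n_similarity in either line should give the per-row similarity the loop in the question computes.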
