0
votes

I have a dataframe of writers like this:

writer_id titles
n1 t1,t2
n2 t3,t4

And I have another dataframe for the titles with genre, ratings and votes like this:

title_id genres votes rating
t1 Drama,Action 100 7.0
t2 Action,Thriller 1000 8.0
t3 Crime ,Romance 200 6.0
t4 Drama,Romance 300 5.0

Now in the new data frame, I want to have a row for each writer with columns for each genre with the count value and another column (let's call it popularity) that will apply a formula using votes and rating. So it would look like this:

writer_id drama action thriller romance crime popularity
n1 1 2 1 0 0 2.2
n2 0 1 0 2 1 4.2

How should I go about doing this? I have the columns with genres already created.

1
What is formula?jezrael
for sake of simplicity , let's say it is : (w * (rating)) + (6.0 * (1-w)) where w = (votes / max(votes)). Hope it is clear. Thanks!sabman
hmmm, how is count 2.2 ? what is w for writer_id=n1?jezrael
I am using dummy values for the popularity. It is not the actual value. But assume 0.8 for w.sabman
Ah sorry about that. Let me use a better example formula : (votes1*rating1 + ..+ votesN*ratingsN) / (votes1+..+votesN). So for n1, ( (100*7.0) + (1000 * 8.0) ) / (1000 + 100). Hope that helps.sabman

1 Answers

1
votes

First part is DataFrame.explode by comma separated values in both DataFrames and merge, for counts use crosstab:

df11 = (df1.assign(title_id= df1['titles'].str.split(','))
           .explode('title_id')
           .drop('titles', 1))
df22 = (df2.assign(genres= df2['genres'].str.split(','))
           .explode('genres'))


df = df11.merge(df22, on='title_id') 

df4 = (df.assign(w = lambda x: x['votes'].mul(x['rating']))
         .groupby('writer_id')
         .sum()
         .assign(w = lambda x: x['w'].div(x['votes'])))

df3 = pd.crosstab(df['writer_id'], df['genres']).assign(popularity = df4['w'])
print (df3)
genres     Action  Crime  Drama  Romance  Thriller  popularity
writer_id                                                     
n1              2      0      1        0         1    7.909091
n2              0      1      1        2         0    5.400000