Currently I have filtered all the noise out of a subset from the last-fm dataset and I am trying to build a recommendation system on top of this subset.
Basicly my dataset contains the following in a database:
userid | artistid | artist | totalplays
Example:
00000c289a1829a808ac09c00daf10bc3c4e223b 8bfac288-ccc5-448d-9573-c33ea2aa5c30 red hot chili peppers 691
What I want to achieve is a list of all users and the totalplays they had listened to a specific artist. For example a list containing integers (totalplays) for the red hot chili peppers. The problem with this is that when I query the following:
SELECT userid, totalplays FROM dataset WHERE artist = 'red hot chili peppers' I will only fetch the users that listened at least 1 time to the red hot chili peppers (the dataset does not contain entries of users that never listened to an artist).
I would like it to contain these integers aswell, in order to match two of these result sets with a Pearson's Correlation, so that I can build a recommendation system. (I have already implemented this).
I have tried building a temp table, which contains all distinct userid's and then update the amount of plays with an inner join on the dataset; but unfortunatelly that takes too long.
I have heard of the IF EXISTS possibilities in SQL, but I am not familiar with them. Can anyone please point me in the right direction?
Again: I'd like to get the totalplays of a user (no need to calculate the sum, as this i already done!) for the artist called 'red hot chili peppers' for example. Even when the user has no records in the dataset of listening to them (thus returning 0 for this user).
Thanks in advance!
artistidshould be enough - Sir Rufo