1
votes

Currently I have filtered all the noise out of a subset from the last-fm dataset and I am trying to build a recommendation system on top of this subset.

Basicly my dataset contains the following in a database:

userid | artistid | artist | totalplays

Example:

00000c289a1829a808ac09c00daf10bc3c4e223b   8bfac288-ccc5-448d-9573-c33ea2aa5c30    red hot chili peppers   691

What I want to achieve is a list of all users and the totalplays they had listened to a specific artist. For example a list containing integers (totalplays) for the red hot chili peppers. The problem with this is that when I query the following:

SELECT userid, totalplays FROM dataset WHERE artist = 'red hot chili peppers' I will only fetch the users that listened at least 1 time to the red hot chili peppers (the dataset does not contain entries of users that never listened to an artist).

I would like it to contain these integers aswell, in order to match two of these result sets with a Pearson's Correlation, so that I can build a recommendation system. (I have already implemented this).

I have tried building a temp table, which contains all distinct userid's and then update the amount of plays with an inner join on the dataset; but unfortunatelly that takes too long.

I have heard of the IF EXISTS possibilities in SQL, but I am not familiar with them. Can anyone please point me in the right direction?

Again: I'd like to get the totalplays of a user (no need to calculate the sum, as this i already done!) for the artist called 'red hot chili peppers' for example. Even when the user has no records in the dataset of listening to them (thus returning 0 for this user).

Thanks in advance!

1
why do you store redundant artist information? artistid should be enough - Sir Rufo
sometimes artistid is null for some odd reason. it is however used for different purposes. all of those columns are needed for the site ;) - user1949616

1 Answers

2
votes

Try this

SELECT A.USERID, coalesce(X.totalplays,0) totalplays
FROM (SELECT DISTINCT USERID FROM dataset) A
LEFT OUTER JOIN  (
SELECT userid, totalplays FROM dataset 
WHERE artist = 'red hot chili peppers') X ON A.userid = X.userid

If you have separate user table then

SELECT A.USERID, coalesce(X.totalplays,0) totalplays
FROM UserTable A
LEFT OUTER JOIN  (
SELECT userid, totalplays FROM dataset 
WHERE artist = 'red hot chili peppers') X ON A.userid = X.userid