
I have three kinds of nodes in my database: 1) User, 2) Media, 3) Tag.

Media nodes are connected to each other by a relationship like this:

(:Media)-[:IS_SIMILAR]-(:Media)

Media nodes are linked to their tags: (:Media)-[:HAS_TAG]-(:Tag)

Users are linked to the media they like: (:User)-[:LIKES]-(:Media)

Here's a visualization:

(Image: the green nodes are Media and the blue node is a User; I excluded the Tag nodes.)

The IS_SIMILAR relationship has a similarity property, computed as the number of tags the two Media nodes have in common.
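A minimal sketch of how such a similarity property could be (re)computed from the tags. It assumes the relationship type is stored as IS_SIMILAR, as in the schema description, and that HAS_TAG may point in either direction, so the pattern is undirected. Pairs with no tag in common get no relationship here. id() works in Neo4j 4 and is deprecated (but still available) in Neo4j 5, where elementId() replaces it.

```cypher
// Sketch: set IS_SIMILAR.similarity to the number of Tag nodes two Media nodes share.
// Undirected HAS_TAG pattern, because the stored direction is an assumption.
MATCH (m1:Media)-[:HAS_TAG]-(t:Tag)-[:HAS_TAG]-(m2:Media)
// Keep each unordered pair only once.
WHERE id(m1) < id(m2)
WITH m1, m2, count(t) AS sharedTags
// An undirected MERGE reuses an existing IS_SIMILAR in either direction,
// or creates one (direction chosen by Neo4j) if none exists.
MERGE (m1)-[s:IS_SIMILAR]-(m2)
SET s.similarity = sharedTags;
```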

I am trying to perform content-based filtering: find the media a user likes, then return the top 10 other media ranked by the similarity property.

I wrote the following query:

MATCH (u:User {id: "Dorian"})-[:LIKES]-(m:Media)
WITH collect(m) AS mu
UNWIND mu AS m
MATCH (m)-[s:IS_SIMILAR]-(o:Media)
WHERE NOT o IN mu
RETURN DISTINCT o, s ORDER BY s.similarity DESC

With the following results:

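(Screenshot of the query results.) Unfortunately, the results contain repeated Media nodes. DISTINCT applies to the (o, s) pair, and a candidate node that is similar to several of the liked Media nodes is reached through several different IS_SIMILAR relationships, so it appears once per relationship.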

Can you suggest:

1) how I can avoid the repeated nodes, and

2) another way to perform content-based recommendation with my schema?


Answer


You were almost there. This should work:

MATCH (u:User {id: "Dorian"})-[:LIKES]-(m:Media)
WITH collect(m) AS mu
UNWIND mu AS m
MATCH (m)-[s:IS_SIMILAR]-(o:Media)
WHERE NOT o IN mu
WITH o ORDER BY s.similarity DESC
RETURN DISTINCT o LIMIT 10;

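Cypher rejects RETURN DISTINCT o ORDER BY s.similarity DESC: once RETURN DISTINCT o has projected the rows down to o, s is no longer in scope, so ORDER BY cannot refer to it. It does accept WITH o ORDER BY s.similarity DESC followed by RETURN DISTINCT o, which sorts the rows while s is still available and then removes the repeated nodes.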
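Another option, sketched here rather than taken from the answer above, is to aggregate per candidate node instead of relying on DISTINCT, so that each Media node yields exactly one row and the score used for ranking is explicit. This assumes the relationship type is stored as IS_SIMILAR and that summing the similarity over all of the user's liked media is an acceptable score.

```cypher
// Sketch: one row per candidate, scored by total similarity to everything the user likes.
MATCH (u:User {id: "Dorian"})-[:LIKES]-(liked:Media)
WITH collect(liked) AS likedMedia
UNWIND likedMedia AS m
MATCH (m)-[s:IS_SIMILAR]-(o:Media)
// Skip media the user already likes.
WHERE NOT o IN likedMedia
// Grouping on o collapses the repeated rows into a single row per candidate.
RETURN o, sum(s.similarity) AS score
ORDER BY score DESC
LIMIT 10;
```

Using max(s.similarity) instead of sum(s.similarity) would rank candidates by their single best match, whereas sum favors media that are similar to many of the things the user likes.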