
We are trying to build an online recommender (user-user collaborative filtering) using cosine similarity, with the data stored in Neo4j.

**One difference is that the input data set is a boolean preference (as opposed to a rating)** for 1 million users × ~700 products, e.g. User_ID, Product_ID, Preference: 11,48989399,1

I created nodes for users and products, with an index on id (user_id, product_id).

I tried writing a Cypher query to get the top 20 closest neighbours based on the formula

Similarity = (# of products liked by both users) / (sqrt(# of products liked by user1) * sqrt(# of products liked by user2))
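To make the formula concrete, here is a minimal plain-Java sketch of cosine similarity for boolean preferences. It works on in-memory sets of product ids rather than on the graph; the class name, method name, and sample ids are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class BooleanCosine {

    /** Cosine similarity for boolean preferences: |A ∩ B| / (sqrt(|A|) * sqrt(|B|)). */
    static double similarity(Set<Long> likedByUser1, Set<Long> likedByUser2) {
        if (likedByUser1.isEmpty() || likedByUser2.isEmpty()) {
            return 0.0; // no likes on one side: treat similarity as 0 instead of dividing by zero
        }
        // Copy before intersecting so the caller's set is not modified.
        Set<Long> shared = new HashSet<>(likedByUser1);
        shared.retainAll(likedByUser2);
        return shared.size() / (Math.sqrt(likedByUser1.size()) * Math.sqrt(likedByUser2.size()));
    }

    public static void main(String[] args) {
        Set<Long> user1 = new HashSet<>(Arrays.asList(1L, 2L, 3L, 4L)); // hypothetical product ids
        Set<Long> user2 = new HashSet<>(Arrays.asList(3L, 4L, 5L, 6L));
        // 2 shared products, 4 likes each: 2 / (2 * 2) = 0.5
        System.out.println(similarity(user1, user2));
    }
}
```

Note that the whole denominator is parenthesised; without the parentheses the expression would divide by the first square root and then multiply by the second.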

Below is the query:

MATCH (a:Users)-[d]->() using index a:Users(id) where a.id =1 
WITH a.id as user1, count(d) as user1_prod  
MATCH (a:Users)-[]->()<-[dd]-others using index a:Users(id) where a.id =1 
WITH user1, user1_prod, others, count(dd) as intersect 
MATCH others-[b1]->() with user1, others.id as user2, intersect, user1_prod, count(b1) as user2_prod 
WITH user1, user2, intersect/(sqrt(user1_prod) * sqrt(user2_prod)) as similarity 
RETURN user2, similarity order by similarity desc limit 20;    

The query takes close to 22 seconds to return results; after that, the product recommendation step is fast and scales well.

Is there a better way to write the Cypher for the similarity, given that the graph may become denser in future scenarios?

Details: Kernel version Neo4j - Graph Database Kernel (neo4j-kernel), version: 2.1.6

772 772 nodes

neostore.relationshipstore.db.mapped_memory 3078M

CentOS release 6.6 (Final)

1 Answer


It will be much faster if you rewrite it as a Neo4j server extension. There you can use node.getDegree(), which retrieves a node's degree in constant time.

The core code would look like this. You can simplify it by extracting a function that collects the products per user.

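// Neo4j 2.1 API. User (a Label) and LIKES (a RelationshipType) are assumed to be
// constants defined elsewhere; Direction is org.neo4j.graphdb.Direction.
// This must run inside a transaction.
Node user1 = db.findNodesByLabelAndProperty(User, "id", 1).iterator().next();
int likes1 = user1.getDegree(LIKES, Direction.OUTGOING);
Set<Node> products1 = new HashSet<>(likes1);
for (Relationship rel : user1.getRelationships(LIKES, Direction.OUTGOING)) {
   products1.add(rel.getEndNode());
}

Node user2 = db.findNodesByLabelAndProperty(User, "id", 2).iterator().next();
int likes2 = user2.getDegree(LIKES, Direction.OUTGOING);
Set<Node> products2 = new HashSet<>(likes2);
for (Relationship rel : user2.getRelationships(LIKES, Direction.OUTGOING)) {
   products2.add(rel.getEndNode());
}

// Keep only the products that both users like.
products1.retainAll(products2);

return products1.size() / (Math.sqrt(likes1) * Math.sqrt(likes2));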
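The snippet above scores a single pair of users, while the question asks for the top 20 neighbours. One way to extend it is sketched below in plain Java, using in-memory maps in place of the graph: walk from the user's products back to the other users who like them, count the shared products, then score and sort. The class, the `Neighbour` type, and both maps are hypothetical stand-ins; in a server extension the same loops would follow LIKES relationships and take the counts from getDegree().

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class TopNeighbours {

    /** A neighbouring user and their similarity to the query user. */
    static final class Neighbour {
        final long userId;
        final double similarity;

        Neighbour(long userId, double similarity) {
            this.userId = userId;
            this.similarity = similarity;
        }
    }

    /** Builds the reverse mapping product -> users who like it. */
    static Map<Long, Set<Long>> invert(Map<Long, Set<Long>> likesByUser) {
        Map<Long, Set<Long>> usersByProduct = new HashMap<>();
        for (Map.Entry<Long, Set<Long>> e : likesByUser.entrySet()) {
            for (Long product : e.getValue()) {
                usersByProduct.computeIfAbsent(product, p -> new HashSet<>()).add(e.getKey());
            }
        }
        return usersByProduct;
    }

    /** Returns up to k users most similar to userId, best first. */
    static List<Neighbour> topNeighbours(long userId,
                                         Map<Long, Set<Long>> likesByUser,
                                         Map<Long, Set<Long>> usersByProduct,
                                         int k) {
        List<Neighbour> result = new ArrayList<>();
        Set<Long> mine = likesByUser.get(userId);
        if (mine == null || mine.isEmpty()) {
            return result; // unknown user or no likes: no neighbours
        }

        // Count shared products per other user (user -> product -> other users).
        Map<Long, Integer> shared = new HashMap<>();
        for (Long product : mine) {
            for (Long other : usersByProduct.getOrDefault(product, Collections.emptySet())) {
                if (other != userId) {
                    shared.merge(other, 1, Integer::sum);
                }
            }
        }

        // Score only users with at least one shared product; all others have similarity 0.
        for (Map.Entry<Long, Integer> e : shared.entrySet()) {
            int theirs = likesByUser.get(e.getKey()).size();
            double sim = e.getValue() / (Math.sqrt(mine.size()) * Math.sqrt(theirs));
            result.add(new Neighbour(e.getKey(), sim));
        }

        // Sort by similarity, highest first, and keep the first k.
        result.sort((a, b) -> Double.compare(b.similarity, a.similarity));
        return result.size() > k ? new ArrayList<>(result.subList(0, k)) : result;
    }
}
```

With only ~700 products, each product can have a very large number of likers, so the inner loop is where the time goes; the sketch does not change that cost, it only avoids materialising an intersection set per candidate pair.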