I am trying to build a graph of the different entities people have liked on Facebook, in order to create a basic cross-domain recommendation engine.

I have data for different entities (movies, books, music, etc.). A node is created for each item, with the item's name (name of the movie, book, etc.) and its entity type (movie, book, etc.) as properties. Pairs of item nodes are connected by a relationship called "affinity". This relationship has a "strength" property, which is equal to the number of people who have liked both items.

I use FB users to connect these nodes. FB users are also nodes in the graph, with the person's name and a type of "person" as properties. The relationship between a user node and an item node is called 'likes'. Now, if a person has liked a movie, I would like to recommend books or music to them by traversing the graph. This is the Cypher query I am using to traverse the graph:

START root = node(<LIKED_MOVIE_NODE_ID>)
MATCH p = root-[rel1:affinity*..3]-(movies)<-[rel2:likes]-(persons)-[rel3:likes]->(books)
WHERE HAS(movies.type) and movies.type = "movies" and HAS(persons.type) and persons.type = "person" and HAS(books.type) and books.type = "books"
RETURN books

This runs very slowly, sometimes taking up to 500 seconds. I have about 13,000 movie, 2,000 book and 3,000 music nodes, connected by 16,000 people. Altogether there are about 300,000 relationships.

My questions are:

  1. Am I doing something wrong? Is there a better way to do this? I am new to Neo4j. I have tried some of the techniques for tuning the Neo4j graph database: I have increased the minimum heap size to 4 GB and am running it on an 8-core machine with 32 GB RAM.

  2. I want to know the strength of the rel1 relationships and the number of rel2 and rel3 relationships. rel1 has a strength property, but I have not been able to work out how to retrieve it.

Please advise, as I am on the verge of giving up on Neo4j and going back to SQL. At least it works. :(

Regards, Paritosh

1 Answer

Cypher is slow; in fact, it is very slow compared to the traversal and core APIs (http://java.dzone.com/articles/get-full-neo4j-power-using).

That said, you could try to limit the number of nodes Neo4j processes by splitting your MATCH into several parts joined by WITH clauses. Depending on your use case, you could, for example, put root-[rel1:affinity*..3]-(movies) in a separate clause and keep only the distinct movies. Otherwise Neo4j will process every combination of paths that leads to a movie.
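
As a rough sketch of that split, the query from the question could be rewritten along these lines. It reuses the node and relationship names from the question and the `!` property syntax shown in the PS; it assumes your Cypher version supports WITH DISTINCT, and it has not been run against your data, so treat it as a starting point rather than a tuned query.

```cypher
// Step 1: walk the affinity relationships and keep each reachable movie once.
START root = node(<LIKED_MOVIE_NODE_ID>)
MATCH root-[:affinity*..3]-(movies)
WHERE movies.type! = "movies"
WITH DISTINCT movies
// Step 2: only now expand from the de-duplicated movies to people and books.
MATCH (movies)<-[:likes]-(persons)-[:likes]->(books)
WHERE persons.type! = "person" AND books.type! = "books"
RETURN DISTINCT books
```

The point of the WITH DISTINCT step is that the second MATCH starts once per movie instead of once per affinity path that ends at that movie.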

PS:

WHERE HAS(movies.type) and movies.type = "movies" and HAS(persons.type) and persons.type = "person" and HAS(books.type) and books.type = "books"

can be rewritten as

WHERE movies.type! = "movies" and persons.type! = "person" and books.type! = "books"

Or, if you are using Neo4j 2.0.0M4, you can simply skip the HAS() checks.