neo4j noob here, on Neo4j 2.0.0 Community
I've got a graph database of 24,000 movies and 2700 users, and somewhere around 60,000 LIKE relationships between a user and a movie.
Let's say that I've got a specific movie (movie1) in mind.
START movie1=node:Movie("MovieId:88cacfca-3def-4b2c-acb2-8e7f4f28be04")
MATCH (movie1)<-[:LIKES]-(usersLikingMovie1)
RETURN usersLikingMovie1;
I can quickly and easily find the users who liked the movie with the above query. I can follow this path further to get the users who liked the same movies that as the people who liked movie1. I call these generation 2 users
START movie1=node:Movie("MovieId:88cacfca-3def-4b2c-acb2-8e7f4f28be04")
MATCH (movie1)<-[:LIKES]-(usersLikingMovie1)-[:LIKES]->(moviesGen1)<-[:LIKES]-(usersGen2)
RETURN usersGen2;
This query takes about 3 seconds and returns 1896 users.
Now I take this query one step further to get the movies liked by the users above (generation 2 movies)
START movie1=node:Movie("MovieId:88cacfca-3def-4b2c-acb2-8e7f4f28be04")
MATCH (movie1)<-[:LIKES]-(usersLikingMovie1)-[:LIKES]->(moviesGen1)<-[:LIKES]-(usersGen2)-[:LIKES]->(moviesGen2)
RETURN moviesGen2;
This query causes neo4j to spin for several minutes at 100% cpu utilization and using 4GB of RAM. Then it sends back an exception "OutOfMemoryError: GC overhead limit exceeded".
I was hoping someone could help me out and explain to me the issue. Is Neo4j not meant to handle a query of this depth in a performant manner? Is there something wrong with my Cypher query?
Thanks for taking the time to read.