
How can I optimize this Cypher query? It's 3-4 times slower than a similar query using Gremlin.

START movie=node:vertices(movieId="100") 
MATCH genera1<--movie<--()-[ratedRel:rated]->anotherMovie-->genera1 
WHERE ratedRel.stars > 3 
RETURN anotherMovie.title as title, anotherMovie.movieId as id, 
genera1.genera as genera, 
COUNT(anotherMovie) as count ORDER BY count(anotherMovie) DESC LIMIT 20;

I'm just trying to retrieve movies that have been rated with more than 3 stars and that have the same genera as the START node. The schema is shown here: http://markorodriguez.files.wordpress.com/2011/09/movielens-schema.png?w=350

I'm running the query in the console, and I'm using Neo4j 1.9.

The Gremlin query:

m = [:];
x = [] as Set;
v = g.v(node_id);

v.out('hasGenera').aggregate(x).back(2).inE('rated').
filter{it.getProperty('stars') > 3}.outV.outE('rated').
filter{it.getProperty('stars') > 3}.
inV.filter{it != v}.
filter{it.out('hasGenera').toSet().equals(x)}.
groupCount(m){"${it.id}:${it.title.replaceAll(',',' ')}"}.iterate();

m.sort{a,b -> b.value <=> a.value}[0..24];
Please paste the Gremlin query, too. - ulkas
Can you try the query with Neo4j 1.9.M02 and see if it is still slower? Is the dataset the same one that Marko used in his example? - Michael Hunger
Yes, it's the same dataset, and I tried with M02. It was still slower. - Manuel Palacio
Can you remove the distinct? You don't need it anyway. - Michael Hunger

1 Answer

START movie=node:vertices(movieId="100") 
MATCH movie-->genera1<--anotherMovie<-[ratedRel:rated]-user
WHERE ratedRel.stars > 3 
RETURN anotherMovie.title as title, anotherMovie.movieId as id, genera1.genera as genera, 
COUNT(ratedRel) as cnt ORDER BY cnt DESC LIMIT 20;

Update: based on your Gremlin query, could you please try this one?

START movie=node:vertices(movieId="100") 
MATCH movie-[:hasGenera]->genera1<-[:hasGenera]-anotherMovie<-[ratedRel:rated]-user
WITH anotherMovie, count(ratedRel) as allVotes, sum(ratedRel.stars) as allStars, genera1
WHERE allStars/allVotes>3 
RETURN anotherMovie.title as title, anotherMovie.movieId as id, genera1.genera as genera, 
allStars ORDER BY allStars DESC LIMIT 20;

The point is to specify as many elements of the path as possible (in this case the relationship types were missing) and to omit the user node. I don't know how to do that in Cypher, but the user node obviously isn't named in the Gremlin code.
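
On the user node: Cypher accepts an unnamed node written as (), which the query in the question already uses, so the identifier can simply be dropped. The following is only a sketch, not tested against this dataset. It assumes the relationship types hasGenera and rated taken from the Gremlin query, and it keeps the per-rating filter from the question rather than the average used above:

```cypher
// Look up the start movie in the "vertices" index, as in the question.
START movie=node:vertices(movieId="100")
// Name both relationship types explicitly; the rating user stays anonymous as ().
MATCH movie-[:hasGenera]->genera1<-[:hasGenera]-anotherMovie<-[ratedRel:rated]-()
// Keep only individual ratings above 3 stars.
WHERE ratedRel.stars > 3
// Count the qualifying ratings per movie and genera.
RETURN anotherMovie.title AS title, anotherMovie.movieId AS id,
       genera1.genera AS genera, count(ratedRel) AS cnt
ORDER BY cnt DESC
LIMIT 20;
```

Whether this closes the gap with Gremlin would have to be measured; it only removes the unused identifier and narrows the relationship types the match has to consider.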