3
votes

I am new to Neo4J, Cypher and StackOverflow so let me know when I’m bucking protocol.

I have a test Neo4J database with a little over 7k nodes representing movies and a fair amount of relationships connected to each movie. I’ve noticed that when I run queries that are sorted by a tally, I get different rows back depending on what parameters of the node I return. One specific example is when I try to list the movies with the most relationships.

This CQL query

MATCH (m:Movie)-[r]-() 
RETURN m.title, count(r) AS rel_count ORDER BY rel_count DESC LIMIT 10

returns

m.title                 rel_count
The Three Musketeers    184
Scary Movie 3           128
Watchmen                114
Hero                    113
Horrible Bosses 2       113
The Longest Yard        112
Carrie                  111
The Butterfly Effect    111
Aliens                  111
Dick Tracy              110

while the CQL query

MATCH (m:Movie)-[r]-() 
RETURN m.title, m.tmsId, count(r) AS rel_count ORDER BY rel_count DESC LIMIT 10

returns

m.title                       m.tmsId           rel_count
Forrest Gump                  MV000398520000    106
The Karate Kid                MV000125280000    79
Evolution                     MV001067180000    76
Original Sin                  MV000790300000    74
Kill Bill: Vol. 2             MV001405480000    74
American Beauty               MV000773810000    74
O Brother, Where Art Thou?    MV000914040000    73
The Godfather                 MV000120060000    72
Sin City                      MV001558150000    71
The Karate Kid Part II        MV000205920000    71

My assumption is(was) that the MATCH statement collects the data for the nodes m and the relationships r and the RETURN statement just filters the output to specific parameters. Therefore, I'd expect the same list returned for each call. Anyone see something wrong with my Cypher or expectations?

Even though I have a specific problem, I would also like to know why the problem is happening for some insight into Cypher’s syntax and execution. I've added the PROFILE output if that is helpful.

Profile Chart 1

Profile Chart 2

Thanks in advanced, -johnt

2

2 Answers

2
votes

I think you're right. In Cypher the RETURN and WITH statements just work on the tabular data that they're given. If you specify an aggregate function like count/sum/etc... then it will automatically group based on the other, non-aggregate columns you've specified.

As a debugging step, what if you were to try this?

MATCH (m:Movie)-[r]-() 
WITH m.title AS title, m.tmsId AS tmsId, count(r) AS rel_count
RETURN title, tmsId, rel_count ORDER BY rel_count DESC LIMIT 10
1
votes

I think we found the answer. It seems that the COUNT(r) will count all the r's based on the other return values. Therefore when I ask for non-unique parameters I get the sum of the r's for all nodes with that parameter. In this specific case, There are 4 movies with the title "The Three Musketeers" and all 4 of their r's sum to 184. Adding the unique parameter tmsId returns the actual results for which I'm looking.

We found that we could make the non-unique parameters counted separately with:

MATCH (m:Movie)-[r]-()
WITH DISTINCT m, COUNT(r) AS rel_count
RETURN m.title, rel_count
ORDER BY rel_count DESC LIMIT 12