0
votes

I have an Item node, which is POSTED_BY a User node. Additionally, the Item may be LIKED by zero or more User nodes, and zero or more Comments may be POSTED_IN the Item node.

What I need to do is to query for a specific Item node (given the id), and retrieve:

  1. The Item node itself
  2. The User node that the Item was POSTED_BY
  3. The count of the number of Users who LIKE the Item node
  4. The count of the number of Comments POSTED_IN the Item node

If I try to query just 1-3, and exclude 4 for now, my query looks something like this:

MATCH (i:Item {id: 'ByvIzUdbZ'})-[:POSTED_BY]->(u:User) 
OPTIONAL MATCH (u2:User)-[:LIKES]->(i)  
RETURN i, u, COUNT(u2);

This returns the Item node, the User and the LIKES count correctly.

Now, to accommodate the 4th point, I modified the query to add another OPTIONAL MATCH, as below:

MATCH (i:Item {id: 'ByvIzUdbZ'})-[:POSTED_BY]->(u:User) 
OPTIONAL MATCH (u2:User)-[:LIKES]->(i) 
OPTIONAL MATCH (comment:Comment)-[:POSTED_IN]->(i) 
RETURN i, u, COUNT(u2), COUNT(comment);

With this, however, the counts for both User (u2) and Comment (comment) are incorrect. They seem to be getting "added up". So, if the Likes was actually 3, the likes and comment count are both returned as 6. Any tips on what I'm doing wrong here? Thanks.

2

2 Answers

2
votes

You are seeing the effect of cartesian products.

If the number of rows that would be generated by a MATCH clause on its own would be N, but the number of rows that already exist (say, from previous MATCH clauses) is M, then you end up with M*N rows.

To avoid this multiplicative effect (which also adversely hits performance and memory), you can perform aggregation immediately after the appropriate clauses. For example, in your case:

MATCH (i:Item {id: 'ByvIzUdbZ'})-[:POSTED_BY]->(u:User) 
OPTIONAL MATCH (u2:User)-[:LIKES]->(i)
WITH i, u, COUNT(u2) AS u2_count
OPTIONAL MATCH (comment:Comment)-[:POSTED_IN]->(i) 
RETURN i, u, u2_count, COUNT(comment) AS comment_count;
0
votes

Cybersam has a good explanation of why you're seeing what you're seeing. Keep in mind that the cardinality (number of rows) also affects how many times the operations in your query (such as MATCH and OPTIONAL MATCH) are run.

For example, in your first MATCH, if 10 users posted the same item, then you will have 10 rows of that one item in combination with each of the 10 users.

Then the OPTIONAL MATCH executes, and though you mean it to only execute once per item (you're trying to get the like count per item), it will execute 10 times on the same item, since there are 10 rows with that item. That's extra work that you can avoid by attempting to get the item count to one row before you execute that match, either through aggregation of users, or getting the data you need for each item before you match to users.

Here's an example query that might work a bit better, saving the matching to users until the end, keeping cardinality to 1 for each of the OPTIONAL MATCHes.

MATCH (i:Item {id: 'ByvIzUdbZ'})       // one row
OPTIONAL MATCH (u2:User)-[:LIKES]->(i)
WITH i, COUNT(u2) AS u2_count          // back to 1 row
OPTIONAL MATCH (comment:Comment)-[:POSTED_IN]->(i) 
WITH i, u2_count, COUNT(comment) AS comment_count; // back to one row again
MATCH (i)-[:POSTED_BY]->(u:User) 
RETURN i, u, u2_count, comment_count