NEO4J Cypher Query: Multiple Aggregates

Question

Sample Data

Sample Query: At the end of this post

Objective: Searches similar to 'who knows exactly 2 Cust and works at 1 company'

Step 1: I just did a print of the number of connected Cust and Comp and its all good till now

MATCH (from:Cust), (a:Cust), (b:Comp),
p1=((from)-[r1]-(a)), p2=((from)-[r2]-(b))
WITH from, count(DISTINCT a) as knows, count(DISTINCT b) as works 
RETURN from.title, knows, works

Step 2: I went and added WHERE clause to filter the count, so far so good

MATCH (from:Cust), (a:Cust), (b:Comp),
p1=((from)-[r1]-(a)), p2=((from)-[r2]-(b))
WITH from, count(DISTINCT a) as knows, count(DISTINCT b) as works 
WHERE knows=2 and works=1
RETURN from.title, knows, works

Result > Alpha | 2 | 1

Step 3: Now I also want some filters on Cust and Comp add a and b to my WITH clause

MATCH (from:Cust), (a:Cust), (b:Comp),
p1=((from)-[r1]-(a)), p2=((from)-[r2]-(b))
WITH from, count(DISTINCT a) as knows, count(DISTINCT b) as works, a, b
RETURN from.title, knows, works

And BOOM! everything is 1, 1

It looks aggregate is getting confused with multiple variables in the WITH clause so and start to add multiple WITH clause and UNWIND, but I was unable to get the query for it.

Sample Data Creation

CREATE (a:Cust {title: "Alpha"})
CREATE (b:Cust {title: "Bravo"})
CREATE (c:Cust {title: "Charlie"})
CREATE (d:Cust {title: "Delta"})

create (g:Comp {title: "Google"})
create (f:Comp {title: "Facebook"})
create (s:Comp {ttile: "Stackoverflow"})

MATCH (a:Cust {title: "Alpha"}), (b:Cust {title: "Bravo"})
CREATE (a)-[:KNOWS]->(b)

MATCH (a:Cust {title: "Alpha"}), (c:Cust {title: "Charlie"})
CREATE (a)-[:KNOWS]->(c)

MATCH (d:Cust {title: "Delta"}), (c:Cust {title: "Charlie"})
CREATE (d)-[:KNOWS]->(c)

MATCH (c:Cust {title: "Charlie"}), (b:Cust {title: "Bravo"})
CREATE (c)-[:KNOWS]->(b)


MATCH (g:Comp {title: "Google"}), (s:Comp {ttile: "Stackoverflow"})
CREATE (g)-[:USES]->(s)

MATCH (f:Comp {title: "Facebook"}), (s:Comp {ttile: "Stackoverflow"})
CREATE (f)-[:USES]->(s)

MATCH (d:Cust {title: "Delta"}), (s:Comp {ttile: "Stackoverflow"})
CREATE (d)-[:WORKS_AT]->(s)

MATCH (d:Cust {title: "Delta"}), (g:Comp {title: "Google"})
CREATE (d)-[:WORKS_AT]->(g)

MATCH (a:Cust {title: "Alpha"}), (f:Comp {title: "Facebook"})
CREATE (a)-[:WORKS_AT]->(f)

MATCH (c:Cust {title: "Charlie"}), (s:Comp {ttile: "Stackoverflow"})
CREATE (c)-[:WORKS_AT]->(s)

It's unclear what result you want to get with the third query. The query itself works correctly: when you simultaneously use a variable and aggregate on it, you will logically receive one as a result. — stdob--
A query that does aggregation BUT also returns all the variables. I want the path variables for my result AND also want to apply filter like atleast 2 connections etc. — Srinath Ganesh

InverseFalcon InverseFalcon · Accepted Answer · 2018-04-15T13:08:50

As stdob-- says, the aggregation is working correctly. Aggregation in Cypher uses all the non-aggregation variables as the grouping key, which provides context for the aggregation.

In your step 2 query, you have:

WITH from, count(DISTINCT a) as knows, count(DISTINCT b) as works

knows and works are both aggregating using count(), so from is the grouping key...the counts are with respect to each from node.

In your step 3 query, you have

WITH from, count(DISTINCT a) as knows, count(DISTINCT b) as works, a, b

So from, a, and b are the grouping key... so for each row of from, a single a, and a single b node (note that this is a cross product with every a and b node connected to from), you're getting the count of distinct a nodes and b nodes...which is always going to be 1.

A better approach to get the answer you want (which I can't say with certainty, as you didn't specify what you really wanted to do with a and b in your query, you didn't use them in your return) is to get the degrees of relationship types from your from node, then collect the connected nodes (or use pattern comprehension to get them in a collection for you).

For example:

MATCH (from:Cust)
WITH from, size((from)-[:KNOWS]-()) as knowsDeg, 
     size((from)-[:WORKS_AT]-()) as worksAtDeg, 
     [(from)-[:KNOWS]-(a:Cust) | a] as known, 
     [(from)-[:WORKS_AT]-(b:Comp) | b] as worksAt
RETURN from.title, knowsDeg, worksAtDeg, known, worksAt

Also, I should point out that you have cross products in your original query, which is why you needed to use DISTINCT in your count:

MATCH (from:Cust), (a:Cust), (b:Comp),
p1=((from)-[r1]-(a)), p2=((from)-[r2]-(b))
WITH from, count(DISTINCT a) as knows, count(DISTINCT b) as works 
RETURN from.title, knows, works

If you had left off DISTINCT, the counts for each would have been the same, the product of the number of connected :Cust nodes * the number of connected :Comp nodes. If you only wanted counts (and wanted to get them through expansion rather than by degree as in my answer), you could get them without forming a cross product by collecting them after you match to each like so:

MATCH (from:Cust)--(a:Cust)
WITH from, count(a) as knows
MATCH (from)--(b:Comp)
WITH from, knows, count(b) as works
RETURN from.title as title, knows, works

NEO4J Cypher Query: Multiple Aggregates

1 Answers