I'm trying to perform an aggregation query on a variable-length path where the nodes I want to aggregate on are not in the path itself but are related to the nodes in it. For example, my path looks like this:

MATCH p = (:Visit)-[:NEXT*]->(:Visit)
RETURN p

but each (:Visit) node is related to a (:Destination):

(:Visit)-[:LOCATION]->(:Destination)

The aggregation I want is a count of the common paths, grouped by the id property of the Destination nodes rather than by the Visit nodes. I figured out a way to do this with UNION, combining the results of many fixed-length paths:

MATCH (d1:Destination)--(v1:Visit), (d2:Destination)--(v2:Visit)
WHERE (v1:Visit)-[:NEXT]->(v2:Visit)
RETURN [d1.id,d2.id] AS Path, count(*) AS PathCount
UNION
MATCH (d1:Destination)--(v1:Visit), (d2:Destination)--(v2:Visit), (d3:Destination)--(v3:Visit)
WHERE (v1:Visit)-[:NEXT]->(v2:Visit)-[:NEXT]->(v3:Visit)
RETURN [d1.id,d2.id,d3.id] AS Path, count(*) AS PathCount
UNION ...

But this isn't a good solution if the paths have a length of, say, 200, and I'm worried about the performance of chaining that many UNION clauses.

I have created a Neo4j Gist here with the sample data: http://gist.neo4j.org/?a8ab894c5c9740a94747

Sample Data

CREATE
// Destinations.
(d1:Destination {id:'A'}),
(d2:Destination {id:'B'}),
(d3:Destination {id:'C'}),
(d4:Destination {id:'D'}),
(d5:Destination {id:'E'}),
(d6:Destination {id:'F'}),
// First Route
(v1:Visit {time:1}),
(v2:Visit {time:2}),
(v3:Visit {time:3}),
(v4:Visit {time:4}),
(v5:Visit {time:5}),
(v1)-[:LOCATION]->(d1),
(v2)-[:LOCATION]->(d2),
(v3)-[:LOCATION]->(d3),
(v4)-[:LOCATION]->(d4),
(v5)-[:LOCATION]->(d6),
(v1)-[:NEXT]->(v2)-[:NEXT]->(v3)-[:NEXT]->(v4)-[:NEXT]->(v5),
// Second Route
(v6:Visit {time:10}),
(v7:Visit {time:21}),
(v8:Visit {time:23}),
(v10:Visit {time:45}),
(v6)-[:LOCATION]->(d1),
(v7)-[:LOCATION]->(d2),
(v8)-[:LOCATION]->(d4),
(v9)-[:LOCATION]->(d6),
(v10)-[:LOCATION]->(d5),
(v11)-[:LOCATION]->(d3),
(v6)-[:NEXT]->(v7)-[:NEXT]->(v8)-[:NEXT]->(v9)-[:NEXT]->(v10)-[:NEXT]->(v11);

Expected Output

Path    PathCount
[A, B]  2
[D, F]  1
[B, D]  1
[B, C]  1
[C, D]  1
[B, C, D]   1
[C, D, F]   1
[A, B, C]   1
[A, B, D]   1
... many more

1 Answer

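Does the following work for you? It returns the first destination of each path as PathHead, a collection of the destinations reached from it along NEXT relationships as PathTail (one entry per matched path), and the number of matched paths as PathCount. Note that the rows are grouped by PathHead, so routes that start at the same destination are merged into one row.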

MATCH (d1:Destination)<-[:LOCATION]-(v1:Visit)-[:NEXT*]->(:Visit)-[:LOCATION]->(d2:Destination)
RETURN d1.id as PathHead, COLLECT(d2.id) AS PathTail, COUNT(*) AS PathCount
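
If you need the full sequence of destination ids for every sub-path, as in the expected output, one possible approach is to map each Visit on the path to its Destination and group on the resulting list. This is only a sketch: it assumes Neo4j 3.1 or later (for pattern comprehensions) and that every Visit has exactly one LOCATION relationship, and it has not been run against the gist.

```cypher
// Match every variable-length chain of visits.
MATCH p = (:Visit)-[:NEXT*]->(:Visit)
// Replace each Visit on the path with the id of its Destination.
// Assumption: one LOCATION per Visit; head() yields null if there is none.
WITH [v IN nodes(p) |
      head([(v)-[:LOCATION]->(d:Destination) | d.id])] AS Path
// Identical id lists fall into the same group, so count(*) is the
// number of visit paths that share that destination sequence.
RETURN Path, count(*) AS PathCount
ORDER BY size(Path), PathCount DESC
```

An unbounded [:NEXT*] enumerates every sub-path of every route, so for routes of length 200 you may want to put an upper bound on it, for example [:NEXT*1..4], and only count sequences up to that length.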