1
votes

I need to query multiple paths (to draw in the UI) from different start nodes in Neo4j (could be hundreds of paths), I'm using UNION to combine the queries but I suspect that it's not the most efficient way to retrieve the data (I'm also using indexes on the start nodes). When creating large queries, all kinds of nasty stuff happen.

The query looks something like this:

MATCH path = ((n:actor)-[*0..2]->(g)-[p:was_filmed_at]->(k:movie_set)-[l:located]->(i:state)) 
WHERE n.name=’john doe1' AND (g:acted_in OR g:produced) 
AND p.addDate >= '2017-12-10T00:00:00+00:00' 
AND p.addDate <= '2017-12-12T08:08:06+00:00' 
AND l.addDate >= '2017-12-10T00:00:00+00:00' 
AND l.addDate <= '2017-12-12T08:08:06+00:00' 
RETURN nodes(path), relationships(path) 
LIMIT 1000

UNION 

MATCH path = ((n:actor)-[*0..2]->(g)-[p:was_filmed_at]->(k:movie_set)-[l:located]->(i:state)) 
WHERE n.name=’john doe2' AND (g:acted_in OR g:produced) 
AND p.addDate >= '2017-12-10T00:00:00+00:00' 
AND p.addDate <= '2017-12-12T08:08:06+00:00' 
AND l.addDate >= '2017-12-10T00:00:00+00:00' 
AND l.addDate <= '2017-12-12T08:08:06+00:00' 
RETURN nodes(path), relationships(path) 
LIMIT 1000

UNION 

MATCH path = ((n:actor)-[*0..2]->(g)-[p:was_filmed_at]->(k:movie_set)-[l:located]->(i:state)) 
WHERE n.name=’john doe3' AND (g:acted_in OR g:produced) 
AND p.addDate >= '2017-12-10T00:00:00+00:00' 
AND p.addDate <= '2017-12-12T08:08:06+00:00' 
AND l.addDate >= '2017-12-10T00:00:00+00:00' 
AND l.addDate <= '2017-12-12T08:08:06+00:00' 
RETURN nodes(path), relationships(path) 
LIMIT 1000

This is a query with only 3 paths, and it does not scale well with hundreds. Is there a simple, more efficient way to do this?

1
Assuming you change only the name, why don't you OR it instead of unioning paths ? Also, have you used PROFILE on you query ? to find eventual bottleneck(s)Jerome_B
OR would still make a large query, I marked the solution by @Bruno Peres as it creates a shorter query. Thanks!Yuval

1 Answers

1
votes

Since you are changing only the n.name comparison, you can rewrite your query using IN operator and pass the list of names as an array. This way:

MATCH path = ((n:actor)-[*0..2]->(g)-[p:was_filmed_at]->(k:movie_set)-[l:located]->(i:state))
WHERE n.name IN ['john doe1', 'john doe2', 'john doe3']
AND (g:acted_in OR g:produced)
AND '2017-12-10T00:00:00+00:00' <= p.addDate <= '2017-12-12T08:08:06+00:00'
AND '2017-12-10T00:00:00+00:00' <= l.addDate <= '2017-12-12T08:08:06+00:00'
RETURN nodes(path), relationships(path)
LIMIT 1000

Also you can chain inequalities like: AND '2017-12-10T00:00:00+00:00' <= p.addDate <= '2017-12-12T08:08:06+00:00'. Make sure you have an index on :actor(name) for quick index lookup of your starting node. (Thanks to @InverseFalcon)