I have a really long Cypher query that uses UNION, but some common statements (the opening lines of each branch) are repeated in both queries. Is there a way to factor them out, or even store the result set of the common statements, and then branch and union the results at a later point? I have investigated WITH, COLLECT and OPTIONAL MATCH, but to no avail.

MATCH (s:Subject), (p:Programme)
WHERE s.name in ['A', 'B', 'C']
WITH collect(s) as subs, p
WITH p, subs, SIZE(FILTER(c in subs WHERE c.level ="CSEC")) as csecs, SIZE(FILTER(c in subs WHERE c.level ="CAPE")) as capes
WHERE p.csec_passes <= csecs AND p.cape_passes <= capes
MATCH (p:Programme)-[:requires]->(s:Subject)

WITH p, subs, COLLECT(s) AS mandatories WHERE ALL(n IN mandatories WHERE n IN subs) AND NOT (p)-->(:Combo)
RETURN p

UNION

MATCH (s:Subject), (p:Programme)
WHERE s.name in ['A', 'B', 'C']
WITH collect(s) as subs, p
WITH p, subs, SIZE(FILTER(c in subs WHERE c.level ="CSEC")) as csecs, SIZE(FILTER(c in subs WHERE c.level ="CAPE")) as capes
WHERE p.csec_passes <= csecs AND p.cape_passes <= capes
MATCH (p:Programme)-[:requires]->(s:Subject)

WITH p, subs, COLLECT(s) AS mandatories WHERE ALL(n IN mandatories WHERE n IN subs)
MATCH (p)-[:requires]->(c:Combo)-[:contains]->(s:Subject)
WITH p, c, subs, collect(s) as list
WITH p, subs, collect({amt:c.amt, set:list}) as combos
WHERE ALL(combo in combos where combo.amt <= size(apoc.coll.intersection(subs, combo.set))) RETURN p

Some additional context: every Programme node is connected to at least one Subject node, which is called a mandatory. Some Programme nodes are also connected to one or more Combo nodes, and for those programmes more checks have to be done. I'm unioning queries over both types, combo and non-combo.

1 Answer

First, a couple of important notes.

  1. Cypher does not dictate how information is retrieved. This kind of optimization is something the Cypher planner should handle. (It doesn't now, but that may change in the future.)

  2. Cypher runs UNION queries in parallel, which means that unless you are pushing your Neo4j server to its limits, the total time should be fairly indistinguishable from running only the more expensive of the two queries. (Note that repeat runs may be faster because of the in-memory cache.) So if time is your concern, it shouldn't be. If DB hits are the concern, then for now you shouldn't use UNION.
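
If you want to see which of the two concerns applies, one option is to prefix a query with PROFILE and compare the db hits reported for the UNION version against a single-query version. A minimal sketch, assuming only the labels and relationship type from the question (the `mandatoryCount` alias is just for illustration):

```cypher
// PROFILE runs the query and reports rows and db hits per operator.
// Swap in the full UNION query or the combined query to compare them.
PROFILE
MATCH (p:Programme)-[:requires]->(s:Subject)
WITH p, collect(s) AS mandatories
RETURN p, size(mandatories) AS mandatoryCount
```

The numbers depend on your data and Neo4j version, so treat them as a relative comparison between the two query shapes rather than as absolute costs.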


That said, I can combine these two queries by making the combo MATCH an OPTIONAL MATCH and adding a SIZE(combo.set)=0 OR guard to the final predicate. Comments are added to explain the logic:

MATCH (s:Subject), (p:Programme) 
WHERE s.name in ['A', 'B', 'C'] 
WITH collect(s) as subs, p 
WITH p, subs, SIZE(FILTER(c in subs WHERE c.level ="CSEC")) as csecs, SIZE(FILTER(c in subs WHERE c.level ="CAPE")) as capes 
WHERE p.csec_passes <= csecs AND p.cape_passes <= capes 
MATCH (p:Programme)-[:requires]->(s:Subject)

WITH p, subs, COLLECT(s) AS mandatories WHERE ALL(n IN mandatories WHERE n IN subs)
OPTIONAL MATCH (p)-[:requires]->(c:Combo)-[:contains]->(s:Subject)
// Where c is null, s is null too, so collect(s) gives an empty list
WITH p, c, subs, collect(s) as list
// If c is null, combos holds a single map with amt null and an empty set
WITH p, subs, collect({amt:c.amt, set:list}) as combos
// SIZE(combo.set)=0 is true when the set is an empty list, i.e. the programme has no combos
WHERE ALL(combo in combos where SIZE(combo.set)=0 OR combo.amt <= size(apoc.coll.intersection(subs, combo.set)))
RETURN p
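
To see why the SIZE(combo.set)=0 guard lets combo-less programmes through, here is a small standalone sketch that needs no data. It fakes the row an unmatched OPTIONAL MATCH would produce by binding `c` and `s` to null directly, which is an assumption made for illustration; `passesFilter` is a made-up alias.

```cypher
// Simulate a programme whose OPTIONAL MATCH found no combo: c and s are null.
WITH null AS c, null AS s
// collect() skips null values, so list is an empty list
WITH c, collect(s) AS list
// The map itself is not null, so it is collected: one map with a null amt
WITH collect({amt: c.amt, set: list}) AS combos
// The guard is satisfied by the empty set, so the row survives the filter
RETURN combos, all(combo IN combos WHERE size(combo.set) = 0) AS passesFilter
```

Without the guard, the predicate would compare a null amt against a size, which evaluates to null rather than true, and the programme would be filtered out.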