I have a Cypher query in Neo4j 1.9.5 that simply hangs when it starts from three index lookups. If I alter the query to use two index lookups and a WHERE clause, it works (though it is still slow).

In this simplified sample, I am looking for toys whose names start with 'tc_', in small boxes whose names start with '2', which in turn are in big boxes whose names start with 'p'.

With three indexes, this hangs:

START b=node:BigBox('name:p*'), s=node:SmallBox('name:2*'), ts=node:Toys('name:tc_*')
MATCH b-[:SMALLBOX]->s, s-[:TOYS]->ts
RETURN count(ts)

But these work:

START s=node:SmallBox('name:2*'), ts=node:Toys('name:tc_*')
MATCH b-[:SMALLBOX]->s, s-[:TOYS]->ts
WHERE b.name =~ '(?i)p.*'
RETURN count(ts)

START b=node:BigBox('name:p*'), ts=node:Toys('name:tc_*')
MATCH b-[:SMALLBOX]->s, s-[:TOYS]->ts
WHERE s.name =~ '(?i)2.*'
RETURN count(ts)

The last two give the answer that the first would have.

What do I need to do to allow more than two indexes in the START clause? Note that I have more than 200,000 toys in 90-100 small boxes, which in turn are in 5 big boxes.

1 Answer

You are committing a cardinal sin: building a Cartesian product.

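What's happening is that the three index lookups give you three independent result sets: b, s, and ts. Cypher pairs every b with every s, then every b+s pair with every ts, and only then checks each combination against the MATCH pattern. With 5 big boxes, about 100 small boxes, and more than 200,000 toys, that can be on the order of 100 million combinations in the worst case, almost none of which are actually connected.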
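To see how large each starting set really is, you could count the hits of each index lookup on its own. This is only a sketch that reuses the index names and Lucene queries from the question; the three statements are meant to be run separately.

```cypher
// Run each statement separately; the product of the three counts is the
// number of start combinations the three-index query has to consider.
START b=node:BigBox('name:p*')
RETURN count(b)

START s=node:SmallBox('name:2*')
RETURN count(s)

START ts=node:Toys('name:tc_*')
RETURN count(ts)
```

If the third count is in the tens of thousands, multiplying it by the other two explains why the original query never seems to finish.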

The better way to solve this problem is to put only the smallest set in your START clause (here b), then use traversals to find the s and ts nodes that match. So:

START b=node:BigBox('name:p*')
MATCH b-[:SMALLBOX]->s
WHERE s.name =~ "2.*" 
WITH b, s
MATCH s-[:TOYS]->ts
WHERE ts.name =~ "tc_.*"
RETURN count(ts)
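The same idea can probably also be written as a single pattern with one WHERE clause. This is a sketch I have not measured against the WITH version above; the (?i) flag is copied from the regexes in the question and is an assumption about wanting case-insensitive matching, so drop it if you don't.

```cypher
// Start from the smallest set (big boxes), traverse outward,
// and filter the traversed nodes by name.
START b=node:BigBox('name:p*')
MATCH b-[:SMALLBOX]->s-[:TOYS]->ts
WHERE s.name =~ '(?i)2.*' AND ts.name =~ '(?i)tc_.*'
RETURN count(ts)
```

Either way, only one index lookup seeds the query, so the work grows with the number of relationships actually traversed rather than with the product of three result sets.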