Neo4j legacy relationship auto-index slow in Cypher query

Question

NODES

1000000 x ({prop:'a'})
1000000 x ({prop:'b'})
1000000 x ({prop:'c'})

NODE SET = ~3 million nodes

Note: prop is not an exclusive attribute.

RELATIONSHIPS

1000 x [:TYPEA {date:20150301} ]
1000 x [:TYPEA {date:20150228} ]
1000 x [:TYPEA {date:20150227} ]
1000 x [:TYPEA {date:........} ]
1000 x [:TYPEA {date:19000101} ]

1000 x [:TYPEB {date:20150301} ]
1000 x [:TYPEB {date:20150228} ]
1000 x [:TYPEB {date:20150227} ]
1000 x [:TYPEB {date:........} ]
1000 x [:TYPEB {date:19000101} ]

TYPEA = 42062 days x 1 000 rels

TYPEA = ~42 000 000

TYPEB = ~42 000 000

RELATIONSHIP SET = ~84 million relationships
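As a quick sanity check of the totals above, here is a minimal Python sketch of the arithmetic. It assumes one date value per calendar day between 19000101 and 20150301 and 1000 relationships per date value per type, as listed in the question; the variable names are only illustrative.

```python
from datetime import date

RELS_PER_DAY = 1000  # from the question: 1000 relationships per date value per type

# Days between the oldest and newest date values shown in the question.
days = (date(2015, 3, 1) - date(1900, 1, 1)).days

rels_per_type = days * RELS_PER_DAY
total_rels = 2 * rels_per_type  # TYPEA and TYPEB

print(days, rels_per_type, total_rels)
```

This reproduces the 42062 days quoted above, which gives roughly 42 million relationships per type and roughly 84 million in total.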

I want to match the pattern:

MATCH (n1 {prop:'a'}) -[ r1:TYPEA {date:20001231} ]-> (n2 {prop:'b'})
RETURN n2;

Improving it with an index

My neo4j.properties:

relationship_auto_indexing=true
relationship_keys_indexable=date

The Cypher query:

START 
  r1 = relationship:relationship_auto_index('date:20001231')
MATCH (n1 {prop:'a'}) -[r1:TYPEA]-> (n2 {prop:'b'})
RETURN n2;

:) Works fine!

Now I want to match the pattern:

MATCH
  (n1 {prop:'a'})
  -[ r1:TYPEA {date:20001231} ]->
  (n2 {prop:'b'})
  -[ r2:TYPEA {date:20001231} ]->
  (n3  {prop:'c'})
RETURN n2, n3;

So I tried:

START 
  r1 = relationship:relationship_auto_index('date:20001231'),
  r2 = relationship:relationship_auto_index('date:20001231')
MATCH (n1 {prop:'a'}) -[r1:TYPEA]-> (n2 {prop:'b'}) -[r2:TYPEA]-> (n3 {prop:'c'})
RETURN DISTINCT n2,  n3;

:( It runs slowly.

A Cartesian product occurs, producing many intermediate results: 1000 ^ 2 combinations of r1 and r2.
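To make the cost concrete, here is a minimal Python sketch (a toy model, not Neo4j code) that compares checking every (r1, r2) combination with following each r1 only to the relationships that start at its end node. The chain-shaped data and the function names naive_pairs and chained_pairs are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from itertools import product

def naive_pairs(rels):
    """Check every (r1, r2) combination, as a Cartesian product would."""
    checked = 0
    matches = set()
    for r1, r2 in product(rels, repeat=2):
        checked += 1
        if r1[1] == r2[0]:  # end node of r1 is the start node of r2
            matches.add((r1[1], r2[1]))  # (n2, n3)
    return matches, checked

def chained_pairs(rels):
    """Group relationships by start node, then follow each r1 to its successors."""
    by_start = defaultdict(list)
    for rel in rels:
        by_start[rel[0]].append(rel)
    checked = 0
    matches = set()
    for r1 in rels:
        for r2 in by_start.get(r1[1], []):
            checked += 1
            matches.add((r1[1], r2[1]))  # (n2, n3)
    return matches, checked

# Hypothetical index hits: 1000 (start, end) pairs forming one chain 0->1->2->...
rels = [(i, i + 1) for i in range(1000)]
naive_matches, naive_checked = naive_pairs(rels)
chained_matches, chained_checked = chained_pairs(rels)
print(naive_checked, chained_checked, naive_matches == chained_matches)
```

Both functions return the same (n2, n3) pairs, but the first one inspects 1000 ^ 2 combinations while the second one only inspects the relationships that actually connect. How far the real query planner is from either model is an assumption here, not something measured.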

On the one hand, it is not possible to use the same identifier more than once in the query.

On the other hand, label (schema) indexes do not apply to relationships.

Is there any hope? (Release: Neo4j-community-2.2.0)

Is there any benefit to legacy relationship indexing when the Cypher query does not use the START clause?

Thanks

Fernando Santos · Accepted Answer · 2015-04-04T22:33:42

This changes the conceptual query, but it worked fine:

START 
  r = relationship:relationship_auto_index('date:20001231')
WITH [x IN COLLECT(r) WHERE TYPE(x)='TYPEA'] AS cr
UNWIND cr AS r1
  MATCH (n1 {prop:'a'}) -[r1]-> (n2 {prop:'b'})
WITH DISTINCT n2, cr
UNWIND cr AS r2
  MATCH (n2) -[r2]-> (n3 {prop:'c'})  
RETURN DISTINCT n2,  n3;

Thanks
