How to make this Neo4J Cypher query execute faster?

Question

I have the following Cypher query in Neo4J. It fetches all the nodes in the graph and their connections for export to a JSON file, which is then used to display the graph with the Sigma.js library.

MATCH (c1:Concept), (c2:Concept), (ctx:Context), c1-[rel:TO]->c2 
WHERE (rel.user='9d6e7140-f3c3-11e3-927f-1f5ca4210ac7' 
AND ctx.uid = rel.context) 
WITH DISTINCT c1, c2 
MATCH (ctxname:Context), c1-[relall:TO]->c2 
WHERE (relall.user='9d6e7140-f3c3-11e3-927f-1f5ca4210ac7' 
AND ctxname.uid = relall.context) 
RETURN DISTINCT 
c1.uid AS source_id, 
c1.name AS source_name, 
c2.uid AS target_id, 
c2.name AS target_name, 
relall.uid AS edge_id, 
ctxname.name AS context_name, 
relall.statement AS statement_id, 
relall.weight AS weight;

This particular query returns 89 rows of data.

The strange thing is that it runs relatively fast when the number of c1 and c2 nodes and rel relationships is small. However, as the number of those nodes and the relationships between them increases, the query gets very slow, probably because Neo4J has to iterate through a lot of relationships.

Do you have any idea how I could make this query faster, given that it must return data in the same format and that everything has to be done in a single query?

Here's the profile info:

Distinct(_rows=89, _db_hits=0)
Extract(symKeys=["c1", "c2", "ctxname", "relall"], exprKeys=["source_name", "statement_id", "edge_id", "target_id", "source_id", "target_name", "context_name", "weight"], _rows=89, _db_hits=712)

Filter(pred="(Property(relall,user(8)) == Literal(9d6e7140-f3c3-11e3-927f-1f5ca4210ac7) AND Property(ctxname,uid(1)) == Property(relall,context(7)))", _rows=89, _db_hits=267)
SimplePatternMatcher(g="(c1)-['relall']-(c2)", _rows=89, _db_hits=2166150)
NodeByLabel(identifier="ctxname", _db_hits=0, _rows=44100, label="Context", identifiers=["ctxname"], producer="NodeByLabel")

Distinct(_rows=84, _db_hits=0)
Filter(pred="Property(ctx,uid(1)) == Property(rel,context(7))", _rows=89, _db_hits=93450)
        NodeByLabel(identifier="ctx", _db_hits=0, _rows=46725, label="Context", identifiers=["ctx"], producer="NodeByLabel")
          Filter(pred="hasLabel(c2:Concept(1))", _rows=89, _db_hits=0)
            TraversalMatcher(start={"label": "Concept", "producer": "NodeByLabel", "identifiers": ["c1"]}, trail="(c1)-[rel:TO WHERE hasLabel(NodeIdentifier():Concept(1)) AND Property(RelationshipIdentifier(),user(8)) == Literal(9d6e7140-f3c3-11e3-927f-1f5ca4210ac7)]->(c2)", _rows=89, _db_hits=127572)

Thank you for any help you can provide, or at least for telling me where the weak spot of this query is, judging from the profile info above.

Michael Hunger · Accepted Answer · 2014-06-14T16:31:06

Your relationship is a "hyperedge" and should be a node, as you know from past discussions :)

As you don't have an index lookup for the starting point, this query has to scan the full graph.

Enable the relationship-auto-index for the field user and start this query with a relationship-lookup.

Also, your Context is matched for every relationship it finds. I'm not sure whether you expect more than one context to match.

Also make sure to have an index on :Context(uid)
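
As a minimal sketch, assuming a Neo4j 2.x installation (the generation that the START ... relationship_auto_index syntax in this answer belongs to), the schema index on :Context(uid) could be created with a statement along these lines:

```cypher
// Sketch, assuming Neo4j 2.x schema-index syntax.
// Indexes Context nodes by their uid property so that lookups by uid can use the index.
CREATE INDEX ON :Context(uid);
```

The relationship auto-index is configured rather than created from Cypher. In Neo4j 2.x this is typically done by setting relationship_auto_indexing=true and relationship_keys_indexable=user in conf/neo4j.properties and restarting the server. Relationships that existed before the setting was enabled are usually only picked up once their user property is written again.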

START rel = relationship:relationship_auto_index(user='9d6e7140-f3c3-11e3-927f-1f5ca4210ac7')
WHERE type(rel) = "TO"
WITH rel, startNode(rel) as c1, endNode(rel) as c2
WHERE (c1:Concept) AND (c2:Concept)
MATCH (ctx:Context)
WHERE ctx.uid = rel.context
WITH DISTINCT c1, c2 
MATCH c1-[relall:TO]->c2 
WHERE (relall.user='9d6e7140-f3c3-11e3-927f-1f5ca4210ac7') 
MATCH (ctxname:Context)
WHERE ctxname.uid = relall.context
RETURN DISTINCT 
c1.uid AS source_id, 
c1.name AS source_name, 
c2.uid AS target_id, 
c2.name AS target_name, 
relall.uid AS edge_id, 
ctxname.name AS context_name, 
relall.statement AS statement_id, 
relall.weight AS weight;
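
The query above keeps the two-pass shape of the original. Since both passes filter on the same user and the same context condition, a single pass may be enough. The following variation is only a sketch and is not part of the answer as given; it assumes the relationship auto-index on user is enabled and that every row is fully determined by one TO relationship and its matching Context, so it should be checked against the original output (89 rows here) before being relied on:

```cypher
// Sketch only: single-pass variation, assuming Neo4j 2.x legacy index syntax
// and an enabled relationship auto-index on the `user` property.
START rel = relationship:relationship_auto_index(user='9d6e7140-f3c3-11e3-927f-1f5ca4210ac7')
WHERE type(rel) = "TO"
WITH rel, startNode(rel) as c1, endNode(rel) as c2
WHERE (c1:Concept) AND (c2:Concept)
// Look up the context once per relationship instead of once per pass
MATCH (ctx:Context)
WHERE ctx.uid = rel.context
RETURN DISTINCT
c1.uid AS source_id,
c1.name AS source_name,
c2.uid AS target_id,
c2.name AS target_name,
rel.uid AS edge_id,
ctx.name AS context_name,
rel.statement AS statement_id,
rel.weight AS weight;
```

Running both versions with PROFILE and comparing _db_hits shows whether dropping the second pass actually helps on a given dataset.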
