Neo4J query poor performance

Question

I'm running a "stress test" against a Neo4j database. It isn't a big deal, but the partial results make me wonder whether this technology is suitable for online applications (or whether I simply don't get Cypher).

The first test was to add nodes one by one, forming a chain like

(first node) -[:FRAME_NEXT]-> () -[:FRAME_NEXT]-> () -[:FRAME_NEXT]-> () -[:FRAME_NEXT]-> ... -[:FRAME_NEXT]-> (last node)

and then retrieve the entire path using this query

START n=node:Frame(node_id="0"), m=node:Frame(node_id="9000")
MATCH p=(n)-[:FRAME_NEXT*]->(m)
RETURN p
ORDER BY m.node_id DESC
LIMIT 1

Note that when m.node_id == 2, the query takes ~100 ms. Now, with ~9000 nodes, it can take up to 30 seconds. I am not an expert, but that seems like far too long! I wouldn't expect 9K nodes to make this much difference.

So, what am I missing?

Cheers (and Merry Xmas)

Edited:

I'm using py2neo and timing the query this way:

    import datetime
    from py2neo import neo4j

    # graph_db (the database handle) and i (the target node_id) are defined elsewhere
    q_str = """
    START n=node:Frame(node_id="0"), m=node:Frame(node_id="%d")
    MATCH p=(n)-[:FRAME_NEXT*]->(m)
    RETURN p
    ORDER BY m.node_id DESC
    LIMIT 1
    """ % (i,)
    print q_str

    # Time the query construction and execution
    before = datetime.datetime.now()
    query = neo4j.CypherQuery(graph_db, q_str)
    record, = query.execute().data
    after = datetime.datetime.now()
    diff = after - before
    diff_ms = diff.total_seconds() * 1000
    print 'Query took %.2f ms' % (diff_ms)

Stefan Armbruster · Accepted Answer · 2013-12-26T11:28:29

The query tries to identify each and every path between n and m, which might be a huge number depending on the shape of your graph.

Try to avoid ORDER BY in such situations. The following might be faster, since only a single path needs to be identified:

START n=node:Frame(node_id="0"), m=node:Frame(node_id="9000")
MATCH p=(n)-[:FRAME_NEXT*]->(m)
RETURN p
LIMIT 1
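
To make the difference concrete, here is a rough in-memory analogy in Python. It is not how Neo4j executes Cypher internally; the `all_paths` and `diamond_chain` helpers and the diamond-shaped test graph are made up for illustration. The point is only that taking the first result from a lazy enumeration can stop early, while sorting requires every path to be produced first.

```python
def all_paths(graph, start, goal, path=None):
    """Lazily yield every directed path from start to goal (graph assumed acyclic)."""
    path = (path or []) + [start]
    if start == goal:
        yield path
        return
    for nxt in graph.get(start, []):
        for p in all_paths(graph, nxt, goal, path):
            yield p


def diamond_chain(k):
    """Hypothetical test graph: k diamonds in a row, giving 2**k distinct paths."""
    graph = {}
    for i in range(k):
        a, top, bottom, b = "n%d" % i, "t%d" % i, "b%d" % i, "n%d" % (i + 1)
        graph[a] = [top, bottom]
        graph[top] = [b]
        graph[bottom] = [b]
    return graph


graph = diamond_chain(10)

# Like LIMIT 1 without ORDER BY: stop as soon as the first path is produced.
first = next(all_paths(graph, "n0", "n10"))

# Like ORDER BY ... LIMIT 1: every path must exist before sorting can pick one.
every = sorted(all_paths(graph, "n0", "n10"))

print("%d %d" % (len(first), len(every)))  # 21 1024
```

With only 10 diamonds there are already 1024 paths to enumerate before a sort can return anything, whereas the unordered version touches a single path of 21 nodes.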

If you're looking for pure performance, you'll be better off using the traversal API or graph algorithms directly. This requires some Java coding.
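
As a rough illustration of why a direct traversal can be cheap for a chain like yours, here is a sketch in plain Python. It does not use the Neo4j traversal API (which is Java); the dict `next_of` is a hypothetical stand-in for the FRAME_NEXT relationships, and `walk_chain` is a made-up helper. It simply follows the single outgoing link from each node, so the work is proportional to the chain length.

```python
def walk_chain(next_of, start, goal):
    """Follow the single outgoing link from each node until goal is reached."""
    path = [start]
    seen = {start}
    while path[-1] != goal:
        nxt = next_of.get(path[-1])
        if nxt is None or nxt in seen:
            return None  # dead end or cycle: goal is not reachable
        path.append(nxt)
        seen.add(nxt)
    return path


# Hypothetical chain 0 -> 1 -> ... -> 9000, mirroring the test in the question.
next_of = {i: i + 1 for i in range(9000)}

path = walk_chain(next_of, 0, 9000)
print(len(path))  # 9001
```

A hand-written traversal along these lines visits each node once and never considers alternative paths, which is the kind of control the Java APIs give you.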
