1
votes

I'm performing a "stress test" with NEO4J database. It isn't a big deal, but the partial results make me wonder whether this technology is suitable for online applications (or simply I don't get Cypher).

The first test was to add node by node like

(1° node) -[:NEXT_FRAME]-> () -[:NEXT_FRAME]-> () -[:NEXT_FRAME]-> () -[:NEXT_FRAME]-> ... -[:NEXT_FRAME]-> (last node)

and then retrieve the entire path using this query

START n=node:Frame(node_id="0"), m=node:Frame(node_id="9000")
MATCH p=(n)-[:FRAME_NEXT*]->(m)
RETURN p
ORDER BY m.node_id DESC
LIMIT 1

Note, when m.node_id == 2, the query takes ~100 ms. Now with ~9000 nodes, it can take up to 30 seconds. I am not an expert, but it is too much time! I don't think 9K nodes should make this much difference.

So, what am I missing?

Cheers (and Merry Xmas)

Edited:

I'm using py2neo and timing the query this way:

    q_str = """
    START n=node:Frame(node_id="0"), m=node:Frame(node_id="%d")
    MATCH p=(n)-[:FRAME_NEXT*]->(m)
    RETURN p
    ORDER BY m.node_id DESC
    LIMIT 1
    """ % (i,)
    print q_str

    before = datetime.datetime.now()
    query = neo4j.CypherQuery(graph_db, q_str)
    record, = query.execute().data
    after = datetime.datetime.now()
    diff = after - before
    diff_ms = diff.total_seconds() *1000
    print 'Query took %.2f ms' % (diff_ms)
3

3 Answers

2
votes

The query tries to identify each and every path between n and m, which might be a huge number depending on the shape of your graph.

Try to avoid ORDER BY in such situations. The following might be faster since only one single path needs to be identified:

START n=node:Frame(node_id="0"), m=node:Frame(node_id="9000")
MATCH p=(n)-[:FRAME_NEXT*]->(m)
RETURN p
LIMIT 1

If you're looking for pure performance, you'll be better off using traversal API or graph algorithms directly. This requires some Java coding.

2
votes

This is a shortcoming of Cypher, it doesn't handle long variable length paths well right now.

Could you try MATCH p=shortestPath((n)-[:FRAME_NEXT*]->(m))?

Also if you can, can you try Neo4j 2.0 and instead of using the legacy index, using labels and the new indexes. According to the query plan, there the faster bidirectional-traversal-matcher should be used.

MATCH (n: Frame {node_id:"0"})-[:FRAME_NEXT*]->(m:Frame {node_id:"9000"})
RETURN p

Alternatively will be probably better off with a REST traverser: http://docs.neo4j.org/chunked/milestone/rest-api-traverse.html

or a REST-graph-algo: http://docs.neo4j.org/chunked/milestone/rest-api-graph-algos.html

1
votes

If there is only one path you can drop ODER BY and LIMIT. Also try using the shortestPath function, it can be much faster even if there is only one matching path in your graph (i.e. even if all paths are the shortest). Finally if you know how deep the variable depth relationship is, declare that in your pattern, and if you know only roughly how deep then specify a range. Try some combinations of these for comparison, you can profile them in the neo4j-shell and look at the execution plan.

MATCH p=(n)-[:FRAME_NEXT*9000]->(m)
MATCH p=(n)-[:FRAME_NEXT*8900..9100]->(m)
MATCH p=shortestPath( (n)-[:FRAME_NEXT*]->(m) )
MATCH p=shortestPath( (n)-[:FRAME_NEXT*8900..9100]->(m) )