2
votes

I'm experiencing surprisingly slow retrieval of results with ResourceIterator<Node> when I get results from Cypher query execution in Java. next() command takes on average 156ms, with standard deviation of 385! Is this behavior expected, or am I doing something wrong? Can anybody suggest a more efficient way of achieving the same thing?


Graph structure

I have the following graph layout, where Point nodes have LinksTo relations to other points:

Node:Point
Properties:
- idPoint (new style schema unique constraint on this property)
- x (new style schema index on this property)
- y (new style schema index on this property)

Relation:LinksTo
Properties:
- idLink
- length
(...relations don't even play a role in my question...)

Graph statistics:
- # of nodes: 890,000
- # of relations: 910,000


Old code

(Using Neo4j 2.0.0 stable with Oracle Java 7 on Ubuntu)
(Basically this code searches for nodes(points) in a 60x60 square around the given point.)

GraphDatabaseService graphDB = new GraphDatabaseFactory ( ).newEmbeddedDatabase ("points_db");

ExecutionEngine engine = new ExecutionEngine (graphDB);

for (Coordinate c : coords) // coords holds 500 different coordinates
{
    int size = 30;
    int xMin = c.x - size;
    int xMax = c.x + size;
    int yMin = c.y - size;
    int yMax = c.y + size;

    String query = "MATCH (n:POINT) " +
                     "  WHERE n.x > " + xMin +
                     "    AND n.x < " + xMax +
                     "    AND n.y > " + yMin +
                     "    AND n.y < " + yMax +
                     "RETURN n AS neighbour";

    ExecutionResult result = engine.execute (query); // command1

    ResourceIterator<Node> ri = result.columnAs ("neighbour"); // command2

    while (ri.hasNext ( ))
    {
        Node n = ri.next ( ); // command3
        // ... some code ...
    }
}

Measurements

command1 average execution time: 7.5 ms
command2 average execution time: <1 ms
command3 average execution time: 156 ms (with 358 standard deviation)

(Measurements taken with 500 iterations(different coordinates) and on average 6 points are found in each iteration. Measurements are repeatable.)


EDIT 1 (As suggested by Luanne and Michael)

New, faster code with parameterization

(Using Neo4j 2.0.0 stable with Oracle Java 7 on Ubuntu)
(Basically this code searches for nodes(points) in a 60x60 square around the given point.)

GraphDatabaseService graphDB = new GraphDatabaseFactory ( ).newEmbeddedDatabase ("points_db");

ExecutionEngine engine = new ExecutionEngine (graphDB);
Map<String, Object> params = new HashMap<> ( );

int size = 30;
String query = "MATCH (n:POINT) " +
               "  WHERE n.x > {xMin}" +
               "    AND n.x < {xMax}" +
               "    AND n.y > {yMin}" +
               "    AND n.y < {yMax}" +
               "  RETURN n AS neighbour";

for (Coordinate c : coords) // coords holds 500 different coordinates
{
    params.put ("xMin", (int) c.x - size);
    params.put ("xMax", (int) c.x + size);
    params.put ("yMin", (int) c.y - size);
    params.put ("yMax", (int) c.y + size);

    ExecutionResult result = engine.execute (query, params); // command1

    ResourceIterator<Node> ri = result.columnAs ("neighbour"); // command2

    while (ri.hasNext ( ))
    {
        Node n = ri.next ( ); // command3
        // ... some code ...
    }
}

Measurements

command1 average execution time: 1.7 ms
command2 average execution time: <1 ms
command3 average execution time: 112 ms (with 270 standard deviation)
(Measurements taken with 500 iterations(different coordinates) and on average 6 points are found in each iteration. Measurements are repeatable.)

1
Can you first please parameterize your query and then measure time? docs.neo4j.org/chunked/milestone/cypher-parameters.htmlLuanne
You have a typo here: negihbour.Michael Hunger
Good suggestion with parameterization. I improved the code and new measurements are noticeably faster. But I would say they are still in "the same ballpark".user1918305

1 Answers

1
votes

What you're doing is not a graph query but a range scan over the whole database.

So it has to pull in all the nodes and for each of them doe your comparisons.

You usually solve this by putting your nodes into a tree (r-tree) that encodes the geometry into a two-dimensional tree-structure and then you can access whatever shape you need in only log(levels) complexity.

Check out the presentations about Neo4j spatial about this topic:

http://neo4j.org/develop/spatial

You also force Neo4j to re-parse and re-build the query for each of your nodes (500 times). I agree with Luanne on parametrization, so your query should look like this. You should pull this also before the for-loop:

String query = "MATCH (n:POINT) " +
                 "  WHERE n.x > {xMin}" +
                 "    AND n.x < {xMax}" +
                 "    AND n.y > {yMin}" +
                 "    AND n.y < {yMax}" +
                 "  RETURN n AS neighbour";

ExecutionResult result = engine.execute (query,
          map("xMin",xmMin,"xMax",xMax,"yMin",yMin,"yMax",yMax)); // query + params

....