I have a huge graph database with authors, which are connected to papers, and the papers are connected to nodes that contain meta information about the paper. I tried to select authors whose names match a specific pattern, so I executed the following Cypher statement in Java.

String query = "MATCH (n:AUTHOR) WHERE n.name =~ '(?i).*jim.*' RETURN n";
db.execute(query);

I get a result set with all matching authors back, but the execution is very slow. Is that because Neo4j writes the result into memory?

If I try to find nodes with the Java API, it is much faster. Of course, I am only able to search for the exact name, as in the following code example, but it is about 4 seconds faster than the query above. I tested it on a small database with about 50 nodes, of which only 6 are authors. The six authors are also in the index.

db.findNodes(NodeLabel.AUTHOR, NodeProperties.NAME, "jim knopf" );

Is there a way to speed up the Cypher query? Or is there a possibility to get all nodes that match a given pattern via the Java API and the findNodes() method?

Just for information, I created the index on the author's name in Java with graph.schema().indexFor(NodeLabel.AUTHOR).on("name").create();

Perhaps somebody could help. Thanks in advance.

EDIT:

I ran some tests today. If I execute the query PROFILE MATCH (n:AUTHOR) WHERE n.name = 'jim seroka' RETURN n; in the browser interface, the only operator is NodeByLabelScan. It seems that Neo4j does not use the index automatically (the index on name is online). If I specify the index explicitly and execute the query PROFILE MATCH (n:AUTHOR) USING INDEX n:AUTHOR(name) WHERE n.name = 'jim seroka' RETURN n; the index is used. Normally Neo4j should pick the correct index automatically. Is there a configuration setting for this?

I also did some more testing in embedded mode to check the performance of the query there. I tried to select the author "jim seroka" with db.findNode(NodeLabel.AUTHOR, "name", "jim seroka");. It works, and it seems that the index is used, given an execution time of ~0.05 seconds.

But if I run the same query that I executed in the browser interface, with the explicit index hint, it takes ~4.9 seconds. Why? I'm a bit at a loss. The database is local and there are only 6 authors. Is the connector slow, or am I creating the connection incorrectly? OK, findNode() returns just a node while execute() returns a whole Result, but a four-second difference?

The following source code shows how the database is created and the query is executed.

public static GraphDatabaseService getNeo4jDB() {
    ....
    return new GraphDatabaseFactory().newEmbeddedDatabase(STORE_DIR);
}

private Result findAuthorNode(String searchValue) {
    db = getNeo4jDB();

    String query = "MATCH (n:AUTHOR) USING INDEX n:AUTHOR(name) WHERE n.name = 'jim seroka' RETURN n";

    return db.execute(query);
}
1 Answer

Your query uses a regular expression and therefore is not able to use an index:

MATCH (n:AUTHOR) WHERE n.name =~ '(?i).*jim.*' RETURN n

Neo4j 2.3 introduced the index-backed STARTS WITH string operator, so this query would perform much better:

MATCH (n:AUTHOR) WHERE n.name STARTS WITH 'jim' RETURN n

This is not quite the same as the regular expression (it matches only names that begin with 'jim', and the comparison is case-sensitive), but it will perform better.
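
To illustrate why a prefix predicate can be answered from an index while a regular expression generally cannot, here is a minimal, self-contained Java sketch. It does not use Neo4j at all and says nothing about Neo4j's internals: a TreeSet merely stands in for a sorted index. The class and method names (PrefixVsRegexSketch, sampleNames, regexScan, prefixScan) and the sample names other than "jim knopf" and "jim seroka" are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Hypothetical illustration only: a sorted set plays the role of an index.
class PrefixVsRegexSketch {

    // The same case-insensitive "contains jim" pattern as in the question.
    static final Pattern CONTAINS_JIM = Pattern.compile("(?i).*jim.*");

    // Invented sample data; only two of these names appear in the question.
    static NavigableSet<String> sampleNames() {
        NavigableSet<String> names = new TreeSet<>();
        names.add("jim knopf");
        names.add("jim seroka");
        names.add("Jim Button");
        names.add("tim jimson");
        names.add("anna");
        names.add("bob");
        return names;
    }

    // Regex match: every entry has to be tested, so the sort order does not help.
    static List<String> regexScan(NavigableSet<String> names, Pattern pattern) {
        List<String> hits = new ArrayList<>();
        for (String name : names) {
            if (pattern.matcher(name).matches()) {
                hits.add(name);
            }
        }
        return hits;
    }

    // Prefix match: in sorted order all entries sharing a prefix are adjacent,
    // so a range lookup from prefix (inclusive) to prefix + '\uffff' (exclusive)
    // finds them without visiting the other entries.
    static List<String> prefixScan(NavigableSet<String> names, String prefix) {
        String upperBound = prefix + Character.MAX_VALUE;
        return new ArrayList<>(names.subSet(prefix, true, upperBound, false));
    }

    public static void main(String[] args) {
        NavigableSet<String> names = sampleNames();
        System.out.println("regex : " + regexScan(names, CONTAINS_JIM));
        System.out.println("prefix: " + prefixScan(names, "jim"));
    }
}
```

With this sample data the regex scan also returns "Jim Button" and "tim jimson", while the prefix lookup returns only "jim knopf" and "jim seroka". That mirrors the difference in results between the two Cypher queries: the prefix form is case-sensitive and anchored at the start of the name.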