1
votes

We're developing an application based on Neo4j, with about 200k nodes, where every node has a property like type='user' or type='company' to denote the specific entity it represents. We need to get the count of all nodes of a given type in the graph.

We created an index for every entity type (users, companies) which holds the nodes with that property value. The users index contains 130k nodes, and the rest are in the companies index.

With Cypher, we query like this:

START u=node:users('id:*')
RETURN count(u)

And the results are

Returned 1 row. Query took 4080 ms

The server has a mostly default configuration with a few tweaks, but 4 seconds is too slow for our needs. The database will grow by about 20k nodes per month, so we need this query to perform very well.

Is there any other way to do this, maybe with Gremlin, or with some other server plugin? I'll cache those results, but I want to know if it's possible to tweak this.

Thanks a lot, and sorry for my poor English.

4
Neo4j 1.8.M06, 3GB RAM, 120GB SSD – Pablo Dominguez
Did you try RETURN count(*)? And did you measure the first query or subsequent ones? – Michael Hunger
I did, and using * or some field doesn't seem to change the response time. My first run was 8.52 s. Once Lucene/Neo4j cached the query, the response time dropped to 4.08 s. – Pablo Dominguez

4 Answers

3
votes

Finally, using Gremlin instead of Cypher, I found a solution.

g.getRawGraph().index().forNodes('NAME_OF_USERS_INDEX').query(
    new org.neo4j.index.lucene.QueryContext('*')
).size()

This method uses the Lucene index to get an "approximate" row count.

Thanks again to all.

1
votes

Mmh, this really comes down to the performance of that Lucene index. If you need just this single query most of the time, why not keep the total count as an integer property on some node, update it together with the index insertions, and, for good measure, run an update with the query above against it every night?
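A hypothetical nightly refresh in Cypher (assuming 1.8+ mutating Cypher, a dedicated counter node with node id 1, and the users index from the question):

```cypher
// recompute the cached total from the index and store it on the counter node
START u=node:users('id:*'), c=node(1)
WITH c, count(u) AS total
SET c.user_count = total
```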

0
votes

You could instead keep a property on a dedicated node up to date with the number of such nodes, where updates are guarded by write locks:

Transaction tx = db.beginTx();
try {
    ...
    ...
    // take the write lock first so concurrent increments are serialized
    tx.acquireWriteLock( countingNode );
    countingNode.setProperty( "user_count",
        ((Integer) countingNode.getProperty( "user_count" )) + 1 );
    tx.success();
} finally {
    tx.finish();
}
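Reading the cached total then becomes a constant-time property lookup instead of an index scan. A hypothetical read query, again assuming the counting node has node id 1:

```cypher
START c=node(1)
RETURN c.user_count
```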

0
votes

If you want the best performance, don't model your entity categories as properties on the node. Instead, do it like this:

company1-[:IS_ENTITY]->companyentity

Or, if you are using 2.0:

company1:COMPANY

The second approach would also allow your index to be updated automatically in a separate background thread, by the way; in my opinion one of the best new features of 2.0.

The first method should also prove more efficient, since making a "hop" across a relationship generally takes less time than reading a property from a node. It does, however, require you to create a separate index for the entity nodes.

Your queries would look like this:

v2.0

MATCH (company:COMPANY)
RETURN count(company)

v1.9

START entity=node:entityindex(value='company')
MATCH company-[:IS_ENTITY]->entity
RETURN count(company)