1
votes

We're developing an application based on Neo4j, with about 200k nodes, where every node has a property like type='user' or type='company' to denote the specific entity it represents. We need to get the count of all nodes of a given type in the graph.

We created an index for every entity type (users, companies) which holds the nodes with that property value. The users index contains 130k nodes, and the rest are in the companies index.

With Cypher, we query like this:

START u=node:users('id:*')
RETURN count(u)

And the results are

Returned 1 row. Query took 4080 ms

The server has a mostly default configuration with a few tweaks, but 4 seconds is too slow for our needs. The database will grow by about 20k nodes per month, so we need this query to perform very well.

Is there any other way to do this, maybe with Gremlin, or with some other server plugin? I'll cache those results, but I want to know if it's possible to tweak this.

Thanks a lot, and sorry for my poor English.

4
Neo4j 1.8.M06, 3GB RAM, 120GB SSD – Pablo Dominguez
Did you try RETURN count(*)? And did you measure the first query or subsequent ones? – Michael Hunger
I did, and using * or some field doesn't seem to change the response time. My first run was 8.52 s. Once Lucene/Neo4j cached the query, the response time dropped to 4.08 s. – Pablo Dominguez

4 Answers

3
votes

Finally, using Gremlin instead of Cypher, I found a solution.

g.getRawGraph().index().forNodes('NAME_OF_USERS_INDEX').query(
    new org.neo4j.index.lucene.QueryContext('*')
).size()

This method uses the Lucene index to get an "approximate" row count.

Thanks again to all.

1
votes

Mmh, this really comes down to the performance of that Lucene index. If you need just this single query most of the time, why not keep the total count as an integer property on some node, update it together with the index insertions, and, for good measure, run an update with the query above against it every night?
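A hypothetical nightly refresh in Cypher (assuming 1.8+ mutating Cypher, a dedicated counter node with node id 1, and the users index from the question):

```cypher
// recompute the cached total from the index and store it on the counter node
START u=node:users('id:*'), c=node(1)
WITH c, count(u) AS total
SET c.user_count = total
```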

0
votes

You could instead keep a property on a dedicated node up to date with the number of such nodes, where updates are guarded by write locks:

Transaction tx = db.beginTx();
try {
    ...
    ...
    // take the write lock first so concurrent increments are serialized
    tx.acquireWriteLock( countingNode );
    countingNode.setProperty( "user_count",
        ((Integer) countingNode.getProperty( "user_count" )) + 1 );
    tx.success();
} finally {
    tx.finish();
}
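Reading the cached total then becomes a constant-time property lookup instead of an index scan. A hypothetical read query, again assuming the counting node has node id 1:

```cypher
START c=node(1)
RETURN c.user_count
```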

0
votes

If you want the best performance, don't model your entity categories as properties on the node. Instead, do it like this:

company1-[:IS_ENTITY]->companyentity

Or, if you are using 2.0:

company1:COMPANY

The second approach would also allow your index to be updated automatically in a separate background thread, by the way; in my opinion one of the best new features of 2.0.

The first method should also prove more efficient, since making a "hop" across a relationship generally takes less time than reading a property from a node. It does, however, require you to create a separate index for the entity nodes.

Your queries would look like this:

v2.0

MATCH (company:COMPANY)
RETURN count(company)

v1.9

START entity=node:entityindex(value='company')
MATCH company-[:IS_ENTITY]->entity
RETURN count(company)