I need advice on improving the performance of a social graph query. The target query works fine when the result set is small, but it may return more than 1000 rows. Can performance be tuned for Cypher queries with large responses?

The Cypher query used is:

START givenFriend=node:Nodes('id:709387498'),
item=node:ItemCat1Cat2('category:a.b')
MATCH p = givenFriend-[:FRIEND]-friend1-[:FRIEND]-friend2-[:DATA]->item
RETURN p, item

Neo4j core 1.9.5

The graph contains connected friends:

friend1Node-[:FRIEND]->friend1Node

A friend can have several data items which are represented as nodes with properties:

friendNode-[:DATA]->DataNode

A data node has about 8 properties. Among them is a category property. The data item nodes are indexed by category.

Friend nodes number: 650,772

Friend relationship number: 842,755

Data item nodes number: 5,640

The query that needs improvement should select all paths from a given node id, through 2 friends, to data items with a given category. The paths have the following form:

givenFriend-friend1-friend2-dataItem 

Can traversal improve the performance?

Can migration to 2.0.0 improve the db model and query performance?

**UPD**

  1. I use the PHP library https://github.com/jadell/neo4jphp, but I'm open to other options. Right now I'm looking at neoism (Golang). I have also considered using a Neo4j extension to run the query. The target query was also tested through the Neo4j dashboard, so no client layer was involved.
  2. The latest version of the PHP library uses X-Stream; mine does not. But since the query was tested without a client, this factor can be ruled out.
  3. Good point. I've tuned the query: it now returns only the properties I need rather than nodes, and performance has improved a bit.
  4. If I understand your question about the SLA correctly: this type of request should work at a concurrency of 100 with an allowable response time of 2s per request. Query response times through the dashboard:

LIMIT 1 = 195ms

LIMIT 100 = 564ms

LIMIT 1000 = 1549ms

LIMIT 3000 = 3208ms

SKIP 7000 LIMIT 1 = 2051ms

The response can contain up to 13K records.

1 Answer

  1. What client do you use?
  2. Do you use streaming, i.e. the X-Stream:true header?
  3. Return only the data you need: not paths or nodes, but only the properties you really need for your use case.
  4. 2.0.1 would improve performance on the transactional endpoint.

What is your SLA and your current response time? How large are the responses?
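
To make points 2 and 3 concrete, here is a minimal Python sketch of how a client might build a streaming request that returns only properties. It assumes a default local server at http://localhost:7474 and the legacy Cypher REST endpoint of the 1.9 line; `name` and `title` are hypothetical property names standing in for whatever the use case actually needs, and `build_cypher_request` is an illustrative helper, not part of any library.

```python
import json

# Assumption: default local server address; adjust for your deployment.
CYPHER_URL = "http://localhost:7474/db/data/cypher"

# Same pattern as in the question, but returning scalar properties
# instead of the path and the item node.
# Assumption: `name` and `title` are hypothetical property names.
QUERY = (
    "START givenFriend=node:Nodes('id:709387498'), "
    "item=node:ItemCat1Cat2('category:a.b') "
    "MATCH givenFriend-[:FRIEND]-friend1-[:FRIEND]-friend2-[:DATA]->item "
    "RETURN friend1.name, friend2.name, item.title"
)


def build_cypher_request(query, params=None, stream=True):
    """Return (url, headers, body) for a POST to the Cypher endpoint."""
    headers = {
        "Accept": "application/json; charset=UTF-8",
        "Content-Type": "application/json",
    }
    if stream:
        # Asks the server to stream the result instead of building it in memory.
        headers["X-Stream"] = "true"
    body = json.dumps({"query": query, "params": params or {}}).encode("utf-8")
    return CYPHER_URL, headers, body


url, headers, body = build_cypher_request(QUERY)
```

The returned triple can be passed to any HTTP client, for example `urllib.request.Request(url, data=body, headers=headers)`. Whether this helps at 13K records still has to be measured against the timings above.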