We built a prototype to evaluate whether Neo4j fits our scenario. However, the performance is worse than expected, so we would like to know whether this is a Neo4j limitation or whether there is a way to improve it.

The scenario details are:

  1. System configuration: Intel i5 2.3 GHz dual-core CPU, 16 GB memory, Neo4j 2.3.1.
  2. There is only one node label, and the graph contains 50 nodes. Each node has about 20 properties. An index is created on the main property "nodeid".
  3. There are 80 relationship types, but the graph contains only about 6000 relationships in total (the test nodes differ only slightly from each other). Each relationship has a single property indicating its version, so it is not indexed.
  4. The goal of the prototype: starting from a specific node, find all nodes reachable through a specific set of relationship types (about 50) and return them as a network.

The Cypher query, run through the browser, takes more than 10 seconds. As there are only 50 nodes, is this expected? Below is the Cypher query we used:

MATCH
  (startnode:MYNode {nodeid:"123456"})-[r1:REL1|:REL2......|:REL50 {version:1}]-
  (target1:MYNode)-[r2:REL1|:REL2......|:REL50 {version:1}]-
  (target2:MYNode)-[r3:REL1|:REL2......|:REL50 {version:1}]-(target3:MYNode)
WHERE target1.timestamp > 1449417600 AND  target2.timestamp > 1449417600 AND target3.timestamp > 1449417600 
RETURN
  DISTINCT target2.nodeid as l_id,
  target2.timestamp as l_ts,
  type(r3) as r_type,
  target3.nodeid as r_id,
  target3.timestamp as r_ts 
LIMIT 5000;

Below is the PROFILE output:

Compiler CYPHER 2.3
Planner COST
Runtime INTERPRETED

+-----------------+----------------+-------+---------+---------------------------------------------+
| Operator        | Estimated Rows | Rows  | DB Hits | Identifiers                                 |
+-----------------+----------------+-------+---------+---------------------------------------------+
| +ProduceResults |             93 |  5000 |       0 | l_appts, l_id, r_appts, r_id, r_type        |
| +Limit          |             93 |  5000 |       0 | l_appts, l_id, r_appts, r_id, r_type        |
| +Distinct       |             93 |  5000 |  627064 | l_appts, l_id, r_appts, r_id, r_type        |
| +Filter         |             98 | 78383 |  313881 | r1, r2, r3, seed, target1, target2, target3 |
| +Expand(All)    |            977 | 78732 |   96630 | r1, r2, r3, seed, target1, target2, target3 |
| +Filter         |             21 |   314 |    1260 | r1, r2, seed, target1, target2              |
| +Expand(All)    |            212 |   315 |     429 | r1, r2, seed, target1, target2              |
| +Filter         |              5 |     2 |       8 | r1, seed, target1                           |
| +Expand(All)    |             46 |     2 |      59 | r1, seed, target1                           |
| +NodeIndexSeek  |              1 |     1 |       2 | seed                                        |
+-----------------+----------------+-------+---------+---------------------------------------------+

"Other" column, in the same order as the table rows:

  ProduceResults: l_id, l_appts, r_type, r_id, r_appts
  Limit:          Literal(5000)
  Distinct:       r_id, l_appts, r_appts, l_id, r_type
  Filter:         Ands(r3.version == {  AUTOINT3}, NOT(r2 == r3), NOT(r1 == r3), AndedPropertyComparablePredicates(target3,target3.STAN_APP_TIMESTAMP,target3.STAN_APP_TIMESTAMP < {  AUTOINT9}, target3.STAN_APP_TIMESTAMP > {  AUTOINT8}), target3:APPNode)
  Expand(All):    (target2)-[r3:REL_TYPES]-(target3)
  Filter:         Ands(AndedPropertyComparablePredicates(target2,target2.STAN_APP_TIMESTAMP,target2.STAN_APP_TIMESTAMP < {  AUTOINT7}, target2.STAN_APP_TIMESTAMP > {  AUTOINT6}), target2:APPNode, r2.version == {  AUTOINT2}, NOT(r1 == r2))
  Expand(All):    (target1)-[r2:REL_TYPES]-(target2)
  Filter:         Ands(r1.version == {  AUTOINT1}, target1:APPNode, AndedPropertyComparablePredicates(target1,target1.STAN_APP_TIMESTAMP,target1.STAN_APP_TIMESTAMP > {  AUTOINT4}, target1.STAN_APP_TIMESTAMP < {  AUTOINT5}))
  Expand(All):    (seed)-[r1:REL_TYPES]-(target1)
  NodeIndexSeek:  :APPNode(APP_ID)

REL_TYPES stands for the relationship-type list below, which is identical in all three Expand(All) operators:

  ID__ID|:ID__ID_C1|:ID__ID_C2|:ID_C1__ID_C1|:ID_C1__ID_C2|:ID_C1__ID|:ID_C2__ID_C1|:ID_C2__ID_C2|:ID_C2__ID|
  :CELLPHONE__CELLPHONE|:CELLPHONE__HOMEPHONE|:CELLPHONE__EMPPHONE|:CELLPHONE__C1PHONE|:CELLPHONE__C2PHONE|
  :HOMEPHONE__CELLPHONE|:HOMEPHONE__HOMEPHONE|:HOMEPHONE__EMPPHONE|:HOMEPHONE__C1PHONE|:HOMEPHONE__C2PHONE|
  :EMPPHONE__CELLPHONE|:EMPPHONE__HOMEPHONE|:EMPPHONE__EMPPHONE|:EMPPHONE__C1PHONE|:EMPPHONE__C2PHONE|
  :C1PHONE__CELLPHONE|:C1PHONE__HOMEPHONE|:C1PHONE__EMPPHONE|:C1PHONE__C1PHONE|:C1PHONE__C2PHONE|
  :C2PHONE__CELLPHONE|:C2PHONE__HOMEPHONE|:C2PHONE__EMPPHONE|:C2PHONE__C1PHONE|:C2PHONE__C2PHONE|
  :EMAIL__EMAIL|:CARLICENSE__CARLICENSE|:EMPNAME__EMPNAME|:IPADDR__IPADDR|:MACADDR__MACADDR|:WIFIMAC__WIFIMAC|
  :HOMEADDR__HOMEADDR__P0|:EMPADDR__EMPADDR__P0|:HOMEADDR__EMPADDR__P0|:EMPADDR__HOMEADDR__P0|
  :C1ADDR__C1ADDR__P0|:C2ADDR__C2ADDR__P0|:C1ADDR__C2ADDR__P0|:C2ADDR__C1ADDR__P0|
  :HOMEADDR__C1ADDR__P0|:HOMEADDR__C2ADDR__P0|:EMPADDR__C1ADDR__P0|:EMPADDR__C2ADDR__P0|
  :C1ADDR__HOMEADDR__P0|:C1ADDR__EMPADDR__P0|:C2ADDR__HOMEADDR__P0|:C2ADDR__EMPADDR__P0

Total database accesses: 1039333
Can you share a PROFILE output and the database (zipped)? - Michael Hunger
Hi Michael, I've added the profile output to my original question. @MichaelHunger - fehu
Could you explain a bit more about your domain? It is unusual to match against such a high number of rel-types at once. Usually you'd just leave them off and post-filter if needed. - Michael Hunger
Have you tried path=(startnode:MYNode {nodeid:"123456"})-[rels:REL1|:REL2......|:REL50*2..2 {version:1}]- (target1:MYNode)? It's more compact, you can pull out the nodes from the path with nodes(path), get the rels with the rels variable, and you can use EXTRACT / UNWIND to work with the collections. - Brian Underwood

1 Answer

I tried to recreate your database and statement. The problem is that the single pattern spans a huge lattice of paths, and you only post-filter it with DISTINCT at the very end.

If you reduce the intermediate results between the hops with WITH DISTINCT (you are only interested in the distinct results anyway), the query completes in a few ms for me.

MATCH (startnode:MYNode {nodeid:1})
-[r1:REL1|:REL2|:REL3|:REL4|:REL5|:REL6|:REL7|:REL8|:REL9|:REL10|:REL11|:REL12|:REL13|:REL14|:REL15|:REL16|:REL17|:REL18|:REL19|:REL20|:REL21|:REL22|:REL23|:REL24|:REL25|:REL26|:REL27|:REL28|:REL29|:REL30|:REL31|:REL32|:REL33|:REL34|:REL35|:REL36|:REL37|:REL38|:REL39|:REL40|:REL41|:REL42|:REL43|:REL44|:REL45|:REL46|:REL47|:REL48|:REL49|:REL50 {version:1}]-(target1:MYNode)
WHERE target1.timestamp > 1449417600
WITH DISTINCT target1

MATCH (target1)-[r2:REL1|:REL2|:REL3|:REL4|:REL5|:REL6|:REL7|:REL8|:REL9|:REL10|:REL11|:REL12|:REL13|:REL14|:REL15|:REL16|:REL17|:REL18|:REL19|:REL20|:REL21|:REL22|:REL23|:REL24|:REL25|:REL26|:REL27|:REL28|:REL29|:REL30|:REL31|:REL32|:REL33|:REL34|:REL35|:REL36|:REL37|:REL38|:REL39|:REL40|:REL41|:REL42|:REL43|:REL44|:REL45|:REL46|:REL47|:REL48|:REL49|:REL50 {version:1}]-(target2:MYNode)
WHERE target2.timestamp > 1449417600
WITH DISTINCT target2

MATCH (target2)-[r3:REL1|:REL2|:REL3|:REL4|:REL5|:REL6|:REL7|:REL8|:REL9|:REL10|:REL11|:REL12|:REL13|:REL14|:REL15|:REL16|:REL17|:REL18|:REL19|:REL20|:REL21|:REL22|:REL23|:REL24|:REL25|:REL26|:REL27|:REL28|:REL29|:REL30|:REL31|:REL32|:REL33|:REL34|:REL35|:REL36|:REL37|:REL38|:REL39|:REL40|:REL41|:REL42|:REL43|:REL44|:REL45|:REL46|:REL47|:REL48|:REL49|:REL50 {version:1}]-(target3:MYNode) 
WHERE  target3.timestamp > 1449417600 
RETURN target2.nodeid as l_id, target2.timestamp as l_ts, type(r3) as r_type, target3.nodeid as r_id, target3.timestamp as r_ts 
LIMIT 5000;
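
To make the effect concrete: in the PROFILE from the question, the three Expand(All) steps grow from 2 to 315 to 78732 rows before Distinct cuts them back to 5000. The Python sketch below is only a rough illustration of that mechanism, not Neo4j code and not the asker's data. It assumes a randomly generated multigraph with 50 nodes and 6000 relationships, ignores the version and timestamp filters, and uses hypothetical function names. It compares the number of rows a single three-hop pattern would produce (one per path with pairwise distinct relationships) with the number of rows left when each hop is first collapsed to its distinct end nodes.

```python
import random
from collections import defaultdict


def build_multigraph(num_nodes, num_rels, seed=0):
    """Random undirected multigraph (an assumption, not the asker's data).

    Parallel relationships between the same pair of nodes are allowed.
    """
    rng = random.Random(seed)
    adjacency = defaultdict(list)  # node -> [(rel_id, neighbour), ...]
    for rel_id in range(num_rels):
        a, b = rng.sample(range(num_nodes), 2)  # two different nodes, so no self-loops
        adjacency[a].append((rel_id, b))
        adjacency[b].append((rel_id, a))
    return adjacency


def rows_single_match(adjacency, start):
    """Rows of one 3-hop pattern: one per path whose relationships are pairwise distinct."""
    rel_ids = {node: {rel for rel, _ in edges} for node, edges in adjacency.items()}
    rows = 0
    for r1, target1 in adjacency[start]:
        for r2, target2 in adjacency[target1]:
            if r2 == r1:
                continue  # a relationship may appear only once per matched path
            # Third hop: every relationship on target2 that this path has not used yet.
            rows += len(adjacency[target2]) - len({r1, r2} & rel_ids[target2])
    return rows


def rows_with_distinct(adjacency, start):
    """Rows when each hop is reduced to its distinct end nodes before the next hop."""
    target1s = {nbr for _, nbr in adjacency[start]}
    target2s = {nbr for node in target1s for _, nbr in adjacency[node]}
    # Final hop: one row per relationship attached to each distinct target2.
    return sum(len(adjacency[node]) for node in target2s)


graph = build_multigraph(num_nodes=50, num_rels=6000)
print("single pattern rows:", rows_single_match(graph, start=0))
print("per-hop distinct rows:", rows_with_distinct(graph, start=0))
```

The per-hop count can never exceed twice the number of relationships, because each relationship is seen at most once from each of its two ends, whereas the single-pattern count multiplies the node degrees along every path.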