0
votes

I have a simple cypher query which is taking too long to complete.

Node Types

  • Person {PersonId, PersonEmailAddress}

  • Document {DocumentId}

Relationship Types: SENT, TO, CC, BCC

The concept is:

(p:Person)-[:SENT]->(d:Document)-[:TO|CC|BCC]->(anotherPerson:Person)

I am trying to get those person to whom any given person has sent maximum mails.

For example you can see in the following diagram that Person#1 has sent maximum mails (6 mails) to Person#77 and Person#615 and so on. So here I want (as mentioned in the below table) top 5 Person ids with count of number of mail sent to that person.

+------------+-----------------------+
| ReceiverId | NumberOfMailsReceived |
+------------+-----------------------+
|         77 |                     6 |
|        615 |                     6 |
|        101 |                     4 |
|        247 |                     4 |
|        252 |                     4 |
+------------+-----------------------+

enter image description here

I'm trying the following query:

MATCH(p:Person{PersonId:1})-->(d) WITH DISTINCT d
MATCH (d)-->(rc)
RETURN rc, COUNT(rc) as c ORDER BY c DESC LIMIT 5

Here there is no performance issue with this query as there is only 9 documents the Person#1 has sent and only 15 person are involved as Receivers of this 9 documents.

But If I fire the same query for a different Person who has sent total nearly 56,500 documents, and total 869 (distinct) person are involved as Receivers, the query takes 43261 ms to complete

Cypher version: CYPHER 3.1, planner: COST, runtime: INTERPRETED. 21570218 total db hits in 43261 ms.

The PROFILE is as below

enter image description here

My neo4j browser shows Size: 5.16 GiB under Database section.

This are the configs I'm using:

dbms.memory.heap.initial_size=8G
dbms.memory.heap.max_size=8G
dbms.memory.pagecache.size=4g

Any suggestions or ideas to optimize the query ?

Thanks in advance.

EDIT

After updating Neo4j version from 3.1.1 to 3.2, the query still takes approx 23368 ms.

2

2 Answers

1
votes

I don't think you can do anything about your raw DBHITS as your query is already pretty minimal (the only wasted effort is that DISTINCT doesn't filter anything in the example case); But you CAN try and parallel your query a bit like this

MATCH (p:Person{PersonId:1})-->(d)
MATCH (d)-[:TO]->(rc)
WITH d, COLLECT(rc) AS list
MATCH (d)-[:CC]->(rc)
WITH d, list + COLLECT(rc) AS list
MATCH (d)-[:BCC]->(rc)
WITH d, list + COLLECT(rc) AS list
UNWIND list AS rc
RETURN rc, COUNT(rc) as c 
ORDER BY c DESC 
LIMIT 5

(You might need a to use "USING JOIN ON d" somewhere to make Cypher break up the work in parallel)

0
votes

You should be able to just traverse from the origin Person to the recipients through the documents rather than crunching a distinct on the documents as an intermediary step.

How does this query hold up against your data set?

MATCH(p:Person {PersonId:1})
WITH p
MATCH (p)-->(:Document)-->(rc:Person)
RETURN rc, COUNT(rc) AS c 
ORDER BY c DESC 
LIMIT 5