1
votes

I am using NEO4j 3.5 to store and query relationships between people. I have nodes with the label "User" and relationships with the label of "Friends". I am able to get the friends of friends, but the query is taking too long. It currently returns in 4sec to 6sec. This is not a high transactional neo4j database and the server has plenty of CPU and memory available. The loads on the server are under 3 and there are 8 cores. This is running on an AWS EC2 instance. There are roughly 250,000 nodes in the database and the total database size is under 750mb.

Here is the query that I am currently using:

MATCH (user:User {user_id:1145})-[:FRIENDS*3]->(fof:User)
WHERE NOT (user:User)-[:FRIENDS]->(fof:User)
RETURN count(distinct fof.user_id)

This cypher query returns a count of 69,704, which is correct.

What optimizations can be made either to the cypher query or the NEO4j database engine to return the results faster?

Execution Plan

+-----------------------+----------------+--------+---------+-----------+-----------------------------+------------------+--------------------------------------------+
| Operator              | Estimated Rows | Rows   | DB Hits | Cache H/M | Identifiers                 | Ordered by       | Other                                      |
+-----------------------+----------------+--------+---------+-----------+-----------------------------+------------------+--------------------------------------------+
| +ProduceResults       |              0 |      1 |       0 |       0/0 | count(distinct fof.user_id) |                  | 0.0                                        |
| |                     +----------------+--------+---------+-----------+-----------------------------+------------------+--------------------------------------------+
| +EagerAggregation     |              0 |      1 |  326421 |       0/0 | count(distinct fof.user_id) |                  | 0.0                                        |
| |                     +----------------+--------+---------+-----------+-----------------------------+------------------+--------------------------------------------+
| +AntiSemiApply        |              0 | 256717 |       0 |       0/0 | anon[33], fof, user         | user.user_id ASC | 0.0                                        |
| |\                    +----------------+--------+---------+-----------+-----------------------------+------------------+--------------------------------------------+
| | +Expand(Into)       |              0 |      0 | 8006149 |       0/0 |   REL80, fof, user          |                  | 0.0; (user)-[  REL80:FRIENDS]->(fof)       |
| | |                   +----------------+--------+---------+-----------+-----------------------------+------------------+--------------------------------------------+
| | +Filter             |              1 | 260120 |  520240 |       0/0 | fof, user                   |                  | 0.0; fof:User                              |
| | |                   +----------------+--------+---------+-----------+-----------------------------+------------------+--------------------------------------------+
| | +Argument           |              1 | 260120 |       0 |       0/0 | fof, user                   |                  | 0.0                                        |
| |                     +----------------+--------+---------+-----------+-----------------------------+------------------+--------------------------------------------+
| +Filter               |              0 | 260120 |  260120 |       0/0 | anon[33], fof, user         | user.user_id ASC | 0.0; fof:User                              |
| |                     +----------------+--------+---------+-----------+-----------------------------+------------------+--------------------------------------------+
| +VarLengthExpand(All) |              0 | 260120 |  267999 |       0/0 | anon[33], fof, user         | user.user_id ASC | 0.0; (user)-[anon[33]:FRIENDS*3..3]->(fof) |
| |                     +----------------+--------+---------+-----------+-----------------------------+------------------+--------------------------------------------+
| +NodeIndexSeek        |              1 |      1 |       3 |       0/0 | user                        | user.user_id ASC | 0.0; :User(user_id)                        |
+-----------------------+----------------+--------+---------+-----------+-----------------------------+------------------+--------------------------------------------+
1
Keep in mind also that :FRIENDS*3 is not friends-of-friends, but friends-of-friends-of-friends.InverseFalcon

1 Answers

2
votes
  1. Your WHERE clause contains a pattern that requires additional DB hits per fof. You can avoid those DB hits by keeping in memory a list of all the immediate friends of user, and changing your WHERE clause so that it just searches in the list. (According to your profile data, this could save 8006149+520240, or over 8.5 million DB hits -- which is the majority of the hits for your whole query.)

  2. In your query, if the same fof node is matched multiple times, the same WHERE test will be performed every time. You can avoid that by filtering out duplicate fof nodes before doing the WHERE test. This also means you no longer need to remove duplicates later.

For example:

MATCH (user:User {user_id:1145})-[:FRIENDS]->(f:User)
WITH user, COLLECT(f) AS friends
MATCH (user)-[:FRIENDS*3]->(fof:User)
WITH DISTINCT friends, fof
WHERE NOT fof IN friends
RETURN COUNT(fof)