I need help reducing Neo4j query latency on a large relationship dataset.

System configuration: 8-core, 32 GB VM in the cloud

Neo4j configuration: page cache 20 GB, heap 8 GB

Object model: Nodes are connected by a "COMMUNICATING_TO" relationship that carries a "timestamp" property.

Query: Find all communications between nodes for a given time period, removing duplicate communications between the same pair of nodes.

MATCH (n1)-[r:COMMUNICATING_TO]->(n2) 
WHERE r.timestamp >= <fromTimestamp> AND r.timestamp <= <toTimestamp>
RETURN {id:id(n1)} as fromNode, COLLECT(DISTINCT {id:id(n2)}) as toNode

Data: 100K nodes with 500 million relationships between them.

Challenge: A given day has about 2 million relationships, and the query takes ~50 seconds.

Any suggestions for optimizing the query, the system parameters, or the object model are appreciated.

1 Answer

You can use APOC to create a manual index on your relationship.

This query populates the index:

MATCH ()-[r:COMMUNICATING_TO]->()
CALL apoc.index.addRelationship(r,['timestamp'])
RETURN count(*)

You can then retrieve your relationships like this:

CALL apoc.index.relationships('COMMUNICATING_TO','timestamp:[<fromTimestamp> TO <toTimestamp>]') YIELD rel, start , end
RETURN rel, start, end

Link to the docs: https://neo4j-contrib.github.io/neo4j-apoc-procedures/#_using_manual_index_on_relationship_properties
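
To get the same shape of result as the query in the question (each source node with its distinct targets), the index lookup can be combined with the aggregation. This is only a sketch: it assumes the manual index has been populated as above, and the aliases fromNode, toNode and toNodes are names chosen here for illustration. <fromTimestamp> and <toTimestamp> remain placeholders.

```cypher
// Look up relationships through the manual index, then deduplicate targets per source node.
// YIELD ... AS renames the procedure outputs so the reserved-looking names start/end are avoided.
CALL apoc.index.relationships('COMMUNICATING_TO','timestamp:[<fromTimestamp> TO <toTimestamp>]')
YIELD start AS fromNode, end AS toNode
RETURN id(fromNode) AS fromNode, collect(DISTINCT id(toNode)) AS toNodes
```

Returning plain ids instead of {id: ...} maps also keeps the result smaller, which may help a little when 2 million relationships are involved.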

Another solution is to change your data model by adding the date to the COMMUNICATING_TO relationship type. Example: 20180205_COMMUNICATING_TO, 20180206_COMMUNICATING_TO, 20180207_COMMUNICATING_TO ...
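
With date-specific relationship types, a query for one day could look roughly like the sketch below. It assumes the relationships have already been created under the per-day types from the example. Because these type names start with a digit, they have to be quoted with backticks.

```cypher
// One day: only relationships of that day's type are traversed, so no timestamp filter is needed.
MATCH (n1)-[:`20180205_COMMUNICATING_TO`]->(n2)
RETURN id(n1) AS fromNode, collect(DISTINCT id(n2)) AS toNodes;

// Several days: list each day's type, separated by |
MATCH (n1)-[:`20180205_COMMUNICATING_TO`|`20180206_COMMUNICATING_TO`]->(n2)
RETURN id(n1) AS fromNode, collect(DISTINCT id(n2)) AS toNodes;
```

If the time period does not line up with whole days, you would still need a WHERE clause on r.timestamp, but it would only run against the relationships of the selected days.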