I need help reducing Neo4j query latency on a dataset with a very large number of relationships.
System configuration: 8-core, 32 GB VM in the cloud
Neo4j configuration: page cache 20 GB, heap 8 GB
Object model: Nodes are connected by a "COMMUNICATING_TO" relationship that carries a "timestamp" property.
Query: Find all communications between nodes in a given time period, removing duplicate communications between the same two nodes.
MATCH (n1)-[r:COMMUNICATING_TO]->(n2)
WHERE r.timestamp >= $fromTimestamp AND r.timestamp <= $toTimestamp
RETURN {id: id(n1)} AS fromNode, collect(DISTINCT {id: id(n2)}) AS toNode
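To see whether the time goes into scanning every COMMUNICATING_TO relationship or into the aggregation, the same query can be run with PROFILE. This is only a sketch: it assumes the two timestamps are passed as query parameters, and the parameter names $fromTimestamp and $toTimestamp are chosen here for illustration.

```cypher
// PROFILE executes the query and reports rows and db hits for each operator.
// $fromTimestamp and $toTimestamp are assumed parameter names.
PROFILE
MATCH (n1)-[r:COMMUNICATING_TO]->(n2)
WHERE r.timestamp >= $fromTimestamp AND r.timestamp <= $toTimestamp
RETURN {id: id(n1)} AS fromNode, collect(DISTINCT {id: id(n2)}) AS toNode
```

The operator with the most db hits in the resulting plan indicates where the query spends its time.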
Data: 100K nodes with 500 million relationships between them.
Challenge: For a given day there are 2 million relationships, and the query takes ~50 seconds.
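One day's window covers 2 million of the 500 million relationships, about 0.4%, so the range filter on r.timestamp looks like a candidate for a relationship property index. A minimal sketch, assuming the server is Neo4j 4.3 or later (where relationship property indexes are available) and using a hypothetical index name; whether it improves the timing above is untested.

```cypher
// Hypothetical index name. Assumes Neo4j 4.3+; earlier versions do not
// support indexes on relationship properties.
CREATE INDEX comm_timestamp_idx IF NOT EXISTS
FOR ()-[r:COMMUNICATING_TO]-()
ON (r.timestamp)
```

If the planner picks the index up, the timestamp filter could be answered by an index range seek rather than a scan over all relationships.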
Any suggestions for optimizing the query, the system parameters, or the object model are appreciated.