I'm working on a Cypher query that returns a "combined limit" over two sets of results: one is the immediate neighbors, the other is the neighbors reached across "event nodes", as follows:
OPTIONAL MATCH (subject:Person {age:"38"})--(event:Event)--(targetViaEvent)
OPTIONAL MATCH (subject)--(directTarget)
WHERE NOT directTarget:Event
WITH subject, targetViaEvent, directTarget,
COUNT(event) AS eventCount
ORDER BY eventCount DESC
WITH subject, COLLECT(directTarget) + COLLECT(targetViaEvent) as targetList
UNWIND targetList AS target
WITH DISTINCT subject, target
SKIP 0 LIMIT 10
...
The main purpose of this Cypher query is:
- Find all the neighbors
- If a neighbor is labeled Event, find the other neighbors of the event
- Sort the event-connected neighbors by the number of events
- Return the neighbors found above, whether labeled Event or not, using skip and limit for pagination
  - If possible, return neighbors with the Event label ahead of the ones without
Other specifications:
- All relationship types and directions are taken into account, so neither is filtered
With COLLECT() used, the execution time gets unbelievably slow and the neo4j shell stalls, as each subject may have tens of thousands of directTarget and targetViaEvent nodes. I suspect COLLECT() caches every matched node object in memory and thus jams Neo4j at this data scale. My intention is just to combine the two sets and apply the limit to them together. Are there any tricks to improve my Cypher?
EDIT:
Since @InverseFalcon pointed out the mistake in my Cypher above, here's my entire query with updates:
PROFILE MATCH (subject:Person {age:"38"})
OPTIONAL MATCH (subject)--(directTarget)
WHERE NOT directTarget:Event
OPTIONAL MATCH (subject)--(event:Event)--(targetViaEvent)
WITH subject, targetViaEvent, directTarget,
COUNT(event) AS eventCount ORDER BY eventCount DESC
WITH subject, COLLECT(directTarget) + COLLECT(targetViaEvent) as targetList
UNWIND targetList AS target
WITH DISTINCT subject, target SKIP 0 LIMIT 300 WHERE target IS NOT NULL
OPTIONAL MATCH (subject)-[subject_target]-(target)
OPTIONAL MATCH (subject)--(eventPrime)--(target)
WITH subject, subject_target, target, COLLECT(eventPrime)[0..200] AS eventList
UNWIND (CASE eventList WHEN [] THEN [null] else eventList end) as limitedEvents
OPTIONAL MATCH (subject)-[subject_event]-(limitedEvents)-[event_target]-(target)
RETURN subject, subject_target, target, subject_event, limitedEvents, event_target
Note: after the SKIP...LIMIT... I repeat the query only to identify the relationships between the nodes, because a) I'd like to have the relationships in the JSON result; and b) after quite a few attempts I can't manage to fetch the relationships along with the first 3 MATCHes. Specifically, COUNT(event) doesn't work because each event is bound to a relationship, so the count is constantly 1.
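The COUNT(event) problem comes from how Cypher groups: every non-aggregated expression in a WITH becomes part of the grouping key. A minimal sketch of the effect, reusing the variable names from the query above; `relPairs` and the second statement are assumptions for illustration, not something I have verified on the real data:

```cypher
// The relationship is part of the grouping key, so each group holds the
// paths through one subject_event relationship, i.e. through one event.
MATCH (subject:Person {age:"38"})-[subject_event]-(event:Event)--(targetViaEvent)
WITH subject, subject_event, targetViaEvent, COUNT(event) AS eventCount
RETURN targetViaEvent, eventCount;

// Grouping by the nodes only keeps a per-target event count; the
// relationships are gathered by an aggregate instead of a grouping key.
MATCH (subject:Person {age:"38"})-[subject_event]-(event:Event)-[event_target]-(targetViaEvent)
WITH subject, targetViaEvent,
     COUNT(DISTINCT event) AS eventCount,
     COLLECT([subject_event, event_target])[0..200] AS relPairs
ORDER BY eventCount DESC
RETURN subject, targetViaEvent, eventCount, relPairs;
```

The second form still uses COLLECT(), but only over the relationship pairs of one subject/target pair at a time, not over every matched node.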