0
votes

I created a Needham time tree (http://www.markhneedham.com/blog/2014/04/19/neo4j-cypher-creating-a-time-tree-down-to-the-day/) going down to the hour. On each hour node I stored the epoch time

CREATE (h:Hour {hour: hour, day:day, month: month, year:year, time: apoc.date.parse(year +"-"+month+"-"+day+" "+hour+":00", "s", "yyyy-MM-dd hh:mm")})

Now I want to link events that happened within that hour to the hour node. I did this with the following query:

//Create one example event node
create (e:Event {time: apoc.date.parse("2017-11-17 13:15", "s", "yyyy-mm-dd HH:mm")})
with e
match (h:Hour) where h.time <= e.time and (h.time+3600) > e.time
merge (e)-[:IN_HOUR]->(h)

I indexed both the event time and hour time.

This works well for small groups of events but when I scale the events up to the hundreds of thousands range it goes VERY slowly. (On the order of a few hundred to 1000 relationships per hour)

How can I do this faster?

I tried both

match (e:Event) ...`

and using load CSV to iterate over each event, match it to an existing event node and then create a relationship to the time tree.

`

2

2 Answers

1
votes

[EDITED]

First, create an index on :Hour(time):

CREATE INDEX ON :Hour(time);

Then, change your query to the following (assuming that you pass the event time as a event_time parameter:

CREATE (e:Event {time: apoc.date.parse($event_time, "s", "yyyy-MM-dd HH:mm")})
WITH e
MATCH (h:Hour {time: e.time/3600*3600})
USING INDEX h:Hour(timer)
MERGE (e)-[:IN_HOUR]->(h);

The expression e.time/3600*3600 rounds e.time down to the nearest hour (but is still in seconds units). Since :Hour(time) is indexed, the MATCH should be be fast.

NOTE: The USING INDEX clause is there to give the Cypher Planner a hint that it should take advantage of the index. The planner does not aleways do that. If you the PROFILE your queries, you can see if giving a hint is necessary.

You query cannot make use of the index, since its WHERE clause is too complex.

0
votes

Here is the solution I landed on: thanks to cybersam I realized my where clause was too complicated for the indexing to work. Unfortunately cybersam's approach was just as slow. I removed the calculation from the match by updating the (:Event) nodes by adding an timeHour property (and indexing it)

match (e:Event) set e.timeHour = e.time/3600*3600

Then I was able to join a few hundred thousand relationship in 3 seconds by doing:

match (e:Event)
match (h:Hour) where h.time = e.timeHour
merge (e)-[:IN_HOUR]->(h)