
I'm writing a template for a query that returns a list of test scores with relevant information. With a sample dataset on Neo4j Community, the queries are taking a long time.

Here's an example:

// Marks that were ranked top 10 on that test, and performed during a section between 2015-1-1 and 2016-02-07

MATCH (mark:Mark)-[r1:PERFORMED_BY]->(prsn:Person)
MATCH (mark:Mark)-[r2:PERFORMED_ON]->(test:Test)
MATCH (mark:Mark)-[r3:PERFORMED_FOR]->(course:Course)
MATCH (mark:Mark)-[r4:PERFORMED_DURING]->(sect:Section)
MATCH (s:Section)-[r5:LOCATED_IN]->(room:Room)

WHERE r2.rank in range(1,10) AND sect.datetime in range(1420099200000,1494831600000,100000)

RETURN mark.uid, prsn.uid, test.uid, course.uid, sect.uid, mark.score, course.datetime, prsn.name, course.title, room.number,
r1.class, r2.rank, r3.rank

ORDER BY mark.score

Even the simplest of queries, WHERE r2.rank = 1, can take a few seconds. When using range() it takes 30+ seconds. Are there any strategies I can use to tune the query?

Neo4j Community 3.1.1

Store info

  • Array Store 8.00 KiB
  • Logical Log 17.05 MiB
  • Node Store 143.96 KiB
  • Property Store 1.67 MiB
  • Relationship Store 1.28 MiB
  • String Store Size 72.00 KiB
  • Total Store Size 29.54 MiB

Node id info

  • Node ID 9463
  • Property ID 42673
  • Relationship ID 39466
  • Relationship Type ID 12

1 Answer


It helps to match on the most relevant data first, since smaller datasets will be easier and faster to filter with subsequent MATCH operations. Once you've filtered down to the relevant nodes, THEN match on the rest of the nodes you'll need for your return.

Also, you'll want to make sure you have an index on :Section(datetime) for fast lookups.
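If the index doesn't exist yet, something like the following should create it. This is a sketch using the Neo4j 3.x index syntax, and it assumes sect.datetime is stored as a number (epoch milliseconds), as in your query:

```cypher
// Create a schema index on the datetime property of :Section nodes
// (Neo4j 3.x syntax; assumes datetime holds numeric epoch milliseconds)
CREATE INDEX ON :Section(datetime)
```

You can run CALL db.indexes() afterwards to check that the index is listed and online before timing the query again.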

Try this one:

MATCH (mark:Mark)-[r4:PERFORMED_DURING]->(sect:Section)
// faster to do an indexed range query like this
WHERE 1420099200000 <= sect.datetime <= 1494831600000
MATCH (mark)-[r2:PERFORMED_ON]->(test:Test)
WHERE 1 <= r2.rank <= 10
// now you have all relevant marks, match on the rest of the nodes you need
MATCH (mark)-[r1:PERFORMED_BY]->(prsn:Person)
MATCH (mark)-[r3:PERFORMED_FOR]->(course:Course)
MATCH (sect)-[r5:LOCATED_IN]->(room:Room)

RETURN mark.uid, prsn.uid, test.uid, course.uid, sect.uid, mark.score, course.datetime, prsn.name, course.title, room.number,
r1.class, r2.rank, r3.rank

ORDER BY mark.score

Also, it's always a good idea to PROFILE your query when tuning to figure out the problem areas.
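As a rough illustration, you can put PROFILE in front of just the first part of the query to see whether the date filter is using the index. The count(mark) return is only a stand-in I've added to keep the example short:

```cypher
// PROFILE runs the query and shows the plan with rows and db hits per operator
PROFILE
MATCH (mark:Mark)-[:PERFORMED_DURING]->(sect:Section)
WHERE 1420099200000 <= sect.datetime <= 1494831600000
RETURN count(mark)
```

In the resulting plan, look for operators with a large number of db hits or rows. A label scan on :Section where you expected an index seek would suggest the index is missing or not being used.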

Oh, and another reason this was blowing up: you had matched a :Section as sect, but the following MATCH used a new variable s instead of sect. That pattern wasn't connected to anything else, so it matched every section in every room and multiplied your result rows (a cartesian product), none of which was relevant to the rest of your query.
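To make the difference concrete, here is a stripped-down sketch of the corrected pattern, with the returned columns cut down for brevity. Reusing the already-bound sect keeps the room lookup tied to each mark's own section:

```cypher
MATCH (mark:Mark)-[:PERFORMED_DURING]->(sect:Section)
// reuse sect here; writing (s:Section) would introduce a new, unconnected variable
MATCH (sect)-[:LOCATED_IN]->(room:Room)
RETURN mark.uid, sect.uid, room.number
```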