In Neo4j, is it faster to run a query against all nodes (AllNodesScan) and then filter on their labels with a WHERE clause, or to run multiple queries with a NodeByLabelScan?


To illustrate, I want all nodes that are labeled with one of the labels in label_list:

label_list = ['label_1', 'label_2', ...]

Which would be faster in an application (this is pseudo-code):

for label in label_list:
    run.query(f"MATCH (n:{label}) RETURN n")

or

run.query("MATCH (n) WHERE (n:label_1 OR n:label_2 OR ...) RETURN n")


EDIT:

Actually, I just realized that the best option might be to run multiple NodeByLabelScans in a single query, with something like this:

MATCH (a:label_1)
MATCH (b:label_2)
...
UNWIND [a, b ..] as foo
RETURN foo

Could someone comment on this approach?

1 Answer

Yes, it would be better to run multiple NodeByLabelScans in a single query.

For example:

OPTIONAL MATCH (a:label_1)
WITH COLLECT(a) AS list
OPTIONAL MATCH (b:label_2)
WITH list + COLLECT(b) AS list
OPTIONAL MATCH (c:label_3)
WITH list + COLLECT(c) AS list
UNWIND list AS n
RETURN DISTINCT n

Notes on the query:

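  • It uses OPTIONAL MATCH so that the query can proceed even if no node in the DB has one of the wanted labels (a plain MATCH that finds nothing would produce zero rows and discard the results collected so far).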
  • It aggregates with COLLECT after each match, so every later OPTIONAL MATCH starts from a single row and no cartesian product between the label sets is built.
  • And it uses UNWIND so that it can use DISTINCT to return distinct nodes (since a node can have multiple labels).
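
Since the question starts from a label_list in application code, here is a minimal Python sketch of how such a query could be generated for an arbitrary list of labels. The helper name build_multi_label_query, the variable names n0, n1, ... and the label validation rule are my own assumptions, not part of Neo4j or any driver. It also carries list through as an explicit grouping key before concatenating, which is a slightly more verbose but conservative spelling of the same aggregation step.

```python
import re

# Assumed rule for what counts as a safe label name; adjust to your own schema.
_LABEL_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_multi_label_query(labels):
    """Hypothetical helper: build one Cypher query that collects the nodes
    for each label in turn and returns them without duplicates."""
    if not labels:
        raise ValueError("labels must not be empty")
    lines = []
    for i, label in enumerate(labels):
        # Labels are interpolated into the query text here, so validate them first.
        if not _LABEL_RE.match(label):
            raise ValueError(f"unsafe label: {label!r}")
        var = f"n{i}"
        lines.append(f"OPTIONAL MATCH ({var}:{label})")
        if i == 0:
            lines.append(f"WITH COLLECT({var}) AS list")
        else:
            # Keep `list` as an explicit grouping key, then concatenate.
            lines.append(f"WITH list, COLLECT({var}) AS found")
            lines.append("WITH list + found AS list")
    lines.append("UNWIND list AS n")
    lines.append("RETURN DISTINCT n")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_multi_label_query(["label_1", "label_2", "label_3"]))
```

The resulting string can then be passed to whatever client call stands behind the run.query placeholder in the question. Running the query with PROFILE should show whether the planner actually uses a NodeByLabelScan for each label on your data.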