0
votes

We have a large graph (over 1 billion edges) that has multiple relationship types between nodes.
In order to check the number of nodes that have a single unique relationship between nodes (i.e. a single relationship between two nodes per type, which otherwise would not be connected) we are running the following query:

MATCH (n)-[:REL_TYPE]-(m) 
WHERE size((n)-[]-(m))=1 AND id(n)>id(m)
RETURN COUNT(DISTINCT n) + COUNT(DISTINCT m)

To demonstrate a similar result, the below sample code can run on the movie graph after running
:play movies in an empty graph, resulting with 4 nodes (in this case we are asking for nodes with 3 types of relationships)

MATCH (n)-[]-(m) 
WHERE size((n)-[]-(m))=3 AND id(n)>id(m)
RETURN COUNT(DISTINCT n) + COUNT(DISTINCT m)

Is there a better/more efficient way to query the graph?

1
If you can give for granted the direction of the relationship on the query it surely helps - Sterconium
Also tipicaly the "distinct" is a performance bottleneck - Sterconium

1 Answers

0
votes

The following query is more performant, since it only scans each relationship once [whereas size((n)--(m)) will cause relationships to be scanned multiple times]. It also specifies a relationship direction to filter out half of the relationship scans, and to avoid the need for comparing native IDs.

MATCH (n)-->(m)
WITH n, m, COUNT(*) AS cnt
WHERE cnt = 3
RETURN COUNT(DISTINCT n) + COUNT(DISTINCT m)

NOTE: It is not clear what you are using the COUNT(DISTINCT n) + COUNT(DISTINCT m) result for, but be aware that it is possible for some nodes to be counted twice after the addition.

[UPDATE]

If you want to get the actual number of distinct nodes that pass your filter, here is one way to do that:

MATCH (n)-->(m)
WITH n, m, COUNT(*) AS cnt
WHERE cnt = 3
WITH COLLECT(n) + COLLECT(m) AS nodes
UNWIND nodes AS node
RETURN COUNT(DISTINCT node)