
Given:

  • Two node labels:
    • 1000 (:A) nodes
    • 1000 (:B) nodes
  • Constraints:
    • CREATE CONSTRAINT ON (a:A) ASSERT a.id IS UNIQUE;
    • CREATE CONSTRAINT ON (b:B) ASSERT b.id IS UNIQUE;
  • One unidirectional relationship type:
    • 4000 [:RELATED_TO] relationships
  • Multiple (a:A)-[:RELATED_TO]->(b:B) paths
    (that is, the same (a:A) node can be related to the same (b:B) node more than once)

I'm trying to write a query that shows the paths of the node that is connected to the largest number of distinct other nodes in the graph. For example, if nodes (a1:A), (a2:A), (a3:A), and (a4:A) are each connected to (b:B) at least once, and no other (:B) node is connected to more than three distinct (:A) nodes, I would like the Neo4j Browser to show (b:B) in the center with (a1:A) through (a4:A) around it. My biggest challenge is that I haven't been able to figure out how to avoid counting repeated (a1:A)-[:RELATED_TO]->(b:B) paths more than once.

I'll be happy to provide more context if necessary. Thanks in advance!

1 Answer


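This query uses the aggregating function COLLECT (with the DISTINCT operator to qualify its argument) to return the B node that has relationships with the most distinct A nodes, along with those A nodes. Because of DISTINCT, an A node that is related to the same B node several times appears in the collection only once, so repeated relationships do not inflate the size used for ordering: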

MATCH (a:A)-[:RELATED_TO]->(b:B)
RETURN b, COLLECT(DISTINCT a) AS aNodes
ORDER BY SIZE(aNodes) DESC
LIMIT 1;
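
If the Browser should also draw the relationships themselves, one possible variant is to find the top B node first and then match its paths again. This is only a sketch under the labels and relationship type given in the question; the aliases numA and r are names chosen here for illustration, and it has not been run against your data.

```cypher
// Step 1: pick the B node with the most distinct A neighbours.
// COUNT(DISTINCT a) counts each A node once per B node, however many
// RELATED_TO relationships connect the pair.
MATCH (a:A)-[:RELATED_TO]->(b:B)
WITH b, COUNT(DISTINCT a) AS numA
ORDER BY numA DESC
LIMIT 1
// Step 2: re-match the paths of that single B node so that the nodes
// and relationships are all returned for display.
MATCH (a:A)-[r:RELATED_TO]->(b)
RETURN a, r, b;
```

Note that with LIMIT 1, if several B nodes tie for the highest count, only one of them is returned, and which one is not guaranteed.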