The Neo4j graph database holds roughly 50,000 nodes and more than 50,000 relationships. Most nodes belong to one main graph, but there are several smaller subgraphs that are not (yet) connected to it.
To connect these subgraphs into one big main graph, I intend to use a Cypher query that lists the collections of connected nodes ordered by size (biggest disconnected subgraph first).
There are several related posts on Stack Overflow, such as:
- Finding all disconnected subgraphs in a graph (but it's not obvious how to solve this with Cypher)
- How do I find disconnected nodes on neo4j with Cypher?
Here is a small example graph that represents the problem: Neo4j Console example graph
The following Cypher query does not solve the problem, but it is a starting point. It lists all nodes that are not connected to the main graph, but it does not group those nodes into collections of connected nodes. It works on a small graph. On the large graph it only returns "undefined" after running for more than 10 minutes.
START s=node(3), n=node(*)
MATCH s-[*1..10]-m
WITH collect(m) as members, n
WHERE NOT n in members
RETURN DISTINCT id(n), n.name?
ORDER BY id(n)
LIMIT 10;
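To make the desired result concrete, here is a minimal Python sketch of the grouping I am after. It is not Cypher and not a solution for Neo4j; it only illustrates the expected output on an in-memory toy graph. The function name `connected_components` and the node ids and edges are hypothetical and not taken from the real database.

```python
from collections import defaultdict, deque


def connected_components(nodes, edges):
    """Group nodes into connected components, largest component first."""
    # Build an undirected adjacency map, since direction is ignored here.
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(sorted(component))

    # Biggest disconnected subgraph first.
    components.sort(key=len, reverse=True)
    return components


# Hypothetical toy data: one larger graph, one pair, one isolated node.
nodes = [1, 2, 3, 4, 5, 6, 7]
edges = [(1, 2), (2, 3), (3, 4), (5, 6)]
result = connected_components(nodes, edges)
print(result)
```

In this toy example the first entry would be the main graph, and every following entry is a subgraph that still needs to be connected to it.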
How can I use Cypher to list all disconnected subgraphs?
Environment:
- Neo4j - Graph Database Kernel 1.9.M05
- Java - SE Runtime Environment (build 1.7.0_17-b02)