5
votes

I'm using Neo4j 2.1.7 and Node.js to build a REST API. The data - around 70.000 nodes and 100.000 relationships - contains very many small connected subgraphs.

One API call, for example localhost:8000/search?name=Bussum, should return all nodes named Bussum and the connected component they belong to.

Illustration:

Connected components

(Image from Wikipedia)

I can get all the data I need with a query like this:

MATCH (a {name: "Bussum" })-[r*]-(b) 
UNWIND rels AS rel 
RETURN distinct startNode(rel) AS a, type(rel), endNode(rel) AS b

But such a query will just return all triples (a)-[r]-(b) (not grouped per component/subgraph). Of course, I could reconstruct the graph in Node.js and find the subgraphs myself, but this does not at all feel like the best solution. Is it possible to group the returned data in an array/collection of subgraphs/components? Which Cypher queries would match my use case better? Or should I consider using the Neo4j Java API instead?

Thanks! Bert

2
how "small" are the sub graphs? DO the ones in the illustration represent smallest to biggest?Dave Bennett
Some contain one vertex, some 100, but the majority between 5 and 10.Bert
do the nodes in each particular sub graph have a unique identifier to that sub graph?Dave Bennett
Can "Bussum" exist more than once within a single connected component?FrobberOfBits
@DaveBennett yes, each node has an unique identifier, but this identifier is not linked to the node's subgraph.Bert

2 Answers

3
votes

You should still have your original start point as grouping node.

MATCH (root {name: "Bussum" })-[rels*]-(b) 
UNWIND rels AS rel 
RETURN root, 
       collect({start: startNode(rel), 
                 type:      type(rel), 
                  end:   endNode(rel)}) as component
1
votes
MATCH (a {name: "Bossum"})-[*0..]-(b)
WITH DISTINCT a, collect(DISTINCT b) AS sets
RETURN DISTINCT sets

This query will return (possibly) many rows, where each row is a collection of nodes that make a complete subgraph such that each subgraph is as large as possible and contains at least one node named "Bossum". Each row(subgraph) is guaranteed to be unique in the result set.

*I should note, I do not know about the performance of this method.