6
votes

How can I get the degree of every node in Neo4j, and then find the node with the highest degree, in a very large graph (2 million nodes and 5 million relationships)?

I know I can use Cypher or Gremlin to achieve that, like:

start n = node(*) 
match (n)--(c)
return n, count(*) as connections
order by connections desc

or

g.V.bothE

but my computer only has 2 to 4 GB of RAM. I always wait a long time and then get "undefined" when I run the queries above.

Does anyone have experience querying a graph this large in Neo4j with Gremlin or Cypher?

2 Answers

2
votes

To find the highest degree, you should also limit the result, so that Cypher only has to keep the top 10 rows.

START n = node(*) 
MATCH (n)--(c)
RETURN n, count(*) as connections
ORDER BY connections DESC
LIMIT 10

Or you could do:

START n = node(*)
RETURN n, length((n)--()) as connections
ORDER BY connections DESC
LIMIT 10

Otherwise I agree with Stefan.

Today you can also use CALL apoc.stats.degrees('TYPE'), where TYPE is the relationship type. You can also pass null, or give a direction by writing <TYPE or TYPE>. This procedure is implemented in parallel and works well for large graphs.

1
votes

That is in fact a very expensive, global operation. In this case you might be better off using an unmanaged extension that uses GlobalGraphOperations.getAllRelationships. While iterating over all relationships, you build up a Map and increment the counter for the start node and the end node of each relationship. The final step is to find the maximum within your map.
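
As a rough sketch of the counting step described above, assuming the relationships are available as plain {startId, endId} pairs: the class name DegreeCount, both method names, and the long[][] input are hypothetical stand-ins for illustration. In a real extension the loop would run over the relationships returned by Neo4j instead of over an array.

```java
import java.util.HashMap;
import java.util.Map;

public class DegreeCount {

    // Builds a map from node id to degree. Each relationship adds one
    // to its start node and one to its end node, so a self-loop counts twice.
    // The long[][] input is an assumed stand-in for iterating real relationships.
    static Map<Long, Integer> degrees(long[][] relationships) {
        Map<Long, Integer> degree = new HashMap<>();
        for (long[] rel : relationships) {
            degree.merge(rel[0], 1, Integer::sum); // start node
            degree.merge(rel[1], 1, Integer::sum); // end node
        }
        return degree;
    }

    // Returns the id of a node with the highest degree, or -1 for an empty map.
    static long maxDegreeNode(Map<Long, Integer> degree) {
        long bestNode = -1;
        int bestDegree = -1;
        for (Map.Entry<Long, Integer> entry : degree.entrySet()) {
            if (entry.getValue() > bestDegree) {
                bestDegree = entry.getValue();
                bestNode = entry.getKey();
            }
        }
        return bestNode;
    }

    public static void main(String[] args) {
        // Tiny made-up graph: node 1 touches three relationships.
        long[][] rels = { {1, 2}, {1, 3}, {1, 4}, {2, 3} };
        Map<Long, Integer> degree = degrees(rels);
        long top = maxDegreeNode(degree);
        System.out.println("node " + top + " has degree " + degree.get(top));
    }
}
```

This makes a single pass over the relationships and keeps only one counter per node in memory, rather than materializing every matched path. Nodes with no relationships never appear in the map.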