2
votes

I do not know Neo4j so, bear with me. I have a big (1M nodes) undirected, unweighted graph. Assume I magically import this graph to Neo4j. Can the Neo4j query engine (cypher) support those the following types of queries?

  • Range queries. Bring all me nodes (and their distances) that are within 3 hops distance from a specific node
  • Bring the shortest-path (BFS) distance (since the graph is undirected and unweighted) between a specific node and a set of nodes.
  • Bring the shortest-path (BFS) distance between a specific node and all other graph nodes.

If those types of queries are actually possible, without a recursive type implementation but straight from Cypher, What type of performance should I expect (few seconds, many seconds or minutes)?

1

1 Answers

7
votes

Yes, cypher can do all of those things.

Range queries look something like this:

MATCH path=(a:MyNode { name: "Foo"})-[:myRelationshipType*1..20]->(b));

That gives you all "b" nodes that are between 1 and 20 hops from MyNode with the given relationship type. The matched path is a variable, which can have various cypher functions applied to it. So for each path, you can ask how long it is, what's in the middle, and so on. The cypher refcard shows functions that apply to paths to get a sense for what you can do with them.

Shortest-path searches can be found here and use the shortestPath and allShortestPaths functions in cypher.

If you wanted to get the shortest path from something to basically everything else in the graph, you could do that in one query; the shortest path would start with matching a "head" node, and a "tail" node. In the case of finding shortest path from one thing to everything else, the head node match would be your node of interest, and the tail node match would be "any node in the graph". E.g. MATCH (a:MyNodeOfInterest), (b), p=shortestPath((a)-[*]->(b)). So you could do this in one query, but, if you're trying to find shortest paths from one thing to everything else in a one million node graph, that's going to take some time, no matter what graph database you use.

In terms of performance, nobody can really accurately answer that question. It will depend on a lot of different factors like:

  1. Total data volumes
  2. Indexing strategy
  3. Your use/abuse of node labels, and relationship types
  4. Total path lengths
  5. JVM/memory/cache configurations.