0
votes

I am currently modelling a database with over 50.000 nodes and every node has 2 directed relationships. I try to get all nodes for one input node (the root node), which are connected to it with one relationship and all so-called children of these nodes and so on, until every node connected direct and indirect to this root node is reached.

String query =
  "MATCH (m {title:{title},namespaceID:{namespaceID}})-[:categorieLinkTo*..]->(n) " +
  "RETURN DISTINCT n.title AS Title, n.namespaceID " + 
  "ORDER BY n.title";

Result result = db.execute(query, params);
String infos = result.resultAsString();

I have read that the runtime is more likely in O(n^x), but I cannot find any command that excludes for example loops or multiple paths to one node, so the query takes simple over 2 hours and that is not acceptable for my use case.

3
There is no GROUP BY operator in Cypher. Did you mean ORDER BY? - Gabor Szarnyas
Hey, yes my bad. That was just a try if something like that worked here.. and i forget to remove that . It didnt wokred with ORDER BY as well. - DanDo

3 Answers

2
votes

For simple relationship expressions, Cypher excludes multiple relationships automatically by enforcing uniqueness:

While pattern matching, Neo4j makes sure to not include matches where the same graph relationship is found multiple times in a single pattern.

The documentation is not entirely clear on whether this works for variable length paths - so let's design a small experiment to confirm it:

CREATE
  (n1:Node {name: "n1"}),
  (n2:Node {name: "n2"}),
  (n3:Node {name: "n3"}),
  (n4:Node {name: "n4"}),
  (n1)-[:REL]->(n2),
  (n2)-[:REL]->(n3),
  (n3)-[:REL]->(n2),
  (n2)-[:REL]->(n4)

This results in the following graph:

enter image description here

Query with:

MATCH (n:Node {name:"n1"})-[:REL*..]->(m)
RETURN m

The result is:

╒══════════╕
│m         │
╞══════════╡
│{name: n2}│
├──────────┤
│{name: n3}│
├──────────┤
│{name: n2}│
├──────────┤
│{name: n4}│
├──────────┤
│{name: n4}│
└──────────┘

As you can see n4 is included multiple times (as it can be accessed with avoiding the loop and going through the loop as well). Check the execution with PROFILE:

enter image description here

So we should use DISTINCT to get rid of the duplicates:

MATCH (n:Node {name:"n1"})-[:REL*..]->(m)
RETURN DISTINCT m

The result is:

╒══════════╕
│m         │
╞══════════╡
│{name: n2}│
├──────────┤
│{name: n3}│
├──────────┤
│{name: n4}│
└──────────┘

Again, check the execution with PROFILE:

enter image description here

1
votes

There are certainly some things we can do to improve this query.

For one, you aren't using labels at all. And because you aren't using labels, your match on the input node can't take advantage of any schema indexes you may have in place, and it must do a scan of all 50k nodes accessing and comparing properties until it finds every one with the given title and namespace (it will not stop when it's found one, as it does not know if there are other nodes that satisfy the conditions). You can check the timing by matching on the start node alone.

To improve this, your nodes should be labeled, and your match on your start node should include the label, and your title and namespaceID properties should be indexed.

That alone should provide a noticeable improvement in query speed.

The next question is if the remaining bottleneck is more due to the sorting, or returning a massive result set?

You can check the cost for sorting alone by LIMITing the results you return.

You can use this for the end of the query, after the match.

WITH DISTINCT n
ORDER BY n.title
LIMIT 10
RETURN n.title AS Title, n.namespaceID 
ORDER BY n.title

Also, when doing any performance tuning, you should PROFILE your queries (at least the ones that finish in a reasonable amount of time) and EXPLAIN the ones that are taking an absurd amount of time to examine the query plan.

0
votes
public static HashSet<Node> breadthFirst(String name, int namespace, GraphDatabaseService db) {

        // Hashmap for storing the cypher query and its parameters
        Map<String, Object> params = new HashMap<>();


        // Adding the title and namespaceID as parameters to the Hashmap
        params.put("title", name);
        params.put("namespaceID", namespace);

        /*it is a simple BFS with these variables below
         * basically a Queue (touched) as usual, a Set to store the nodes
         *  which have been used (finished), a return variable and 2 result
         *  variables for the queries
         */
        Node startNode = null;
        String query = "Match (n{title:{title},namespaceID:{namespaceID}})-[:categorieLinkTo]-> (m) RETURN m";
        Queue<Node> touched = new LinkedList<Node>();
        HashSet<Node>finished = new HashSet<Node>();
        HashSet<Node> returnResult = new  HashSet<Node>();

        Result iniResult = null;
        Result tempResult=null;

        /*the part below get the direct nodes and puts them
         * into the queue
         */
            try (Transaction tx = db.beginTx()) {
                 iniResult =db.execute(query,params);


                while(iniResult.hasNext()){
                  Map<String,Object> iniNode=iniResult.next();
                  startNode=(Node) iniNode.get("m");
                  touched.add(startNode);
                  finished.add(startNode);
                }
                tx.success();
                }catch (QueryExecutionException e) {
                logger.error("Fehler bei Ausführung der Anfrage", e);  
                }

            /*and now we just execute the BFS (don't think i need more to
             * say here.. we are all pros ;))
             * as usual, marking every node we have visited
             * and saving every visited node.
             * the difficult part had been the casting from
             * and to node and result, everything else is pretty much
             * straightforward. I think the  variables explain their self
             * via their name....
             */

               while(! (touched.isEmpty())){
                 try (Transaction tx = db.beginTx()) {
                   Node currNode=touched.poll();
                   returnResult.add(currNode);

                   tempResult=null;               
                   Map<String, Object> paramsTemp = new HashMap<>();
                   paramsTemp.put("title",currNode.getProperty("title").toString());
                   paramsTemp.put("namespaceID", 14);
                   String tempQuery = "MATCH (n{title:{title},namespaceID:{namespaceID}})-[:categorieLinkTo] -> (m) RETURN m";
                   tempResult =  db.execute(tempQuery,paramsTemp);



                 while(tempResult.hasNext()){
                     Map<String, Object> currResult= null;
                     currResult=tempResult.next();
                     Node tempCurrNode = (Node) currResult.get("m");

                     if (!finished.contains(tempCurrNode)){
                         touched.add(tempCurrNode);
                         finished.add(tempCurrNode);


                     }


                  }


               tx.success();  
            }catch (QueryExecutionException f) {
                logger.error("Fehler bei Ausführung der Anfrage", f);
            }
        }       


        return returnResult;

}   

Since I was not able to find a fitting Cypher expression, I just wrote a pretty BFS myself - it works it seems.