Out of Memory using Java API but Cypher query can work although it is slow

Question

I have a graph database with 150 million nodes and a few hundred million relationships.

There are two types of nodes in the network: account node and transaction node. Each account node has a public key and each transaction node has a number (the amount of total bitcoin involved in this transaction).

There are also two types of relationships in the network. Each relationship connects an account node with a transaction node. One type of relationships is "send" and the other type is "receive". Each relationship also has a number to represent how much bitcoin it sends or receives.

This is an example:

(account: publickey = A)-[send: bitcoin=1.0]->(transaction :id = 1, Tbitcoin=1.0)-[receive: bitcoin=0.5]->(account: publickey = B)
(account: publickey = A)-[send: bitcoin=1.0]->(transaction :id = 1, Tbitcoin=1.0)-[receive: bitcoin=0.5]->(account: publickey = C)

As you can imagine, B or C can also send or receive bitcoins to or from other accounts which involves many different transactions.

What I wants to do is to find all paths with depth equaling to 4 between two accounts, e.g. A and C. I can do this by Cypher although it is slow. It takes about 20mins. My cypher is like this:

start src=node:keys(PublicKey="A"),dest=node:keys(PublicKey="C") 
match p=src-->(t1)-->(r1)-->(t2)-->dest 
return count(p);

However, when I try to do that using Java API, I got the OutOfMemoryError. Here is my function:

  public ArrayList<Path> getPathsWithConditionsBetweenNodes(String indexName, String sfieldName, String sValue1, String sValue2,
        int depth, final double threshold, String relType){

    ArrayList<Path> res = null;
    if (isIndexExistforNode(indexName)) {
        try (Transaction tx = graphDB.beginTx()) {
            IndexManager index = graphDB.index();
            Index<Node> accounts = index.forNodes(indexName);
            IndexHits<Node> hits = null;
            hits = accounts.get(sfieldName, sValue1);
            Node src = null, dest = null;
            if(hits.iterator().hasNext())
                src = hits.iterator().next();
            hits = null;
            hits = accounts.get(sfieldName, sValue2);
            if(hits.iterator().hasNext())
                dest = hits.iterator().next();
            if(src==null || dest==null){
                System.out.println("Either src or dest node is not avaialble.");
            }



            TraversalDescription td = graphDB.traversalDescription()
                    .depthFirst();

            if (relType.equalsIgnoreCase("send")) {
                td = td.relationships(Rels.Send, Direction.OUTGOING);
                td = td.relationships(Rels.Receive, Direction.OUTGOING);
            } else if (relType.equalsIgnoreCase("receive")) {
                td= td.relationships(Rels.Receive,Direction.INCOMING);
                td = td.relationships(Rels.Send,Direction.INCOMING);
            } else {
                System.out
                        .println("Traverse Without Type Constrain Because Unknown Relationship Type is Provided to The Function.");
            }

            td = td.evaluator(Evaluators.includingDepths(depth, depth))
                    .uniqueness(Uniqueness.RELATIONSHIP_PATH)
                    .evaluator(Evaluators.returnWhereEndNodeIs(dest));




                td = td.evaluator(new Evaluator() {
                    @Override
                    public Evaluation evaluate(Path path) {

                        if (path.length() == 0) {
                            return Evaluation.EXCLUDE_AND_CONTINUE;
                        } else {

                            Node node = path.endNode();
                            if (!node.hasProperty("TBitcoin"))
                                return Evaluation.INCLUDE_AND_CONTINUE;
                            double coin = (double) node.getProperty("TBitcoin");

                            if (threshold!=Double.MIN_VALUE) {
                                if (coin<=threshold) {
                                    return Evaluation.EXCLUDE_AND_PRUNE;
                                } else {
                                    return Evaluation.INCLUDE_AND_CONTINUE;
                                }
                            } else {
                                return Evaluation.INCLUDE_AND_CONTINUE;
                            }
                        }
                    }
                }); 



            res = new ArrayList<Path>();
            int i=0;
            for(Path path : td.traverse(src)){
                i++;
                //System.out.println(path);

                //res.add(path);
            }
            System.out.println();
            tx.success();
        } catch (Exception e) {
            e.printStackTrace();
        }
    } else {
        ;
    }

    return res;
}

Can someone take a look at my function and give me some ideas why it is so slow and will cause out-of-memory error? I set Xmx=15000m while runing this program.

FrobberOfBits FrobberOfBits · Accepted Answer · 2014-09-11T14:48:26

My $0.02 is that you shouldn't do this with java, you should do it with Cypher. But your query needs some work. Here's your basic query:

start src=node:keys(PublicKey="A"),dest=node:keys(PublicKey="C") 
match p=src-->(t1)-->(r1)-->(t2)-->dest 
return count(p);

There are at least two problems with this:

The intermediate r1 could be the same as your original src, or your original dest (which probably isn't what you want, you're looking for intermediaries)
You don't specify that t1 or t2 are send or receive. Meaning that you're forcing cypher to match both kinds of edges. Meaning cypher has to look through a lot more stuff to give you your answer.

Here's how to tighten your query so it should perform much better:

start src=node:keys(PublicKey="A"),dest=node:keys(PublicKey="C") 
match p=src-[:send]->(t1:transaction)-[:receive]->(r1)-[:send]->(t2:transaction)-[:receive]->dest 
where r1 <> src and 
      r1 <> dest
return count(p);

This should prune out a lot of possible edge and node traversals that you're currently doing, that you don't need to be doing.

Out of Memory using Java API but Cypher query can work although it is slow

2 Answers