0
votes

I have a graph database with 150 million nodes and a few hundred million relationships.

There are two types of nodes in the network: account node and transaction node. Each account node has a public key and each transaction node has a number (the amount of total bitcoin involved in this transaction).

There are also two types of relationships in the network. Each relationship connects an account node with a transaction node. One type of relationships is "send" and the other type is "receive". Each relationship also has a number to represent how much bitcoin it sends or receives.

This is an example:

(account: publickey = A)-[send: bitcoin=1.0]->(transaction :id = 1, Tbitcoin=1.0)-[receive: bitcoin=0.5]->(account: publickey = B)
(account: publickey = A)-[send: bitcoin=1.0]->(transaction :id = 1, Tbitcoin=1.0)-[receive: bitcoin=0.5]->(account: publickey = C)

As you can imagine, B or C can also send or receive bitcoins to or from other accounts which involves many different transactions.

What I wants to do is to find all paths with depth equaling to 4 between two accounts, e.g. A and C. I can do this by Cypher although it is slow. It takes about 20mins. My cypher is like this:

start src=node:keys(PublicKey="A"),dest=node:keys(PublicKey="C") 
match p=src-->(t1)-->(r1)-->(t2)-->dest 
return count(p);

However, when I try to do that using Java API, I got the OutOfMemoryError. Here is my function:

  public ArrayList<Path> getPathsWithConditionsBetweenNodes(String indexName, String sfieldName, String sValue1, String sValue2,
        int depth, final double threshold, String relType){

    ArrayList<Path> res = null;
    if (isIndexExistforNode(indexName)) {
        try (Transaction tx = graphDB.beginTx()) {
            IndexManager index = graphDB.index();
            Index<Node> accounts = index.forNodes(indexName);
            IndexHits<Node> hits = null;
            hits = accounts.get(sfieldName, sValue1);
            Node src = null, dest = null;
            if(hits.iterator().hasNext())
                src = hits.iterator().next();
            hits = null;
            hits = accounts.get(sfieldName, sValue2);
            if(hits.iterator().hasNext())
                dest = hits.iterator().next();
            if(src==null || dest==null){
                System.out.println("Either src or dest node is not avaialble.");
            }



            TraversalDescription td = graphDB.traversalDescription()
                    .depthFirst();

            if (relType.equalsIgnoreCase("send")) {
                td = td.relationships(Rels.Send, Direction.OUTGOING);
                td = td.relationships(Rels.Receive, Direction.OUTGOING);
            } else if (relType.equalsIgnoreCase("receive")) {
                td= td.relationships(Rels.Receive,Direction.INCOMING);
                td = td.relationships(Rels.Send,Direction.INCOMING);
            } else {
                System.out
                        .println("Traverse Without Type Constrain Because Unknown Relationship Type is Provided to The Function.");
            }

            td = td.evaluator(Evaluators.includingDepths(depth, depth))
                    .uniqueness(Uniqueness.RELATIONSHIP_PATH)
                    .evaluator(Evaluators.returnWhereEndNodeIs(dest));




                td = td.evaluator(new Evaluator() {
                    @Override
                    public Evaluation evaluate(Path path) {

                        if (path.length() == 0) {
                            return Evaluation.EXCLUDE_AND_CONTINUE;
                        } else {

                            Node node = path.endNode();
                            if (!node.hasProperty("TBitcoin"))
                                return Evaluation.INCLUDE_AND_CONTINUE;
                            double coin = (double) node.getProperty("TBitcoin");

                            if (threshold!=Double.MIN_VALUE) {
                                if (coin<=threshold) {
                                    return Evaluation.EXCLUDE_AND_PRUNE;
                                } else {
                                    return Evaluation.INCLUDE_AND_CONTINUE;
                                }
                            } else {
                                return Evaluation.INCLUDE_AND_CONTINUE;
                            }
                        }
                    }
                }); 



            res = new ArrayList<Path>();
            int i=0;
            for(Path path : td.traverse(src)){
                i++;
                //System.out.println(path);

                //res.add(path);
            }
            System.out.println();
            tx.success();
        } catch (Exception e) {
            e.printStackTrace();
        }
    } else {
        ;
    }

    return res;
}

Can someone take a look at my function and give me some ideas why it is so slow and will cause out-of-memory error? I set Xmx=15000m while runing this program.

2

2 Answers

0
votes

My $0.02 is that you shouldn't do this with java, you should do it with Cypher. But your query needs some work. Here's your basic query:

start src=node:keys(PublicKey="A"),dest=node:keys(PublicKey="C") 
match p=src-->(t1)-->(r1)-->(t2)-->dest 
return count(p);

There are at least two problems with this:

  • The intermediate r1 could be the same as your original src, or your original dest (which probably isn't what you want, you're looking for intermediaries)
  • You don't specify that t1 or t2 are send or receive. Meaning that you're forcing cypher to match both kinds of edges. Meaning cypher has to look through a lot more stuff to give you your answer.

Here's how to tighten your query so it should perform much better:

start src=node:keys(PublicKey="A"),dest=node:keys(PublicKey="C") 
match p=src-[:send]->(t1:transaction)-[:receive]->(r1)-[:send]->(t2:transaction)-[:receive]->dest 
where r1 <> src and 
      r1 <> dest
return count(p);

This should prune out a lot of possible edge and node traversals that you're currently doing, that you don't need to be doing.

0
votes

If I have understood what you are trying to achieve and because you have a direction on your relationship I think that you can get away with something quite simple:

MATCH (src:keys{publickey:'A')-[r:SEND|RECEIVE*4]->(dest:keys{publickey:'C'})
RETURN COUNT(r)

Depending on your data set @FrobberOfBits makes a good point regarding testing equality of intermediaries which you cannot do using this approach, however with just the two transactions you are testing for cases where a Transaction source and destination are the same (r1 <> src and r1 <> dest), which may not even be valid in your model. If you were testing 3 or more transactions then things would get more interesting as you might want to exclude paths like (A)-->(T1)-->(B)-->(T2)-->(A)-->(T3)-->(C)

Shameless theft:

MATCH path=(src:keys{publickey:'A')-[r:SEND|RECEIVE*6]->(dest:keys{publickey:'C'})
WHERE ALL (n IN NODES(path) 
       WHERE (1=LENGTH(FILTER(m IN NODES(path) 
                              WHERE m=n))))
RETURN COUNT(path)

Or traversal (caveat, pseudo code, never used it):

PathExpander expander = PathExapnder.forTypesAndDirections("SEND", OUTGOING, "RECEIVE", OUTGOING)
PathFinder<Path> finder = GraphAlgoFactory.allSimplePaths(expander, 6);
Iterable<Path> paths = finder.findAllPaths(src, dest);