
I have a Neo4j graph with a little more than 100,000 nodes. When I run the following Cypher query over REST, I get a Java heap error. The query produces the 2-itemsets from a set of purchases.

MATCH (a)<-[:BOUGHT]-(b)-[:BOUGHT]->(c) RETURN a.id,c.id

The cross product of the two node types, Type 1 (a, c) and Type 2 (b), is on the order of 80k * 20k.

Is there a more optimized query for the same purpose? I am still a newbie to Cypher. (I have two indexes, one on all Type 1 nodes and one on all Type 2 nodes, which I can use.) Or should I just increase the Java heap size?

I am using py2neo for the REST queries.

Thanks.


1 Answer


As you said, the cross product is 80k * 20k, so you are probably pulling all of those rows across the wire, which is probably not what you want. Usually such a query is bound to a start user or a start product.
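
To illustrate what binding to a start product buys you, here is a minimal pure-Python sketch over a toy purchase map. The data and the function name `co_purchased_with` are hypothetical and not from the question. In Cypher, the analogous change would be a predicate on `a.id`, so that only one product's neighbourhood is expanded instead of every pair.

```python
from collections import defaultdict

# Hypothetical toy data: buyer id -> set of product ids that buyer bought.
purchases = {
    "u1": {"p1", "p2", "p3"},
    "u2": {"p2", "p3"},
    "u3": {"p1"},
}


def co_purchased_with(start_product, purchases):
    """Count how often each other product shares a buyer with start_product."""
    counts = defaultdict(int)
    for basket in purchases.values():
        # Only buyers of the start product contribute, which is what
        # bounds the amount of work and the size of the result.
        if start_product in basket:
            for other in basket:
                if other != start_product:
                    counts[other] += 1
    return dict(counts)


pairs_for_p2 = co_purchased_with("p2", purchases)
print(pairs_for_p2)
```

The result grows with the number of buyers of one product rather than with the full set of pairs, so it can be fetched and processed one start product at a time.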

You might try to run this query in the neo4j-shell:

MATCH (a:Type1)<-[:BOUGHT]-(b)-[:BOUGHT]->(c) RETURN count(*)

If your nodes have a label, you can use that label (Type1 in the query above) to drive the match. This is just to see how many paths you are looking at; 80k times 20k is 1.6 billion paths.
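
As a rough sanity check on those numbers, the sketch below computes the 80k * 20k figure and, for a hypothetical toy purchase map, the number of rows the pattern would return. It assumes each buyer has at most one BOUGHT relationship per product; since Cypher does not reuse a relationship within one pattern, a buyer with d purchases then contributes d * (d - 1) ordered (a, c) pairs. The names `count_paths` and `purchases` are illustrative only.

```python
# Upper bound quoted in the thread: 80k * 20k.
upper_bound = 80_000 * 20_000

# Hypothetical toy data: buyer id -> set of product ids that buyer bought.
purchases = {
    "u1": {"p1", "p2", "p3"},
    "u2": {"p2", "p3"},
    "u3": {"p1"},
}


def count_paths(purchases):
    """Number of (a)<-[:BOUGHT]-(b)-[:BOUGHT]->(c) paths with a != c.

    Assumes one BOUGHT relationship per (buyer, product) pair, so each
    buyer with d purchases yields d * (d - 1) ordered pairs.
    """
    return sum(len(basket) * (len(basket) - 1) for basket in purchases.values())


toy_paths = count_paths(purchases)
print(upper_bound, toy_paths)
```

The actual count on a real graph depends on how many products each buyer bought, which is why running the count(*) query first is a cheap way to see what you are dealing with.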

I'm also not sure whether the version of py2neo you are using (which one is it?) already streams the results of such a query. Try the transactional endpoint with py2neo (i.e. the cypherSession.createTransaction() API).