2
votes

I am using Neo4j community edition embedded in java application for recommendation purpose. I made a custom function which contains a complex logic of comparing two entities, namely product and users. Both entities are present as nodes in graph and has more than 20 properties each for comparison purpose. For eg. I am calling this function in following format:

match (e:User {user_id:"some-id"}) with e
match (f:Product {product_id:"some-id"}) with e,f
return e,f,findComparisonValue(e,f) as pref_value; 

This function call on an average takes about 4-5 ms to run. Now, to recommend best product to a particular user, I wrote a cypher query which iterates over all products, calculate the pref_value and rank them. My cypher query looks like this:

MATCH (source:User) WHERE id(source)={id} with source 
MATCH (reco:Product) WHERE reco.is_active='t'  
with reco, source, findComparisonValue(source, reco) as score_result 
RETURN distinct reco, score_result.score as score, score_result.params as params, score_result.matched_keywords as matched_keywords 
order by score desc

Some insights on graph structure:

Total Number of nodes: 2 million
Total Number of relationships: 20 million
Total Number of Users: 0.2 million
Total Number of Products: 1.8 million

The above cypher query is taking more than 10 seconds as it is iterating over all the products. On top of this cypher query, I am using graphaware-reco module for my recommendation needs (Using precompute, filteing, post processing etc). I thought of parallelising this but community edition does not support clustering. Now, as number of users in system is increasing day by day, I need to think of a scalable solution.

Can anyone help me out here, on how to optimize the query.

1
By your description seems that you are using node properties to calculate the similarity between two nodes. At first glance, I think this is not the right way to go... Probably you should rethink your model and try to use Cypher to delegate to Neo4j engine the hard work.Bruno Peres
Maybe you should use nodes and relationships instead of properties.Bruno Peres
Thanks @BrunoPeres. I need to rethink how to use relationships instead of properties. However most of the properties are real number (for eg. user salary) and therefore don't know how to transform such information into relationships. Any advice?Darshil Babel

1 Answers

0
votes

As others have commented, doing a significant calculation potentially millions of times in a single query is going to be slow, and does not take advantage of neo4j's strengths. You should investigate modifying your data model and calculation so that you can leverage relationships and/or indexes.

In the meantime, there are a number of things to suggest with your second query:

  1. Make sure you have created an index for :Product(is_active), so that it is not necessary to scan all products. (By the way, if that property is actually supposed to be a boolean, then consider making it a boolean rather than a string.)

  2. The RETURN clause should not need the DISTINCT operator, since all the result rows should be distinct anyway. This is because every reco value is already distinct. Removing that keyword should improve performance.