2
votes

In my Neo4j application I have a products, products characteristics and list of votes(with double weight) for each product on a certain characteristic.

In order to sort a list of products by average weight of votes for a list of characteristics I need to perform a complex run-time Cypher query that will calculate a sum of avg weight for list of characteristics for each product.

This is my current Cypher query:

MATCH (parentP:Product)-[:CONTAINS]->(childP:Product)
WHERE id(parentP) = {productId}
WITH childP
OPTIONAL MATCH (p:Product)<-[:VOTED_FOR]-(v:Vote)-[:VOTED_ON]->(c:Characteristic)
WHERE id(p) = id(childP) AND id(c) IN {characteristicIds}
WITH childP, c, avg(v.weight) as weight
RETURN childP AS product, sum(weight) as weight
ORDER BY weight DESC

I know that Neo4j is extremely fast for graph traversal but no so good for aggregation (summing, counting, averaging, and so on). My system can have a large lists of votes for each characteristic on each product.

Please help me to denormilize this structure in order to avoid any performance issues with a big lists of votes.

PROFILE output for Brian's query:

Cypher version: CYPHER 2.2, planner: COST. 374933 total db hits in 1482 ms.

PROFILE output

3
Perhaps you can share a sample database that helps us execute the query? - Michael Hunger
I have added sample database and query - alexanoid
How fast are you wanting the query to run? My first run on your DB took 2.1 seconds and after the first query it's taking about 0.5 second - Brian Underwood
This is a small sample database.. the real database is a much bigger.. I want to make this schema/query independent by performance from Votes list size - alexanoid

3 Answers

2
votes

I don't know that I would say that Neo4j isn't good for aggregation. You might be able to help it save some work by specifically including your childP in your second match:

MATCH (parentP)-[:CONTAINS]->(childP:Product)
WHERE id(parentP) = {productId}
OPTIONAL MATCH childP<-[:VOTED_FOR]-(v:Vote)-[:VOTED_ON]->(c)
WHERE id(c) IN {characteristicIds}
WITH childP, c, avg(v.weight) as weight
RETURN childP AS product, sum(weight) as weight
ORDER BY weight DESC

I want to make sure I understand what you're asking for, though. Is this query performing poorly? Are you trying to prevent it from performing poorly in the future?

0
votes

If we are speaking about performance problems, it would be useful to attach profiling information: http://neo4j.com/docs/stable/how-do-i-profile-a-query.html

Then will be easier to decide where to act.

0
votes

You can also try:

MATCH (c:Characteristic) WHERE id(c) IN {characteristicIds} 
WITH collect(c) as characteristics

MATCH (parentP:Product)-[:CONTAINS]->(childP:Product)
WHERE id(parentP) = {productId}
OPTIONAL MATCH childP<-[:VOTED_FOR]-(v:Vote)
WITH childP, v, head(filter(c in characteristics WHERE (v)-[:VOTED_ON]->(c))) as c
WHERE c is not null
WITH childP, c, avg(v.weight) as weight
RETURN childP AS product, sum(weight) as weight
ORDER BY weight DESC

As there are much fewer vote on a product than votes per characteristic, we start with those and then check against characteristics. Without a db it is hard to say though.