Neo4j schema denormalization

Question

In my Neo4j application I have a products, products characteristics and list of votes(with double weight) for each product on a certain characteristic.

In order to sort a list of products by average weight of votes for a list of characteristics I need to perform a complex run-time Cypher query that will calculate a sum of avg weight for list of characteristics for each product.

This is my current Cypher query:

MATCH (parentP:Product)-[:CONTAINS]->(childP:Product)
WHERE id(parentP) = {productId}
WITH childP
OPTIONAL MATCH (p:Product)<-[:VOTED_FOR]-(v:Vote)-[:VOTED_ON]->(c:Characteristic)
WHERE id(p) = id(childP) AND id(c) IN {characteristicIds}
WITH childP, c, avg(v.weight) as weight
RETURN childP AS product, sum(weight) as weight
ORDER BY weight DESC

I know that Neo4j is extremely fast for graph traversal but no so good for aggregation (summing, counting, averaging, and so on). My system can have a large lists of votes for each characteristic on each product.

Please help me to denormilize this structure in order to avoid any performance issues with a big lists of votes.

PROFILE output for Brian's query:

Cypher version: CYPHER 2.2, planner: COST. 374933 total db hits in 1482 ms.

Perhaps you can share a sample database that helps us execute the query? — Michael Hunger
How fast are you wanting the query to run? My first run on your DB took 2.1 seconds and after the first query it's taking about 0.5 second — Brian Underwood
This is a small sample database.. the real database is a much bigger.. I want to make this schema/query independent by performance from Votes list size — alexanoid

Brian Underwood Brian Underwood · Accepted Answer · 2015-08-07T13:05:29

I don't know that I would say that Neo4j isn't good for aggregation. You might be able to help it save some work by specifically including your childP in your second match:

MATCH (parentP)-[:CONTAINS]->(childP:Product)
WHERE id(parentP) = {productId}
OPTIONAL MATCH childP<-[:VOTED_FOR]-(v:Vote)-[:VOTED_ON]->(c)
WHERE id(c) IN {characteristicIds}
WITH childP, c, avg(v.weight) as weight
RETURN childP AS product, sum(weight) as weight
ORDER BY weight DESC

I want to make sure I understand what you're asking for, though. Is this query performing poorly? Are you trying to prevent it from performing poorly in the future?

Neo4j schema denormalization

3 Answers