Let's say you have a Neo4j graph with 100,000 "color" nodes and 50,000 "painting" nodes. Each painting node has a "contains" relationship to 50 to 100 of the colors. Let's also say you have 200 "aggregate color" nodes, each with a relationship to ~1,000 colors. Each aggregate color node holds a scalar weight. Finally, you create a "palette" node with relationships to 10 to 20 aggregate colors.
I want a Neo4j Cypher query that identifies the top 10 paintings with the highest weighted sum of aggregate colors, based on the colors each painting contains.
Let
c represent a color node
a represent an aggregate color node
p represent a painting
l represent a palette
So the relationships are:
(p)-[:contains]->(c)
(a)-[:aggregates]->(c)
(l)-[:uses]->(a)
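To make the three relationship types concrete, here is a minimal plain-Python sketch that stores each one as an adjacency mapping and walks from a palette to the colors it covers. It does not talk to Neo4j; the dictionary names (`USES`, `AGGREGATES`, `CONTAINS`), the helper `palette_colors`, and the sample node names are all hypothetical stand-ins for the real graph.

```python
# Hypothetical in-memory stand-in for the graph; each dict is one relationship type.
USES = {"MY_PALETTE": {"agg_red", "agg_blue"}}        # (l)-[:uses]->(a)
AGGREGATES = {                                         # (a)-[:aggregates]->(c)
    "agg_red": {"crimson", "scarlet"},
    "agg_blue": {"navy", "teal"},
    "agg_green": {"olive"},
}
CONTAINS = {                                           # (p)-[:contains]->(c)
    "sunset": {"crimson", "scarlet", "olive"},
    "seascape": {"navy", "teal", "crimson"},
}


def palette_colors(palette: str) -> set[str]:
    """Colors reachable via (l)-[:uses]->(a)-[:aggregates]->(c)."""
    colors = set()
    for agg in USES.get(palette, set()):
        colors |= AGGREGATES.get(agg, set())
    return colors


print(sorted(palette_colors("MY_PALETTE")))
```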
Supposing I have a palette called "MY_PALETTE", this query gives me the top 10 paintings ranked by the number of distinct aggregate colors they match:
MATCH (l)-[:uses]->(a) WHERE l.name = 'MY_PALETTE'
WITH DISTINCT a
MATCH (p)-[:contains]->(c)<-[:aggregates]-(a)
WITH p, COUNT(DISTINCT a) AS matches
RETURN p.name, matches ORDER BY matches DESC LIMIT 10;
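As a sanity check on what the count query computes, this is a small plain-Python sketch of the same ranking: for each painting, count the distinct palette aggregates that share at least one color with it, then keep the top k. The sample data and the function name `top_by_distinct_aggregates` are hypothetical; it models the graph with dicts rather than querying Neo4j.

```python
from collections import Counter

# Hypothetical sample graph: each dict is one relationship type from the question.
USES = {"MY_PALETTE": {"agg_red", "agg_blue"}}            # (l)-[:uses]->(a)
AGGREGATES = {"agg_red": {"crimson", "scarlet"},           # (a)-[:aggregates]->(c)
              "agg_blue": {"navy", "teal"}}
CONTAINS = {"sunset": {"crimson", "scarlet"},              # (p)-[:contains]->(c)
            "seascape": {"navy", "crimson"},
            "forest": {"olive"}}


def top_by_distinct_aggregates(palette: str, k: int = 10) -> list[tuple[str, int]]:
    """Rank paintings by how many of the palette's aggregate colors they touch."""
    counts = Counter()
    for painting, colors in CONTAINS.items():
        # An aggregate counts once if the painting shares at least one of its colors,
        # which is the role COUNT(DISTINCT a) plays in the Cypher query.
        matched = {a for a in USES[palette] if AGGREGATES[a] & colors}
        if matched:  # paintings with no match never appear in the MATCH results
            counts[painting] = len(matched)
    return counts.most_common(k)


print(top_by_distinct_aggregates("MY_PALETTE"))
```

With every weight equal to 1, this count is the same number as the weighted sum, which is why the query above already ranks correctly in that special case.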
I want the top paintings ranked by the weighted sum instead. If all the weights were 1, the query above would already give the correct answer.
It seems I can't inspect a in the RETURN clause.
Note that I want to count each aggregate color only once per painting, even if the painting contains several of the colors in that aggregate color.
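The "count each aggregate color once" requirement is the crux, so here is a minimal Python sketch contrasting a de-duplicated weighted sum with a naive per-color sum. It assumes the scalar weight can be read per aggregate color (modeled here as a hypothetical `WEIGHT` dict); all names and numbers are illustrative. In Cypher the same de-duplication is usually expressed by reducing the matches to distinct `(p, a)` pairs, for example with `WITH DISTINCT p, a`, before aggregating something like `sum(a.weight)`; the `weight` property name is an assumption.

```python
# Hypothetical sample data; WEIGHT stands in for the scalar weight on each
# aggregate color node (the property name is an assumption).
USES = {"MY_PALETTE": {"agg_red", "agg_blue"}}
AGGREGATES = {"agg_red": {"crimson", "scarlet"}, "agg_blue": {"navy", "teal"}}
WEIGHT = {"agg_red": 2.0, "agg_blue": 0.5}
CONTAINS = {
    "sunset": {"crimson", "scarlet"},  # two colors, both inside agg_red
    "seascape": {"navy", "crimson"},   # one color from each aggregate
}


def matched_aggregates(painting: str, palette: str) -> set[str]:
    """Distinct palette aggregates that share at least one color with the painting."""
    colors = CONTAINS[painting]
    return {a for a in USES[palette] if AGGREGATES[a] & colors}


def weighted_score(painting: str, palette: str) -> float:
    # Each matched aggregate contributes its weight exactly once.
    return sum(WEIGHT[a] for a in matched_aggregates(painting, palette))


def naive_score(painting: str, palette: str) -> float:
    # For contrast: one term per matching color, so agg_red is counted twice for "sunset".
    colors = CONTAINS[painting]
    return sum(WEIGHT[a] for a in USES[palette] for _ in AGGREGATES[a] & colors)


def top_paintings(palette: str, k: int = 10) -> list[str]:
    # Highest de-duplicated weighted sum first.
    return sorted(CONTAINS, key=lambda p: weighted_score(p, palette), reverse=True)[:k]


print(weighted_score("sunset", "MY_PALETTE"), naive_score("sunset", "MY_PALETTE"))
print(top_paintings("MY_PALETTE"))
```

The naive version would rank "sunset" first (4.0 vs 2.5) purely because it has two colors from the same aggregate, which is exactly the over-counting the requirement rules out.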
I also want to be able to add new palettes by creating only the relationships between the palette and its aggregate colors.
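Because the palette is the only entry point of the scoring, adding a palette should only require new palette-to-aggregate links. This hypothetical Python sketch shows that design: `add_palette` touches nothing but the `USES` mapping, and `top_paintings` takes the palette name as a parameter. Both function names and the sample data are illustrative, not part of any Neo4j API.

```python
# Hypothetical in-memory stand-ins; only USES changes when a palette is added.
USES = {"MY_PALETTE": {"agg_red"}}
AGGREGATES = {"agg_red": {"crimson"}, "agg_blue": {"navy"}}
WEIGHT = {"agg_red": 2.0, "agg_blue": 0.5}
CONTAINS = {"sunset": {"crimson"}, "seascape": {"navy"}}


def add_palette(name: str, aggregates: set[str]) -> None:
    """Create only (l)-[:uses]->(a) links; colors and paintings are untouched."""
    unknown = aggregates - set(AGGREGATES)
    if unknown:
        raise KeyError(f"unknown aggregate colors: {sorted(unknown)}")
    USES[name] = set(aggregates)


def top_paintings(palette: str, k: int = 10) -> list[tuple[str, float]]:
    """Weighted sum per painting, counting each matched aggregate once."""
    scores = {}
    for painting, colors in CONTAINS.items():
        matched = {a for a in USES[palette] if AGGREGATES[a] & colors}
        if matched:
            scores[painting] = sum(WEIGHT[a] for a in matched)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


add_palette("COOL_PALETTE", {"agg_blue"})
print(top_paintings("COOL_PALETTE"))
```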
Any suggestions?