I have the following cypher query in Neo4j:
MATCH (n0:Company)
WHERE n0.name = 'Google'
WITH n0
MATCH (n0) <- [r0:WORKED_AT] - (person:Person)
WITH DISTINCT ID(person) as id,
((CASE WHEN r0.count IS NOT NULL THEN toFloat(r0.count) ELSE 1.00 END) *
(CASE WHEN r0.source = 'someSource' THEN
CASE TYPE(r0)
WHEN 'WORKED_AT' THEN 0.7
WHEN 'HAS_SKILL' THEN 0.3
WHEN 'HAS_INTEREST' THEN 0.6
WHEN 'LIVED_IN_COUNTRY' THEN 0.8
ELSE 0.5 END
ELSE 0.5 END)) as score
WITH id, score
ORDER BY score DESC
LIMIT 20
RETURN COLLECT({id: id, score: score}) as result
I'm trying to sort the result based on the type of the relationship and the .source property on the relationship. This query is built up based on user input, so in the first part of the query it could be any type of node or relationship based on user input, The label Company could be switched out with Skill, WORKED_AT could be switched out with HAS_SKILL etc.
I want to be able to control the factor of which the relationship.count value is multiplied by, based on relationship.source and type(relationship).
This works fine, my concern is when the result is becoming large, this will probably be a very slow query cause it needs to do calculations for every row and after all calculations are done, sort the results based on the calculation. Is there any way I can optimize this query so it will work with ANY size of results?
My end goal with the query is to find all the persons matching a criteria, sort the result based on my calculations in the cypher query, then limit the result to get the top 20 results for a given user query.
Edit:
MATCH (n0:Company)
WHERE n0.name = 'Google'
WITH n0
MATCH (n1:Skill)
WHERE n1.name = 'java'
WITH n1, n0
MATCH (n0) <- [r0:WORKED_AT] - (person:Person)
, (person) - [r1:HAS_SKILL] -> (n1)
WITH DISTINCT ID(person) as id,
((CASE WHEN r0.count IS NOT NULL THEN toFloat(r0.count) ELSE 1.00 END) *
(CASE WHEN r0.source = 'unreliable' THEN CASE TYPE(r0)
WHEN 'WORKED_AT' THEN 0.2
WHEN 'HAS_SKILL' THEN 0.3
WHEN 'HAS_INTEREST' THEN 0.1
WHEN 'LIVED_IN_COUNTRY' THEN 0.3
WHEN 'STUDIED_AT' THEN 0.2
ELSE 0.3 END
ELSE 0.3 END))
+
((CASE WHEN r1.count IS NOT NULL THEN toFloat(r1.count) ELSE 1.00 END) *
(CASE WHEN r1.source = 'reliable' THEN CASE TYPE(r1)
WHEN 'WORKED_AT' THEN 0.8
WHEN 'HAS_SKILL' THEN 0.7
WHEN 'HAS_INTEREST' THEN 0.6
WHEN 'LIVED_IN_COUNTRY' THEN 0.5
WHEN 'STUDIED_AT' THEN 0.8
ELSE 0.7 END
ELSE 0.7 END)) as score
ORDER BY score DESC
LIMIT 20
RETURN COLLECT({id: id, score: score}) as result
I removed the extra WITH statement. I have also updated my query with a bit more complex one, I was trying to keep it simple in my original question, but I think I might have left out some important information about the query.
In this query, I can have multiple nodes I need to match against, so there can be multiple relationships (r0, r1...rn) etc, up to a reasonable amount ofc. Adding more nodes to match against will narrow the number of persons returned, which is good. But it also adds more CASE WHEN statements. My modifiers are currently set up like this:
var modifiers = {
reliable: {
HAS_SKILL: 0.8,
WORKED_AT: 0.7,
HAS_INTEREST: 0.7,
LIVED_IN_COUNTRY: 0.6,
default: 0.8
},
unreliable: {
HAS_SKILL: 0.2,
WORKED_AT: 0.4,
HAS_INTEREST: 0.1,
LIVED_IN_COUNTRY: 0.3,
default: 0.3
},
...
default: 0.5
};
I tried passing them in as parameters, but I don't think Neo4j supports "nested" parameters like this. I need to have different modifier based on if the source is reliable or not. So a Skill from a reliable source will get a higher modifier than an unreliable source etc. If I somehow can remove the CASE statements, that will be very nice. The suggestion from JohnMark13 about hyperEdges sounds interesting, will look into that.
[r0:WORKED_AT]
and[r1:HAS_SKILL]
so the case statements and all other modifiers are superfluous as they will never match/be used. – JohnMark13