1
votes

I have Neo4j 3.3.5 graph database: 27GB, 50kk nodes, 500kk relations. Indexes on. Schema. PC: 16GB ram, 4 cores.

Task is to find best matching companies for given query data. Nodes :Company, which I need to get, have multiple relations with nodes:Branch's, :Country's, etc. Query data has BranchIds, CountryIds, etc.

Currently I am using cypher like this to get score from one relations (500k rows in result):

MATCH (c:Company)-[r:HAS_BRANCH]->(b:Branch)
WHERE b.branchId in [27444, 1692, 23409, ...] //around 10 ids per query
RETURN 
c.companyId as Id, 
case r.branchType 
 when 0 then 25
 ... // //around 7 conditions per query 
 when 10 then 20 
end as Score

I have to score like this all relations types for :Company, group by Id, sum Score, order and take top 100 results.

Because of lack of post union processing, I am using collect + unwind in order to merge scores from all relations.

Unfortunately, the performance is low. I get response for query of one relations (like above) in 5-10 seconds. When I am trying to combine results with collect + unwind, the query "never" ends.

What is better/proper way to do it? Maybe I am doing something wrong with graph design? Hardware configuration to low? Or maybe there are some algorithms to match with score graph (query data) in graph database?

UPDATE

Query explanation:

User can search in our system for companies. For his query we prepare query data contains ids of branches, countries, words, etc. In result of query we want to get list of best matching companies ids with score.

E.g. user can search for new companies producing wooden tables from Spain.

Combined query example:

MATCH (c:Company)-[r:HAS_BRANCH]->(b:Branch)
WHERE b.branchId in ["27444" , "1692" , "23409" , "8744" , "9192" , "26591" , "21396" , "27151" , "20228" , "3517" , "25058" , "29549"] 
WITH case r.branchType 
when "0" then collect({id:c.companyId, score: 25}) 
 when "1" then collect({id:c.companyId, score: 19}) 
 when "2" then collect({id:c.companyId, score: 20}) 
 when "3" then collect({id:c.companyId, score: 19}) 
 when "4" then collect({id:c.companyId, score: 20}) 
 when "5" then collect({id:c.companyId, score: 15}) 
 when "6" then collect({id:c.companyId, score: 6}) 
 when "7" then collect({id:c.companyId, score: 5}) 
 when "8" then collect({id:c.companyId, score: 4}) 
 when "9" then collect({id:c.companyId, score: 4}) 
 when "10" then collect({id:c.companyId, score: 20}) 
end as rows
MATCH (c:Company)-[r:HAS_REVERTED_BRANCH]->(b:Branch)
WHERE b.branchId in ["27444" , "1692" , "23409" , "8744" , "9192" , "26591" , "21396" , "27151" , "20228" , "3517" , "25058" , "29549"] 
WITH rows + case r.branchType 
when "0" then collect({id:c.companyId, score: 25}) 
 when "1" then collect({id:c.companyId, score: 19}) 
 when "2" then collect({id:c.companyId, score: 20}) 
 when "3" then collect({id:c.companyId, score: 19}) 
 when "10" then collect({id:c.companyId, score: 20}) 
end as rows
MATCH (c:Company)-[r:HAS_COUNTRY]->(cou:Country)
WHERE cou.countryId in ["9580" , "18551" , "15895"] 
WITH rows + case r.branchType 
when "0" then collect({id:c.companyId, score: 30}) 
 when "2" then collect({id:c.companyId, score: 15}) 
 end as rows
... //here I would add in future other relations scoring
UNWIND rows AS row
RETURN row.id AS Id, sum(row.score) AS Score
ORDER BY Score DESC
LIMIT 100
2
Can you share with is the explain of your query and the full query ? Moreover it seems that the HAS_BRANCH relationship is important for your business, and generally if it's important, it should be a node.logisima
@logisima thank you for your comment, I have updated questionMike Blermabs

2 Answers

1
votes

You can try this query to see if it's better or not :

MATCH (c:Company) WITH c
OPTIONAL MATCH (c)-[r1:HAS_BRANCH]->(b:Branch) WHERE b.branchId in ["27444" , "1692" , "23409" , "8744" , "9192" , "26591" , "21396" , "27151" , "20228" , "3517" , "25058" , "29549"] 
OPTIONAL MATCH (c)-[r2:HAS_REVERTED_BRANCH]->(c:Branch) WHERE c.branchId in ["27444" , "1692" , "23409" , "8744" , "9192" , "26591" , "21396" , "27151" , "20228" , "3517" , "25058" , "29549"] 
OPTIONAL MATCH (c)-[r3:HAS_COUNTRY]->(cou:Country) WHERE cou.countryId in ["9580" , "18551" , "15895"] 
WITH c, 
    case r1.branchType 
      when "0" then 25
      when "1" then 19 
      when "2" then 20 
      when "3" then 19 
      when "4" then 20 
      when "5" then 15 
      when "6" then 6 
      when "7" then 5 
      when "8" then 4 
      when "9" then 4 
      when "10" then 20 
    end as branchScore,
    case r2.branchType 
      when "0" then  25 
      when "1" then  19 
      when "2" then  20 
      when "3" then  19 
      when "10" then  20 
    end as revertedBranchScore,
    case r3.branchType 
      when "0" then  30
      when "2" then  15 
    end as countryScore

WITH c.id AS Id, branchScore + revertedBranchScore + countryScore AS Score
RETURN Id, sum(Score) AS Score
ORDER BY Score DESC
LIMIT 100

Or a better one will be this one (but only if a Company node is mandatory linked to to a Country and Branch) :

MATCH 
  (c:Company)-[r1:HAS_BRANCH]->(b:Branch),
  (c)-[r2:HAS_REVERTED_BRANCH]->(c:Branch),
  (c)-[r3:HAS_COUNTRY]->(cou:Country)
WHERE 
  b.branchId in ["27444" , "1692" , "23409" , "8744" , "9192" , "26591" , "21396" , "27151" , "20228" , "3517" , "25058" , "29549"] AND 
  c.branchId in ["27444" , "1692" , "23409" , "8744" , "9192" , "26591" , "21396" , "27151" , "20228" , "3517" , "25058" , "29549"] AND
  cou.countryId in ["9580" , "18551" , "15895"]
WITH c, 
    case r1.branchType 
      when "0" then 25
      when "1" then 19 
      when "2" then 20 
      when "3" then 19 
      when "4" then 20 
      when "5" then 15 
      when "6" then 6 
      when "7" then 5 
      when "8" then 4 
      when "9" then 4 
      when "10" then 20 
    end as branchScore,
    case r2.branchType 
      when "0" then  25 
      when "1" then  19 
      when "2" then  20 
      when "3" then  19 
      when "10" then  20 
    end as revertedBranchScore,
    case r3.branchType 
      when "0" then  30
      when "2" then  15 
    end as countryScore

WITH c.id AS Id, branchScore + revertedBranchScore + countryScore AS Score
RETURN Id, sum(Score) AS Score
ORDER BY Score DESC
LIMIT 100
0
votes

Let's see if we can keep cardinality down from your matches by using pattern comprehension and the reduce() function to update the score per company as the query progresses, as well as wait until the end to project out the id property:

MATCH (c:Company)
WITH c, [(c)-[r:HAS_BRANCH]->(b:Branch) 
 WHERE b.branchId in ["27444" , "1692" , "23409" , "8744" , "9192" , "26591" , "21396" , "27151" , "20228" , "3517" , "25058" , "29549"] | r.branchType] as hasBranchTypes
WITH c, reduce(runningScore = 0, type in hasBranchTypes | runningScore + 
 case type 
 when "0" then 25
 when "1" then 19
 when "2" then 20 
 when "3" then 19 
 when "4" then 20 
 when "5" then 15 
 when "6" then 6 
 when "7" then 5 
 when "8" then 4 
 when "9" then 4 
 when "10" then 20 
 end ) as score

WITH c, score, [(c:Company)-[r:HAS_REVERTED_BRANCH]->(b:Branch)
 WHERE b.branchId in ["27444" , "1692" , "23409" , "8744" , "9192" , "26591" , "21396" , "27151" , "20228" , "3517" , "25058" , "29549"] | r.branchType] as revertedBranchTypes
WITH c, reduce(runningScore = score, type in revertedBranchTypes | runningScore + 
 case type
 when "0" then 25
 when "1" then 19 
 when "2" then 20 
 when "3" then 19 
 when "10" then 20 
end ) as score

WITH c, score, [(c:Company)-[r:HAS_COUNTRY]->(cou:Country)
 WHERE cou.countryId in ["9580" , "18551" , "15895"] | r.branchType] as hasCountryTypes
WITH c, reduce(runningScore = score, type in hasCountryTypes | runningScore + 
 case type
 when "0" then 30 
 when "2" then 15 
 end ) as score
 //here I would add in future other relations scoring

WITH c, score
ORDER BY score DESC
LIMIT 100
RETURN c.id as Id, score as Score