Any suggestions to improve the performance by either improving the graph structure or query for the following? Ideally, I'd like to get this to sub 1s. Right now the best I can get it ~8s with 7M db hits over about 2M nodes and 10M rels.
I've got graph structure where
(co:Company)<-[:HAS_PROVIDER]-(c:Customer)-[:HAS_CLAIM]->(c:Claim)
- each claim has a boolean property of "direct" as well as a property called "amount"
- each company has a name
- Customers can have multiple Claims but only one Company
I want to be able to the show:
- distinct company name
- customer count for company
- count of direct=true claims
- sum of amount for direct=true claims
- count of direct=false claims
- sum of amount for direct=false claims
To achieve this, I've used two approaches:
a) created a relationship from Customer to Claim where direct=true as :IS_DIRECT and direct=false as :IS_INDIRECT
b) labeled each direct=true claim as a :DirectClaim node and direct=false claim as :InDirectClaim node
using (a) allows me to get company name, customer count, size of (IS_DIRECT) and size (IS_INDIRECT) by filter on rels TYPE. however, getting the sum(amount) using combinations of extract, filter, reduce times out no matter the configuration.
using (b) works but takes ~10s
EDIT:
query for (a) looks like (with the hattip to @cybersam this is now ~6s)
MATCH (co:Company)<-[:HAS_PROVIDER]-(c)-[r:IS_DIRECT|IS_INDIRECT]->(cl)
WITH distinct co, collect(r) as rels, count (distinct c) as cntc, collect(mc) as claims
WITH co, cntc,
size(filter(r in rels WHERE TYPE(r) = 'IS_DIRECT')) as dcls,
size(filter(r in rels WHERE TYPE(r) = 'IS_INDIRECT')) as indcls,
REDUCE
(s = {dclsamt: 0, indclsamt: 0}, x IN claims |
CASE WHEN x.direct
THEN {dclsamt: s.dclsamt + x.amount, indclsamt: s.indclsamt}
ELSE {dclsamt: s.dclsamt, indclsamt: s.indclsamt + x.amount}
END)
AS data
RETURN co.name as name,cntc, dcls,indcls, data
ORDER BY dcls desc
query for (b) looks like:
MATCH (co:Company)<-[:HAS_PROVIDER]-(c)-[:HAS_CLAIM]->(cl)
WITH distinct co, count (distinct c) as cntc, COLLECT(cl) as cls
WITH co,cntc,
FILTER(n in cls WHERE 'DirectClaim' IN IN LABELS(n)) as dcls,
FILTER(n in cls WHERE 'InDirectClaim' IN LABELS(n)) as indcls
WITH co,cntc, size(dcls) as dclsct, size(indcls) as indclsct,
REDUCE(s = 0 , x IN dcls | s + x.amount) as dclsamt,
REDUCE(s = 0 , x IN indcls | s + x.amount) as indclsamt
RETURN co.name as name, cntc, dclsct, dclsamt, indclsct, indclsamt