1
votes

I am using Neo4j graph database, and I am trying to build a complex cypher query

I have the following nodes and relations:

Nodes:

  • Customer node
  • Brand node
  • Segment node (which has a "MaxReservationsPerBrandPerMonth" property)
  • Reservation node

Relations:

  • Each customer is segmented to a segment
  • customers can have a cust_reserved relation with a reservation
  • Each reservation is related to a brand

I want to get count of customers which are eligible for a new reservation for certain brand.

so, I need to count reservations for each customer for the given brand and compare with MaxReservationsPerBrandPerMonth for this customer's segment.

note that customers who don't have reservations should be counted also

Thanks for your support.

EDIT 1

after trying Michael first query, query is :

PROFILE MATCH (c:Customer)
// also count customers without reservations (make this match optional)
OPTIONAL MATCH (c)-[:CUST_RESERVED]->(r:Reservation)-[:RESERVATION_FOR]->(b:brand {BRAND_ID: "3"})
// count reservations by brand and customers
WITH c, b, count(*) as reservations
MATCH (c)-[:SEGMENTED]->(s:segment)
WHERE reservations < s.max_sms_per_month
// aggregate count customers per brand
RETURN b.NAME, count(distinct c) as customers

Query result: query result

the result is unexpected, I want to get eligible customer for "Honda" brand , the result is only 5 (which were already reserved for Honda before but still eligible because they didn't reach the maximum yet) , the result should be 998734 (all customers)

Query profile: Query profile

Data set: here Needed query in SQL: here

EDIT 2

Michael second query worked like a charm in around 20 sec, thank you Michael I need to include some date logic to check reservations during certain reservations please :)

1
Try to create an constraint / index for :brand(BRAND_ID) (note you should fix the spelling to :Brand(brandId)Michael Hunger
As you group by brand name (which is what I expected as you said "eligible for a brand").Michael Hunger

1 Answers

3
votes

You already spelled it out, please see our online guide for how you'd go from your use-case question: http://neo4j.com/developer/guide-data-modeling/

So your patterns are

(:Customer)-[:IS_SEGMENTED]->(:Segment {MaxReservationsPerBrandPerMonth:int})
(:Customer)-[:HAS_RESERVED]->(:Reservation)
(:Reservation)-[:FOR_BRAND]->(:Brand)

Your question:

  1. I want to get count of customers which are eligible for a new reservation for certain brand.
  2. so, I need to count reservations for each customer for the given brand and compare with MaxReservationsPerBrandPerMonth for this customer's segment.
  3. note that customers who don't have reservations should be counted also
MATCH (c:Customer)
// also count customers without reservations (make this match optional)
OPTIONAL MATCH (c)-[:HAS_RESERVED]->(r:Reservation)-[:FOR_BRAND]->(b:Brand)
// count reservations by brand and customers
WITH c, b, count(*) as reservations
MATCH (c)-[:IS_SEGMENTED]->(s:Segment)
WHERE s.MaxReservationsPerBrandPerMonth < reservations
// aggregate count customers per brand
RETURN b.name, count(distinct c) as customers

"as what should customers be counted without reservations" as 0 ?

Update: Query for non-optional reservations

Question: which brand would customers without a reservation be related to / checkt against? As the reservations-count is 0 then MaxReservationsPerBrandPerMonth would have to be -1 for those to be ever selected? So we can also just make it non-optional? Please also share the query plan you get with a PROFILE prefix.

MATCH (c:Customer)-[:HAS_RESERVED]->(r:Reservation)-[:FOR_BRAND]->(b:Brand)
// count reservations by brand and customers
WITH c, b, count(*) as reservations
MATCH (c)-[:IS_SEGMENTED]->(s:Segment)
WHERE s.MaxReservationsPerBrandPerMonth < reservations
// aggregate count customers per brand
RETURN b.name, count(distinct c) as customers

Update 2, Query Optimization Attempt

Your query is a graph global one, so it will touch many many paths, you have at least 10M db-hits already.

I'd probably change that query then a bit:

I'm not 100% sure anymore though how brand comes into play here: Also a bit hard to do without a dataset!

MATCH (b:brand {BRAND_ID: "3"})
MATCH (c:Customer) 
WITH b,c, size((c)-[:CUST_RESERVED]->()) as total_res
WITH b,c,total_res, 
     case when total_res > 0 then last(nodes(head((c)-[:SEGMENTED]->(:segment)))).max_sms_per_month else 10 end as max_sms_per_month,
     case when total_res > 0 then size((c)-[:CUST_RESERVED]->()-[:RESERVATION_FOR]->(b)) else 0 end as brand_res
WHERE total_res = 0 OR total_res < max_sms_per_month OR brand_res < max_sms_per_month
// TODO what to do with reservations by brand?   
RETURN b.NAME, count(distinct c) as customers