1
votes

I am new to Neo4j and would really appreciate your help. My case is as follows:

In a city there are hundreds of thousands of hotels, which I represent as nodes with the label 'Hotel'. Each hotel has the properties hotel_name, hotel_address, hotel_telephone...

There are also millions of persons, which I represent as nodes with the label 'Person'. Each person has the properties person_name, person_identity, and person_age.

When a person checks into a hotel, I create a relationship from the Person node to the Hotel node. The relationship has a property in_time in the format '20130820134000' (YYYYMMDDHHMISS).

I have enabled auto-indexing:

node_auto_index with property keys: hotel_name, person_name, person_identity

relationship_auto_index with property key: in_time

There are tens of millions of relationships in my Neo4j database.

Now I want to find which persons checked into a hotel between '20130910080000' and '20130911080000', restricted to persons whose person_age equals 20.

My Cypher query is below:

start r = relationship:relationship_auto_index('in_time:[20130910080000 TO 20130911080000]')
match (p:Person)-[r]-(h:Hotel)
where p.person_age=20
return p,r,h

But the query above runs very slowly. What should I do? Any help will be appreciated.


2 Answers

1
votes

This clause right here:

relationship:relationship_auto_index('in_time:[20130910080000 TO 20130911080000]')

This uses a Lucene index (the range expression is Lucene query syntax). I'm not 100% sure, but I wonder how selective this makes the query: it's possible that the index lookup runs over every relationship in the index, rather than only those that match your path.

So you might want to try this query instead:

// Assumes in_time is stored as a number; if it is stored as a string, quote both bounds.
match (p:Person)-[r]-(h:Hotel)
where p.person_age=20 AND
      r.in_time >= 20130910080000 AND
      r.in_time <= 20130911080000
return p,r,h

In theory, this restricts the search to relationships on the right path (not just any relationship) and then, hopefully, narrows it further to those connected to people of the right age. Give it a shot.

A Neo4j developer might want to comment here on how Cypher evaluates the different indexes.

You should also run the query both ways with the PROFILE keyword to see which gives the better execution plan.
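As a rough sketch of that comparison, the two variants could be profiled as shown here. This assumes a Neo4j version where PROFILE can be prefixed to a query (2.2 and later; the older neo4j-shell offered a separate profile command instead) and where the legacy START clause is still supported. The numeric bounds in the second variant are copied from the query above and assume in_time is stored as a number.

```cypher
// Variant 1: start from the relationship auto-index (Lucene range query).
PROFILE
START r = relationship:relationship_auto_index('in_time:[20130910080000 TO 20130911080000]')
MATCH (p:Person)-[r]-(h:Hotel)
WHERE p.person_age = 20
RETURN p, r, h;

// Variant 2: pattern match with property filters, no explicit index lookup.
PROFILE
MATCH (p:Person)-[r]-(h:Hotel)
WHERE p.person_age = 20 AND
      r.in_time >= 20130910080000 AND
      r.in_time <= 20130911080000
RETURN p, r, h;
```

The numbers to compare are the database hits and row counts reported for each step of the two plans; the variant that touches fewer nodes and relationships before filtering is usually the faster one.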

0
votes

Try this query:

start r = relationship:relationship_auto_index('in_time:[20130910080000 TO 20130911080000]')
WITH  startNode(r) as p, endNode(r) as h, r
WHERE p:Person and h:Hotel and p.person_age=20
return p,r,h