0
votes

Following my former question : cypher 2.0 : using label based index to search a set of nodes

I am trying to optimize a query to find users common groups:

MATCH (gr:Group) <--(us:User)-->(gr2:Group)
WHERE gr.name = "gr1" or gr.name = "gr2" 
return distinct gr2 as prop, count(distinct us ) as users
limit 10

Unfortunately, it takes about 4-5 seconds. Ran 'profile' on the query and got the following result:

ColumnFilter(symKeys=["prop", "  INTERNAL_AGGREGATE074636fa-2950-4845-93ae-9d87da36e2b5"], returnItemNames=["prop", "users"], _rows=10, _db_hits=0)
Slice(limit="Literal(10)", _rows=10, _db_hits=0)
  EagerAggregation(keys=["Cached(prop of type Node)"], aggregates=["(  INTERNAL_AGGREGATE074636fa-2950-4845-93ae-9d87da36e2b5,Distinct(Count(us),us))"], _rows=10, _db_hits=0)
    Extract(symKeys=["us", "  UNNAMED20", "gr", "gr2", "r"], exprKeys=["prop"], _rows=34060, _db_hits=0)
      Filter(pred="((Property(pr,name(1)) == Literal(gr1) OR Property(pr,name(1)) == Literal(gr2)) AND hasLabel(gr:Group(2)))", _rows=34060, _db_hits=68120)
        SimplePatternMatcher(g="(us)-['  UNNAMED20']-(gr)", _rows=34060, _db_hits=1653118)
          Filter(pred="hasLabel(gr2:Group(2))", _rows=28188, _db_hits=0)
            TraversalMatcher(start={"label": "User", "producer": "NodeByLabel", "identifiers": ["us"]}, trail="(us)-[r WHERE hasLabel(NodeIdentifier():Group(2)) AND true]-(gr2)", _rows=28188, _db_hits=48828)

gets up to 1653118 db_hits

However the following query takes only 200 ms:

MATCH (gr:Group) <--(us:User)-->(gr2:Group)
WHERE gr.name = "gr1" or gr.name = "gr2" 
return distinct gr2 as prop
limit 10

And, its profile :

Slice(limit="Literal(10)", _rows=10, _db_hits=0)
Distinct(_rows=10, _db_hits=0)
  Extract(symKeys=["us", "  UNNAMED20", "gr", "gr2", "r"], exprKeys=["prop"], _rows=17, _db_hits=0)
    Filter(pred="((Property(gr,name(1)) == Literal(gr1) OR Property(gr,name(1)) == Literal(gr2)) AND hasLabel(gr:Group(2)))", _rows=17, _db_hits=34)
      SimplePatternMatcher(g="(us)-['  UNNAMED20']-(gr)", _rows=17, _db_hits=600)
        Filter(pred="hasLabel(gr2:Group(2))", _rows=10, _db_hits=0)
          TraversalMatcher(start={"label": "User", "producer": "NodeByLabel", "identifiers": ["us"]}, trail="(us)-[r WHERE hasLabel(NodeIdentifier():Group(2)) AND true]-(gr2)", _rows=10, _db_hits=18)

gets only to 600 db_hits at top

Is there any way to change the first query to get the same results in a better performance? I just can't accept a 4 sec per query performance....

1

1 Answers

4
votes

Is

MATCH (gr:Group)
WHERE gr.name = "gr1" or gr.name = "gr2" 
WITH gr
MATCH  (gr)<--(us:User)-->(gr2:Group)
return distinct gr2 as prop, count(distinct us ) as users
limit 10 

any better?