Following my former question : cypher 2.0 : using label based index to search a set of nodes
I am trying to optimize a query to find users common groups:
MATCH (gr:Group) <--(us:User)-->(gr2:Group)
WHERE gr.name = "gr1" or gr.name = "gr2"
return distinct gr2 as prop, count(distinct us ) as users
limit 10
Unfortunately, it takes about 4-5 seconds. Ran 'profile' on the query and got the following result:
ColumnFilter(symKeys=["prop", " INTERNAL_AGGREGATE074636fa-2950-4845-93ae-9d87da36e2b5"], returnItemNames=["prop", "users"], _rows=10, _db_hits=0)
Slice(limit="Literal(10)", _rows=10, _db_hits=0)
EagerAggregation(keys=["Cached(prop of type Node)"], aggregates=["( INTERNAL_AGGREGATE074636fa-2950-4845-93ae-9d87da36e2b5,Distinct(Count(us),us))"], _rows=10, _db_hits=0)
Extract(symKeys=["us", " UNNAMED20", "gr", "gr2", "r"], exprKeys=["prop"], _rows=34060, _db_hits=0)
Filter(pred="((Property(pr,name(1)) == Literal(gr1) OR Property(pr,name(1)) == Literal(gr2)) AND hasLabel(gr:Group(2)))", _rows=34060, _db_hits=68120)
SimplePatternMatcher(g="(us)-[' UNNAMED20']-(gr)", _rows=34060, _db_hits=1653118)
Filter(pred="hasLabel(gr2:Group(2))", _rows=28188, _db_hits=0)
TraversalMatcher(start={"label": "User", "producer": "NodeByLabel", "identifiers": ["us"]}, trail="(us)-[r WHERE hasLabel(NodeIdentifier():Group(2)) AND true]-(gr2)", _rows=28188, _db_hits=48828)
gets up to 1653118 db_hits
However the following query takes only 200 ms:
MATCH (gr:Group) <--(us:User)-->(gr2:Group)
WHERE gr.name = "gr1" or gr.name = "gr2"
return distinct gr2 as prop
limit 10
And, its profile :
Slice(limit="Literal(10)", _rows=10, _db_hits=0)
Distinct(_rows=10, _db_hits=0)
Extract(symKeys=["us", " UNNAMED20", "gr", "gr2", "r"], exprKeys=["prop"], _rows=17, _db_hits=0)
Filter(pred="((Property(gr,name(1)) == Literal(gr1) OR Property(gr,name(1)) == Literal(gr2)) AND hasLabel(gr:Group(2)))", _rows=17, _db_hits=34)
SimplePatternMatcher(g="(us)-[' UNNAMED20']-(gr)", _rows=17, _db_hits=600)
Filter(pred="hasLabel(gr2:Group(2))", _rows=10, _db_hits=0)
TraversalMatcher(start={"label": "User", "producer": "NodeByLabel", "identifiers": ["us"]}, trail="(us)-[r WHERE hasLabel(NodeIdentifier():Group(2)) AND true]-(gr2)", _rows=10, _db_hits=18)
gets only to 600 db_hits at top
Is there any way to change the first query to get the same results in a better performance? I just can't accept a 4 sec per query performance....