The following query takes between 1.5 and 9 seconds, depending on {keywords}:
MATCH (pr:Property)
WHERE pr.name IN {keywords}
WITH pr
MATCH (pr)<--(it:Item)
MATCH (it)-->(pr2)<-[:CAT]-(ca)
RETURN DISTINCT pr2 AS prop, count(DISTINCT it) AS sum, ca.name AS rType
LIMIT 10
Each Item is connected to 100 Properties.
Sample profile on the server:
neo4j-sh (?)$ profile match (pr:Property)
WHERE (pr.name in ["GREEN","SHORT","PLAIN","SHORT-SLEEVE"])
with pr
MaTCH (pr) <--(it:Item)
MaTCH (it)-->(pr2)<-[:CAT]-(ca)
return distinct pr2 as prop,count(distinct it) as sum , ca.name as rType
limit 40;
40 rows
ColumnFilter(symKeys=["prop", "rType", " INTERNAL_AGGREGATE58d28d0e-5727-4850-81ef-7298d63d7be8"], returnItemNames=["prop", "sum", "rType"], _rows=40, _db_hits=0)
Slice(limit="Literal(40)", _rows=40, _db_hits=0)
EagerAggregation(keys=["Cached(prop of type Node)", "Cached(rType of type Any)"], aggregates=["( INTERNAL_AGGREGATE58d28d0e-5727-4850-81ef-7298d63d7be8,Distinct(Count(it),it))"], _rows=40, _db_hits=0)
Extract(symKeys=["it", "ca", " UNNAMED122", "pr", "pr2", " UNNAMED130", " UNNAMED99"], exprKeys=["prop", "rType"], _rows=645685, _db_hits=645685)
SimplePatternMatcher(g="(it)-[' UNNAMED122']-(pr2),(ca)-[' UNNAMED130']-(pr2)", _rows=645685, _db_hits=0)
Filter(pred="hasLabel(it:Item(0))", _rows=6258, _db_hits=0)
SimplePatternMatcher(g="(it)-[' UNNAMED99']-(pr)", _rows=6258, _db_hits=0)
Filter(pred="any(-_-INNER-_- in Collection(List(Literal(GREEN), Literal(SHORT), Literal(PLAIN), Literal(SHORT-SLEEVE))) where Property(pr,name(1)) == -_-INNER-_-)", _rows=4, _db_hits=1210)
NodeByLabel(identifier="pr", _db_hits=0, _rows=304, label="Property", identifiers=["pr"], producer="NodeByLabel")
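As a rough sanity check on the row counts in that profile, here is a small sketch. It assumes the "100 Properties per Item" figure holds for every matched Item, and the variable names are only for illustration. If the assumption is right, the second pattern match should multiply the 6258 rows from the first match by about 100, which is close to the 645685 rows reported.

```python
# Row counts copied from the profile output above.
rows_after_first_match = 6258      # (pr)<--(it:Item) for the 4 keywords
rows_after_second_match = 645685   # (it)-->(pr2)<-[:CAT]-(ca)

# Assumption: every Item has roughly 100 relationships to Properties.
assumed_properties_per_item = 100

# Rows expected if each matched Item expands to all of its Properties.
expected_rows = rows_after_first_match * assumed_properties_per_item

# Fan-out actually observed between the two pattern matchers.
observed_fanout = rows_after_second_match / rows_after_first_match

print(f"expected rows: {expected_rows}")
print(f"observed fan-out per row: {observed_fanout:.1f}")
```

If this reading is correct, almost all of the work is in expanding every matched Item to all of its Properties before the aggregation, and the Extract step then reports one db hit per row.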
Neo4j version: 2.0.1
Heap size: 3.2 GB max (actual usage never gets close to that)
Database disk usage: 270 MB
Number of nodes: 4368
Number of relationships: 395693
Machine: AWS EC2 c3.large. I also tried a machine four times faster, and the results were the same.
In JConsole I can see the heap grow from 50 MB to 70 MB and then get cleaned up by GC.
Is there any way to make it faster? This performance is far too slow for my use case.
EDIT: As suggested, I tried combining the matches, but it is slower, as you can see in the profile:
neo4j-sh (?)$ profile match (pr:Property) WHERE (pr.name in ["GREEN","SHORT","PLAIN","SHORT-SLEEVE"]) with pr MaTCH (pr) <--(it:Item)-->(pr2)<-[:CAT]-(ca) return distinct pr2 as prop,count(distinct it) as sum , ca.name as rType limit 40;
ColumnFilter(symKeys=["prop", "rType", " INTERNAL_AGGREGATEa6eaa53b-5cf4-4823-9e4d-0d1d66120d51"], returnItemNames=["prop", "sum", "rType"], _rows=40, _db_hits=0)
Slice(limit="Literal(40)", _rows=40, _db_hits=0)
EagerAggregation(keys=["Cached(prop of type Node)", "Cached(rType of type Any)"], aggregates=["( INTERNAL_AGGREGATEa6eaa53b-5cf4-4823-9e4d-0d1d66120d51,Distinct(Count(it),it))"], _rows=40, _db_hits=0)
Extract(symKeys=[" UNNAMED111", "it", "ca", " UNNAMED119", "pr", "pr2", " UNNAMED99"], exprKeys=["prop", "rType"], _rows=639427, _db_hits=639427)
Filter(pred="(hasLabel(it:Item(0)) AND hasLabel(it:Item(0)))", _rows=639427, _db_hits=0)
SimplePatternMatcher(g="(ca)-[' UNNAMED119']-(pr2),(it)-[' UNNAMED99']-(pr),(it)-[' UNNAMED111']-(pr2)", _rows=639427, _db_hits=0)
Filter(pred="any(-_-INNER-_- in Collection(List(Literal(GREEN), Literal(SHORT), Literal(PLAIN), Literal(SHORT-SLEEVE))) where Property(pr,name(1)) == -_-INNER-_-)", _rows=4, _db_hits=1210)
NodeByLabel(identifier="pr", _db_hits=0, _rows=304, label="Property", identifiers=["pr"], producer="NodeByLabel")