Before going further, here is a representation of my data model. I am stuck for the moment with Neo4j 1.9.2 and have a rather big database (about 1 million nodes as far as I can tell, maybe fewer, but it will grow over time as all the data are ingested). Now that you have it in mind, let me explain what I mean by faceted search.
My items (documentaryUnit) are sometimes linked to keywords (which can have different types). What I want to implement is a way to select a few keywords and see which nodes, if any, are connected to all of them (keyword1, keyword2, etc.). I don't want what faceted search is mainly about, i.e. showing the number of results for each remaining option and disabling the options that would give 0 results. I just want to be able to run this "simple" query. Keep in mind that I am quite new to the Neo4j world: I tried to find an answer beforehand, but since I am missing some of the concepts, I might have overlooked the right post.
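To make the requirement concrete, here is a minimal Python sketch of the result I am after. It only describes the expected semantics, not how Neo4j evaluates anything; the sample ids, `LINKS`, and `docs_matching_all` are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample data: each pair is (documentaryUnit id, access point id)
# and stands for one doc <-[:hasLinkTarget]- link -[:hasLinkTarget]-> access point path.
LINKS = [
    ("doc-1", "keyword-104"),
    ("doc-1", "place-1"),
    ("doc-1", "keyword-2"),
    ("doc-1", "keyword-258"),
    ("doc-2", "keyword-104"),
    ("doc-2", "place-1"),
    ("doc-3", "keyword-2"),
]


def docs_matching_all(links, wanted):
    """Return the sorted ids of the documents linked to every id in `wanted`."""
    if not wanted:
        return []
    # Index the documents by the access point they are linked to.
    docs_by_access_point = defaultdict(set)
    for doc_id, access_point_id in links:
        docs_by_access_point[access_point_id].add(doc_id)
    # A document qualifies only if it appears in the set of every selected facet.
    selected = [docs_by_access_point.get(w, set()) for w in wanted]
    return sorted(set.intersection(*selected))
```

With this sample data, selecting all four facets should leave only `doc-1`, while selecting just `keyword-104` and `place-1` should also keep `doc-2`.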
So, here is the query I tried:
START
facet1 = node:entities("__ID__:keyword-104"),
facet2 = node:entities("__ID__:place-1"),
facet3 = node:entities("__ID__:keyword-2"),
facet4 = node:entities("__ID__:keyword-258")
MATCH
(elem)<-[:hasLinkTarget]-(link)-[:hasLinkTarget]->(facet1),
(elem)<-[:hasLinkTarget]-(link)-[:hasLinkTarget]->(facet2),
(elem)<-[:hasLinkTarget]-(link)-[:hasLinkTarget]->(facet3),
(elem)<-[:hasLinkTarget]-(link)-[:hasLinkTarget]->(facet4)
WITH distinct elem, facet1, facet2, facet3, facet4, link
RETURN elem
With or without distinct, it takes ages and sometimes basically crashes. With only two keywords it works well (< 100 ms); with three it is slow, and with four it crashes (more or less). I need to find a way to do this without using any external service (Solr is not an option here, for upgrade reasons).
Given the picture I attached, what I want is to find a documentaryUnit like #1, which is attached to keywords 1, 4, 5 and 3 through links. I also tried with a collection, like so:
START doc = node:entities("__ISA__:documentaryUnit")
MATCH (doc)<-[:hasLinkTarget]-(link)-[:hasLinkTarget]->(accessPoints)
WITH collect(accessPoints.__ID__) AS accessPointsId, doc
WHERE ALL (x IN ['keyword-104', 'place-1', 'keyword-2']
WHERE x IN accessPointsId)
RETURN doc.__ID__
This does not crash, but it uses a large number of nodes as starting points, and it takes between 1000 ms and 2000 ms.
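As far as I understand it, the collection query behaves roughly like the Python sketch below: it groups the access points of every documentaryUnit first and only then filters, which would explain why the cost does not depend on how selective the keywords are. This is my own mental model, not something taken from the Neo4j documentation; `LINK_PAIRS` and `docs_matching_all_by_collect` are hypothetical names.

```python
from collections import defaultdict

# Hypothetical (documentaryUnit id, access point id) pairs, one per link path.
LINK_PAIRS = [
    ("doc-1", "keyword-104"),
    ("doc-1", "place-1"),
    ("doc-1", "keyword-2"),
    ("doc-2", "keyword-104"),
    ("doc-3", "place-1"),
    ("doc-3", "keyword-2"),
]


def docs_matching_all_by_collect(link_pairs, wanted):
    """Group access points per document, then keep documents that have all of `wanted`."""
    # Step 1: mirrors collect(accessPoints.__ID__) grouped by doc.
    # Every linked document is visited here, whatever the selected keywords are.
    access_points_by_doc = defaultdict(set)
    for doc_id, access_point_id in link_pairs:
        access_points_by_doc[doc_id].add(access_point_id)
    # Step 2: mirrors WHERE ALL (x IN [...] WHERE x IN accessPointsId).
    return sorted(
        doc_id
        for doc_id, access_points in access_points_by_doc.items()
        if all(x in access_points for x in wanted)
    )
```

Calling `docs_matching_all_by_collect(LINK_PAIRS, ["keyword-104", "place-1", "keyword-2"])` keeps only `doc-1`, but step 1 still has to go through all three documents.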
Thank you for reading this. I will reply as soon as possible when you post something.