This is more actually more of a Lucene question, but it's in the context of a neo4j database.
I have a database that's divided into 50 or so node types (so "collections" or "tables" in other types of dbs). Each has a subset of properties that need to be indexed, some share the same name, some don't.
When searching, I always want to find nodes of a specific type, never across all nodes.
I can see three ways of organizing this:
One index per type, properties map naturally to index fields: index 'foo',
'id'='1234'
.A single global index, each field maps to a property name, to distinguish the type either include it as part of the value (
'id'='foo:1234'
) or check the nodes once they're returned (I expect duplicates to be very rare).A single index, type is part of the field name:
'foo.id'='1234'
.
Once created, the database is read-only.
Are there any benefits to one of those, in terms of convenience, size/cache efficiency, or performance?
As I understand it, for the first option neo4j will create a separate physical index for each type, which seems suboptimal. For the third, I end up with most lucene docs only having a small subset of the fields, not sure if that affects anything.