
Suppose I have a set of User nodes, each with a property named gender that can be male or female. To group users by gender, I have two choices of structure:

1) Add an index on the gender property, and use a WHERE clause to select the users of a given gender.

2) Create a Male node and a Female node, with edges linking them to the relevant users. Then, whenever I query by gender, I use a pattern such as (:Male)-[]->(:User).
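To make the two options concrete, here is roughly how they might look in Cypher. This is only a sketch: the index name user_gender and the string value 'male' are placeholders, the relationship is left untyped as in the pattern above, and the CREATE INDEX syntax shown is the one used by recent Neo4j versions (older releases use CREATE INDEX ON :User(gender) instead).

```cypher
// Option 1: index the property and filter on it.
CREATE INDEX user_gender IF NOT EXISTS FOR (u:User) ON (u.gender);

MATCH (u:User)
WHERE u.gender = 'male'
RETURN u;

// Option 2: traverse from a dedicated Male node to its users.
MATCH (:Male)-[]->(u:User)
RETURN u;
```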

My question is, which one is better?


1 Answer


Indices should never be a replacement for putting things in the graph.

Indexing is great for looking up unique values and, in some cases, groups of values; however, with the caching that Neo4j can do (and the extensibility of modeling your domain), an index is often not needed for this kind of grouping.

Indexing a property that has only two (give or take) distinct values is not the best use of an index and likely won't yield much of a performance boost, given the number of results per property value.

That said, going with option #2 can create supernodes (nodes with a very large number of relationships), a bottlenecking issue that can become a major headache depending on your model.

Consider using labels instead (:Male and :Female, for example), as they are essentially "schema indices". Keep in mind that a node can have multiple labels, so you could have (user:User:Male), etc. This also helps avoid supernodes without creating a classic or "legacy" index.
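A minimal sketch of the label approach, assuming the existing gender property holds the strings 'male' and 'female' (those values are an assumption; adjust them to your data):

```cypher
// One-time migration: add a label based on the existing property.
MATCH (u:User)
WHERE u.gender = 'male'
SET u:Male;

MATCH (u:User)
WHERE u.gender = 'female'
SET u:Female;

// Querying by gender is then a label lookup, with no extra nodes or relationships.
MATCH (u:User:Male)
RETURN u;
```

Whether to keep the gender property after adding the labels is a modeling choice; keeping both means they have to be updated together.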