Spring Data Neo4j creates duplicate nodes

Question

I am using Spring Data Neo4j 3.3.1.RELEASE with Neo4j server 2.2.3.

My problem is that some nodes are duplicates of my entities but contain only the indexed property.

My class looks something like this:

@NodeEntity
@TypeAlias("Product")
public class Product {
    @GraphId
    private Long graphId;
    @Indexed(indexName="productId", unique=true, indexType=IndexType.SIMPLE)
    private String productId;
    private String productType;
    ...
}

Before saving a node, I first check whether one already exists. If it does, I update it; otherwise I create a new one.

Product product = productRepository.findByProductId(productId);
if (product == null) {
    product = new Product(productId);
}
...
productRepository.save(product);

The repository interface:

public interface ProductRepository extends GraphRepository<Product> {
    public Product findByProductId(String productId);
}

In Neo4j, each entity is stored as a node with all of its properties. But some nodes also have a duplicate node that contains only the productId. This doesn't happen to all the nodes: we have about 120,000 nodes, and as many as 30 of them have this duplicate. Every time we re-ingest the data there are duplicates. Right now we have only 2 duplicate nodes.

One more thing: upon checking the duplicate nodes, it seems that their node IDs are in sequence, which makes me think they are created together when I save the entity.

EDIT: Upon investigation, it seems that the unique constraint is not applied to productId. The problem seems to come from the @Indexed annotation. If I use unique and indexName in the same annotation, only the indexName is applied and not the constraint. If I use either indexName or unique alone, SDN creates that one, and I must create the other via the Neo4j web console, which is kind of annoying. I know that in SDN 4.x.x index maintenance will not be part of the code and should be handled externally. Is this something we need to do now, since SDN 3.3.x does not handle it correctly?

cybersam · Accepted Answer · 2015-08-14T17:35:44

indexName and indexType are only used to define legacy indexes (which are now deprecated), and unique is only used to define the uniqueness constraint for schema indexes. The two index types are mutually exclusive.

If you want to impose a uniqueness constraint, use only unique and leave out indexName and indexType.
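As a minimal sketch of that advice, the entity from the question could be annotated as follows. It assumes the SDN 3.x annotation packages shown in the imports and the default label-based type strategy; the elided parts of the class are left as in the question.

```java
import org.springframework.data.annotation.TypeAlias;
import org.springframework.data.neo4j.annotation.GraphId;
import org.springframework.data.neo4j.annotation.Indexed;
import org.springframework.data.neo4j.annotation.NodeEntity;

@NodeEntity
@TypeAlias("Product")
public class Product {
    @GraphId
    private Long graphId;

    // Only unique is set. indexName and indexType are omitted so that the
    // annotation describes a schema-index uniqueness constraint, not a legacy index.
    @Indexed(unique = true)
    private String productId;

    private String productType;

    // Remaining fields, constructors and accessors as in the original class.
}
```

Note that Neo4j will not create a uniqueness constraint while nodes with the same label still share a productId value, so the existing duplicate nodes would probably have to be removed before the constraint can take effect.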
