1
votes

I am working on a use case in which:

  • There are users.
  • There are tags.
  • Users share posts (content) with other users, and each post is connected to multiple tags.

I am thinking of creating User, Tag, and Post nodes.

When a post is shared:

  • Relationships are added between the Post node and the User nodes of the recipients (up to 20 users).
  • A relationship is added between the Post node and the post creator.
  • Relationships are added between the Post node and the Tag nodes (up to 20 tags).
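The steps above could be sketched roughly as a single parameterized Cypher statement, built here from Python. This is only an illustration: the labels (`User`, `Post`, `Tag`), the relationship types (`CREATED`, `SHARED_WITH`, `TAGGED_WITH`), the property names, and the helper `build_create_post` are all assumptions, not something the question specifies.

```python
from typing import Any

# Labels, relationship types, and property names are illustrative assumptions.
CREATE_POST_QUERY = """
MATCH (creator:User {id: $creator_id})
CREATE (post:Post {id: $post_id, content: $content})
CREATE (creator)-[:CREATED]->(post)
WITH post
UNWIND $recipient_ids AS recipient_id
MATCH (recipient:User {id: recipient_id})
CREATE (post)-[:SHARED_WITH]->(recipient)
WITH DISTINCT post
UNWIND $tag_names AS tag_name
MERGE (tag:Tag {name: tag_name})
CREATE (post)-[:TAGGED_WITH]->(tag)
"""

MAX_LINKS = 20  # the question mentions up to 20 recipients and up to 20 tags


def build_create_post(
    post_id: str,
    creator_id: str,
    content: str,
    recipient_ids: list[str],
    tag_names: list[str],
) -> tuple[str, dict[str, Any]]:
    """Return the query text and its parameters (hypothetical helper)."""
    # Deduplicate and sort so the same input always yields the same parameters.
    recipients = sorted(set(recipient_ids))
    tags = sorted(set(tag_names))
    # UNWIND over an empty list produces no rows, which would silently skip
    # the remaining clauses, so empty lists are rejected up front.
    for name, values in (("recipient_ids", recipients), ("tag_names", tags)):
        if not 1 <= len(values) <= MAX_LINKS:
            raise ValueError(f"{name} must contain 1 to {MAX_LINKS} entries")
    params = {
        "post_id": post_id,
        "creator_id": creator_id,
        "content": content,
        "recipient_ids": recipients,
        "tag_names": tags,
    }
    return CREATE_POST_QUERY, params
```

The returned pair would then be passed to whichever driver call runs a query with parameters; keeping everything in one statement means the post and all of its relationships are written in one transaction.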

I think adding relationships this way helps to retrieve posts by user or by tags.

Posts are created very frequently.

My concern: this approach creates a lot of data per post (a Post node plus relationships to tags and other users). I also expect the data to grow very fast as posts are shared, and each post creation looks like an expensive operation.
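To put a number on "a lot of data per post", here is a back-of-the-envelope sketch. It assumes the User and Tag nodes already exist, so a post adds one node plus one relationship per recipient, per tag, and for the creator. The function name is made up for illustration.

```python
def graph_elements_per_post(recipients: int = 20, tags: int = 20) -> dict[str, int]:
    """Count what one shared post adds to the graph (hypothetical helper).

    Assumes User and Tag nodes already exist; a brand-new tag would add
    one extra node.
    """
    creator_relationship = 1
    return {
        "nodes": 1,  # the Post node itself
        "relationships": creator_relationship + recipients + tags,
    }


worst_case = graph_elements_per_post()
print(worst_case)  # {'nodes': 1, 'relationships': 41}
```

So the worst case described here is 1 node and 41 relationships per post; total growth is that figure multiplied by the posting rate, which the question does not state.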

Do you think this approach is fine, or is there a better way?

2 Answers

2
votes

One issue to be aware of is that when you create a relationship, Neo4j locks the nodes on both ends (see http://neo4j.com/docs/stable/transactions-locking.html). This means that during a transaction where you create a post and send it to 20 users and 20 tags, the transaction must acquire a write lock on each of those user and tag nodes. Other transactions that try to link nodes to any of those nodes will block, causing contention. Because the locks are acquired in a non-deterministic order, this will sometimes cause a deadlock, in which case Neo4j throws an exception and aborts the transaction. Since posts are created very frequently, this may cause problems. (I've encountered this exact issue with a similar data model.)
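Since a deadlocked transaction is aborted rather than left partly applied, one common way to cope is to retry it after a short pause. The sketch below is not from the Neo4j docs and is not what I originally did; it is a generic retry loop, and `DeadlockDetected` is a made-up stand-in for whatever exception your driver actually raises.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class DeadlockDetected(Exception):
    """Stand-in for the driver's deadlock exception (hypothetical name)."""


def run_with_retry(
    work: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.05,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run work(), retrying with jittered exponential backoff on deadlock."""
    for attempt in range(1, max_attempts + 1):
        try:
            return work()
        except DeadlockDetected:
            if attempt == max_attempts:
                raise  # give up and surface the error to the caller
            # Wait longer after each failure; the random part keeps two
            # colliding transactions from retrying in lockstep.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay))
    raise ValueError("max_attempts must be at least 1")
```

Here `work` would be a function that opens a transaction, creates the post and its relationships, and commits. Retrying only hides occasional deadlocks; it does not remove the contention on heavily shared user or tag nodes.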

1
votes

Linking together many nodes via meaningful relationships is a good design because it plays to Neo4j's strengths. If your application needs to know these relationships, then your design sounds to me like a decent one. You do have other options; for example, if you wanted to know which tags a post had, you could store the tags as an array of strings in a node property on the post. This would have the advantage of making it very easy to maintain an individual post's tags. It would have the big disadvantage that if you wanted to query for all posts tagged "cooking", it would be much slower, since you'd have to search for any node with a property called "tag" containing a certain value.
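The trade-off can be illustrated with a plain-Python analogy. This is not how Neo4j stores anything; the sample data and function names are invented. Tags kept as a list on each post force a scan over every post, while a tag-to-posts mapping (playing the role of Tag nodes with relationships) lets you start from the tag.

```python
from collections import defaultdict

# Invented sample data: post id -> tags stored "on the post".
posts = {
    "p1": ["cooking", "travel"],
    "p2": ["music"],
    "p3": ["cooking"],
}


def posts_with_tag_scan(posts: dict[str, list[str]], tag: str) -> list[str]:
    """Array-property style: every post has to be inspected."""
    return sorted(post_id for post_id, tags in posts.items() if tag in tags)


def build_tag_index(posts: dict[str, list[str]]) -> dict[str, set[str]]:
    """Relationship style: each tag points at the posts that carry it."""
    index: dict[str, set[str]] = defaultdict(set)
    for post_id, tags in posts.items():
        for tag in tags:
            index[tag].add(post_id)
    return index


def posts_with_tag_lookup(index: dict[str, set[str]], tag: str) -> list[str]:
    """Start at the tag and follow its links; other posts are never touched."""
    return sorted(index.get(tag, ()))


tag_index = build_tag_index(posts)
print(posts_with_tag_scan(posts, "cooking"))        # ['p1', 'p3']
print(posts_with_tag_lookup(tag_index, "cooking"))  # ['p1', 'p3']
```

Both give the same answer; the difference is that the scan's cost grows with the total number of posts, while the lookup's cost grows only with the number of posts that carry the tag.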

Your concerns "creating a lot of data" and "post creation is an expensive operation" are quite general. They might be valid concerns. They might just be worries that you shouldn't be concerned with. I can't tell given the information you've provided.

In general, modeling via nodes and relationships is a good way to go about it, but you need substantial clarity on your application requirements and your scalability needs over time to provide a really solid answer to the question "is this the right design?"