3
votes

Our company has a lot of customer data based on surveys. For example, we may know that someone likes a certain sport, TV show, or band, is pregnant, and falls in some age range. Marketers will be adding and removing criteria to track. Graph databases offer a variety of modeling options. For example, we could do something like object modeling:

Customer.survey_question1.question = "What tv show do you like"
Customer.survey_question1.answer = "Sesame street"

Here we would give the customer a property with a reference to survey question 1, which would contain the survey properties. Every time marketers add a question and answer, we'd have to update the customer schema.

We could also model it like this:

Customer.surveys = [list of references to other objects]

Here, surveys is a list of references to the survey objects the customer has answered.

What is the idiomatic way to add a very sparse list of customer properties in a graph database?


2 Answers

4
votes

[EDITED]

Here is an idiomatic way to model your use case.

You could use a node for each survey question and give all of those nodes the same label, say SurveyQuestion. For example:

(sq:SurveyQuestion {id: 222, question: "What tv show do you like?"})

Every customer who answers a SurveyQuestion could have a relationship of a specific type (say, ANSWERED) to that question's node, and that relationship could contain the person's answer. For example:

(:Customer {id:123})-[:ANSWERED {answer: "The Voice"}]->(sq)

With this approach, there is no need to update a Customer node whenever you add a new survey question. You only need to create an ANSWERED relationship when a customer actually answers a question.
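
As a rough sketch of how such data might be written, the statement below uses MERGE so that running it twice does not duplicate nodes or relationships. The ids and the answer text simply reuse the example values above, and keeping only one ANSWERED relationship per customer and question is an assumption, not a requirement of the model.

```cypher
// Create (or reuse) the question node; set its text only on first creation.
MERGE (sq:SurveyQuestion {id: 222})
  ON CREATE SET sq.question = "What tv show do you like?"
// Create (or reuse) the customer node.
MERGE (c:Customer {id: 123})
// Assumption: at most one ANSWERED relationship per customer/question pair.
MERGE (c)-[a:ANSWERED]->(sq)
// Store (or overwrite) the customer's answer on the relationship.
SET a.answer = "The Voice";
```

Adding a new survey question is then just one more SurveyQuestion node; customers who never answer it are not touched at all, which is what keeps the data sparse.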

To get all the survey questions:

MATCH (sq:SurveyQuestion)
RETURN sq;

To get the customers who gave each answer to a question (the grouping is case sensitive, so you may want to lowercase all answers before storing them in ANSWERED relationships, using toLower(), which was called LOWER() in older versions of Neo4j):

MATCH (sq:SurveyQuestion {id: 222})<-[a:ANSWERED]-(c:Customer)
RETURN sq, a.answer AS answer, COLLECT(c);
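
If the answers were stored with mixed case, a possible variation is to normalize them at query time instead. The sketch below assumes Neo4j 3.0 or later, where the function is named toLower(); the customers alias is just an illustrative name.

```cypher
// Group customers by the lowercased answer and count them per answer.
MATCH (sq:SurveyQuestion {id: 222})<-[a:ANSWERED]-(c:Customer)
RETURN toLower(a.answer) AS answer, count(c) AS customers
ORDER BY customers DESC;
```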

To get all the questions a customer answered, and their answer to each one:

MATCH (sq:SurveyQuestion)<-[a:ANSWERED]-(c:Customer {id: 123})
RETURN c, a.answer AS answer, sq;
1
votes

From my experience (about 1 year with Neo4j), the biggest advantage of graph databases as data storage is generating complex insights from existing data, where SQL databases with join tables perform poorly. So storing all the data retrieved from a survey in the Customer node, or as (:Customer)-[:ANSWERS]->(:Survey), gives you no benefit from a Neo4j database, but you still get some of the "dark sides" of Neo4j :) I'm not saying that Neo4j is bad, but for now it's not as polished as SQL databases. Thus, to take advantage of Neo4j, I would try to store every user answer as a separate entity if it's meaningful, creating nodes like :Sport and :TvShow. Age, however, I'd store in :Customer as the birth date. Or you might generate a calendar tree, if you plan to use it in other cases as well; then you can store the birth date as a relationship to a particular node of the calendar tree (:Day, :Month, :Year, etc.).

I would use a model like (c:Customer)-[r1:ANSWERS]->(s:Survey), (c)-[r2:WATCHES]->(tv:TvShow), (s)-[:SURVEY_REPLY]->(tv). So whenever a customer changes their mind and stops watching the show, I delete relationship r1, but I don't lose the data, as it is stored in r2. You can add relationships to :Calendar and a lot of other stuff to this model, but be sure you need it.
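
A minimal sketch of how that pattern might be created is shown below. The labels and relationship types follow the model just described (with the label spelled Survey); the id and name properties and their values are assumptions made up for illustration.

```cypher
// Hypothetical property names and values; only the labels and relationship types come from the model.
MERGE (c:Customer {id: 123})
MERGE (s:Survey {id: 1})
MERGE (tv:TvShow {name: "Sesame Street"})
// r1: the customer answered the survey.
MERGE (c)-[:ANSWERS]->(s)
// The survey reply points at the show it mentions.
MERGE (s)-[:SURVEY_REPLY]->(tv)
// r2: the customer's interest is stored as a first-class relationship.
MERGE (c)-[:WATCHES]->(tv);
```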

P.S. As far as I know, there are some highly paid people whose job is to model databases :) My advice: if you're not sure that you'll benefit from a graph database, then don't use it in production :)