0
votes

In Neo4j, I have a person node, company node and a position node. So, a "person" has a "position" at a "company". I have a question about modelling the database and the relationships between these nodes.

I could set it up like this (cypher "pseudo code"):

(person)-[:WORKS_AT]-(company) 
(person)-[:HAS_TITLE]->(position) 
(company)-[:HAS_POSITION]->(position)

person has relationship HAS_TITLE and WORKS_AT, or like this:

(person)-[:HAS_TITLE]->(position)<-[:HAS_POSITION]-(company)

where there is no direct relationship between the person and the company, I will have to go through the position node to find the company.

I guess it depends a little bit of what I want to get out of the database, but is there a "correct/standard" way of doing this in a graph database?

Edit:

When searching in the database I want to get all the positions for a person. I want to return them as a collection like this:

    [
      { 
        company: 'some company', 
        position: 'some position at the company' 
      }, 
      { 
        company: 'some other company', 
        position: '' // empty position, I dont have the position data here 
      } ,     
      { 
        company: 'some other company', 
        position: 'some position at the company'
      }      
    ] 

Edit 2:

To clarify a little bit: this is the data structure I get to work with (json). This is data for one person only, there are multiple people who can have the same position in different companies etc. I need to put it into Neo4j and make it searchable:

  {   
     "id": 123,
     "name": "Foo Bar",
     "email": "",
     "workhistory": {
        "positions": [
           {
              "company": "Company A",
              "employees": "",
              "type": "Privately Held",
              "start": 1356998400,
              "end": "",
              "position": "Position A",
              "description": "",
              "industry": "some industry"
           },
           {
              "company": "Company B",
              "employees": "",
              "type": "Privately Held",
              "start": 1356998400,
              "end": "",
              "position": "Position B",
              "description": "",
              "industry": "some industry"
           },
           {
              "company": "Company C",
              "employees": "",
              "type": "Privately Held",
              "start": 1341100800,
              "end": "",
              "position": "Position A",
              "description": "",
              "industry": ""
           }
        ]
     }
  }

I want to be able to search for a "position" node with a "name" property called "Position A", then find all the people with this position and return their profile with the same structure it had when it was put into the database, give or take.

Same goes for "industry" and "company", I want to break those off into separate nodes as well. The database will have millions of people in it.

Edit 3

I'm thinking of maybe adding the position as a node, with a "name" property etc. And then also add the position name on the relationship from person to company. This will make it easy to query for work history, but will take extra space for each WORKED_AT relationship (the WORKED_AT.position property). With this structure I can do searches on a position node as well, and get all the people related to a certain position.

This will probably be ok for this easy structure, but then what about more "nested" values? Like Person lives in City located in State located in Country etc?

1
It does depend a lot on what you want to get out of the database. FWIW, the design you've got here looks completely reasonable to me. Can you elaborate your question a little bit? If you link person and company through position, that looks like it's maybe workable. what are you worried that you won't be able to do with this design?FrobberOfBits
I'm worried that I might end up with a person that I dont have the position data on, so there will be no node connecting the person and the company, even though he did work there, I just dont know what he did there. The reason I need position as a node is that I want to index it with a label and be able to search on it. I could ofc just put the position as a property on the relationship between person and company, but then I wont be able to search for it, without suffering performance.Øyvind
If you always know the company the person worked for, then even if you're missing their position, you can link them via :WORKS_AT. You could also create a dummy position "Unknown" (one per company) and then link that person to that position if you know they work for company X but don't know their title. That way the linkage is always there even if you don't know what the value should be.FrobberOfBits
If you don't always know what company a person works for, then you're probably screwed. If the info isn't in your dataset, then no graph design is going to help, because you just don't know. :)FrobberOfBits
I always know what company a person works for, I think :) So, if there is a position present, create WORKS_FOR, if no position present, do not create WORKS_FOR, just the HAS_TITLE? Edited the question to show what I want to get out out the database.Øyvind

1 Answers

1
votes

In a case like this, I think the "correct" way of doing it is the way that feels most natural for your data and your app. As far as performance is concerned, traversing two relationships instead of one to find the company will not have any added impact unless a position is shared by so many companies that there's intense logic to determine which one is associated with that user.

But the deciding factor should be whether you have data that is unique to the relationship of the person and the company. The first thing I'd think of is a hired_on property to track how long a person has been there. You may want to see that independently of the person's position -- I'd put a started_on property between the person and the position -- but don't you want a person's work history? Things like salary, phone extension, office number... they could all go in properties on that relationship. You wouldn't put them on the HAS_TITLE relationship since you'd lose those if the title changes. They could also be properties on person, but couldn't people work for multiple companies?