Neo4j: traverse children and create ordered relationship between children based on a property

Question

I have a graph in Neo4j that looks like this: [image]. I want to turn it into something like this: [image].

Generalized problem:

How do you traverse a node's children in a given order (e.g. by date) and create relationships between those children in that order?

Specific problem:

Each (:Person) may have multiple (:Diagnosis) nodes, and multiple (:Diagnosis) nodes can share the same (:Concept). The nodes marked "Condition" are (:Concept) nodes. A (:Diagnosis) node represents one occurrence of a diagnosis for one person, so no two people share a (:Diagnosis) node. However, many people can be diagnosed with the same kind of condition, and the type of diagnosis (e.g. Type II Diabetes, aneurysm, etc.) is described by a (:Concept) node.

I want to create a path of relationships between (:Concept) nodes based on the chronological order of (:Diagnosis) nodes, and I only want to include the first time each (:Concept) is diagnosed.

So far, I've created new relationships between (:Person) and (:Concept) nodes like this:

(:Person {person_id: <some_number>})-[:DIAGNOSED_WITH {start_date: yyyy/mm/dd}]->(:Concept)

I've been testing with a single (:Person), using the following Cypher query:

MATCH (p:Person {person_id: "12345"})--(c:ConditionOccurrence)--(con:Concept)
WITH p.person_id AS people, con.concept_id AS concepts, min(c.condition_start_date) AS start_date
ORDER BY start_date, concepts
MATCH (p1:Person {person_id: people})
MATCH (c2:Concept {concept_id: concepts})
MERGE (p1)-[:DIAGNOSED_WITH {start_date: start_date}]->(c2)
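The `WITH ... min(c.condition_start_date)` step above groups the rows by person and concept and keeps only the earliest start date, which is what limits the chain to the first diagnosis of each concept. A rough plain-Python equivalent for one person is sketched below. It assumes each diagnosis is a `(concept_id, start_date)` pair and that dates are zero-padded `yyyy/mm/dd` strings, so string order matches date order; the function name and sample data are hypothetical.

```python
from typing import Dict, List, Tuple

def first_diagnoses(rows: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return (concept_id, earliest start_date) pairs, ordered the way
    `ORDER BY start_date, concepts` orders them in the query above."""
    earliest: Dict[str, str] = {}
    for concept_id, start_date in rows:
        # Like min(): keep only the earliest date seen for each concept
        if concept_id not in earliest or start_date < earliest[concept_id]:
            earliest[concept_id] = start_date
    # Sort by date first, then by concept id so ties have a fixed order
    return sorted(earliest.items(), key=lambda item: (item[1], item[0]))

rows = [("diabetes", "2014/05/02"), ("aneurysm", "2013/01/10"), ("diabetes", "2012/08/30")]
print(first_diagnoses(rows))
# [('diabetes', '2012/08/30'), ('aneurysm', '2013/01/10')]
```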

Now I want to create relationships between (:Concept) nodes based on the start_date of the [:DIAGNOSED_WITH] relationships. It should look something like this:

(concept 1)-[:NEXT {person_id: #}]->(concept 2)-[:NEXT {person_id: #}]->(concept 3)...

I tried using UNWIND on a collection of all [:DIAGNOSED_WITH] relationships for a given (:Person), but I don't think I quite understand how UNWIND works with WITH.
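Roughly, UNWIND turns a list back into one row per element, and each of those rows is handled on its own, so a row never "sees" the element that comes after it. A common Cypher pattern for pairing neighbours is to unwind the list indices instead, e.g. `UNWIND range(0, size(ordered_diagnoses) - 2) AS i` followed by `ordered_diagnoses[i]` and `ordered_diagnoses[i + 1]`. The Python sketch below only mimics that indexing on a list of plain concept ids (the helper name is hypothetical) to show which pairs would become :NEXT relationships.

```python
from typing import List, Tuple

def next_pairs(ordered: List[str]) -> List[Tuple[str, str]]:
    """Pair each element with its successor, mirroring
    UNWIND range(0, size(ordered) - 2) AS i -> (ordered[i], ordered[i + 1])."""
    # range(len(ordered) - 1) is empty for lists of length 0 or 1,
    # just as Cypher's range(0, size(ordered) - 2) is
    return [(ordered[i], ordered[i + 1]) for i in range(len(ordered) - 1)]

print(next_pairs(["concept1", "concept2", "concept3"]))
# [('concept1', 'concept2'), ('concept2', 'concept3')]
```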

The following query seems to just create relationships between all (:Concept) nodes whose diagnoses share the same start date:

MATCH (p:Person {person_id: "12345"})-[d:DIAGNOSED_WITH]->(c:Concept)
WITH p.person_id AS person_id, d AS diagnoses ORDER BY d.start_date
WITH collect(diagnoses) AS ordered_diagnoses, person_id AS person_id
UNWIND ordered_diagnoses AS diagnosis
MATCH (:Person {person_id: person_id})-[diagnosis]->(c1:Concept)
MATCH (:Person {person_id: person_id})-[d2:DIAGNOSED_WITH]->(c2:Concept)
WHERE d2.start_date >= diagnosis.start_date AND d2 <> diagnosis
WITH min(d2.start_date) AS min_start_date2, diagnosis, person_id, c1
MATCH (:Person {person_id: person_id})-[:DIAGNOSED_WITH {start_date: min_start_date2}]->(c2:Concept)
MERGE (c1)-[:NEXT {person_id: person_id, start_date1: diagnosis.start_date, start_date2: min_start_date2}]->(c2)
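One likely explanation for the same-date behaviour: if another concept shares the current diagnosis's start date, `min(d2.start_date)` is that same date, and the last MATCH on `{start_date: min_start_date2}` then picks up every DIAGNOSED_WITH relationship with that date, including the current concept itself (that MATCH has no `<> diagnosis` filter). The plain-Python sketch below, with made-up concepts and hypothetical helper names, contrasts that lookup with taking the successor in a list sorted by date and then concept id, similar to the first query's `ORDER BY start_date, concepts`.

```python
from typing import List, Tuple

Diag = Tuple[str, str]  # (concept_id, start_date)

def targets_by_date(diags: List[Diag], current: Diag) -> List[str]:
    """Mimic the query: earliest date among the *other* diagnoses on or after
    the current one, then every concept that has that date."""
    later = [d for d in diags if d[1] >= current[1] and d != current]
    if not later:
        return []
    next_date = min(d[1] for d in later)
    # Matching on the date alone returns every concept with that date,
    # including the current concept when it shares the date
    return [c for c, date in diags if date == next_date]

def target_by_order(diags: List[Diag], current: Diag) -> List[str]:
    """Tie-broken alternative: sort by (date, concept_id) and take the successor."""
    ordered = sorted(diags, key=lambda d: (d[1], d[0]))
    i = ordered.index(current)
    return [ordered[i + 1][0]] if i + 1 < len(ordered) else []

diags = [("a", "2016/01/01"), ("b", "2016/01/01"), ("c", "2016/02/01")]
print(targets_by_date(diags, ("a", "2016/01/01")))  # ['a', 'b']
print(target_by_order(diags, ("a", "2016/01/01")))  # ['b']
```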

I also tried a "touch" approach, where I go through the relationships and mark the ones I have already encountered, but that code doesn't work the way I want either, again because I don't fully understand UNWIND and WITH:

MATCH (p:Person {person_id: "2851389"})-[d:DIAGNOSED_WITH]->(c:Concept)
WITH p.person_id AS person_id, d AS diagnoses ORDER BY d.start_date
WITH collect(diagnoses) AS ordered_diagnoses, person_id AS person_id
UNWIND ordered_diagnoses AS diagnosis
MATCH (:Person {person_id: person_id})-[diagnosis]->(c1:Concept)
SET diagnosis.touched = TRUE
WITH person_id, c1, diagnosis
MATCH (:Person {person_id: person_id})-[d2:DIAGNOSED_WITH {touched: FALSE}]->(c2:Concept)
WHERE d2.start_date >= diagnosis.start_date
SET d2.touched = TRUE
WITH min(d2.start_date) AS min_start_date2, person_id, c1, diagnosis
MATCH (:Person {person_id: person_id})-[:DIAGNOSED_WITH {start_date: min_start_date2}]->(c2:Concept)
MERGE (c1)-[:NEXT {person_id: person_id, start_date1: diagnosis.start_date, start_date2: min_start_date2}]->(c2)
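One likely problem with the "touch" version, assuming `touched` was never initialised to FALSE beforehand: on those relationships the property is missing, i.e. null, and in Cypher comparing null with FALSE yields null rather than true, so the `{touched: FALSE}` pattern filters them all out. The Python sketch below models that three-valued comparison (the helper is hypothetical, and Python's `None` stands in for Cypher's null); a condition such as `WHERE coalesce(d2.touched, false) = false` would instead treat a missing flag as false.

```python
from typing import Dict, List, Optional

def cypher_equals(a: object, b: object) -> Optional[bool]:
    """Rough model of Cypher '=': any comparison involving null yields null."""
    if a is None or b is None:
        return None
    return a == b

# Relationships that never had `touched` set: the property is simply missing
rels: List[Dict[str, object]] = [{"start_date": "2016/01/01"}, {"start_date": "2016/02/01"}]

# A pattern like {touched: FALSE} keeps a relationship only when the comparison is true
matched = [r for r in rels if cypher_equals(r.get("touched"), False) is True]
print(matched)  # []

# Treating a missing flag as false, as coalesce(d2.touched, false) = false would
matched = [r for r in rels if cypher_equals(r.get("touched", False), False) is True]
print(len(matched))  # 2
```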

Please help! Thanks!

Michelle Yu · Accepted Answer · 2016-08-02T19:47:30

I decided to stop hacking away in Cypher and do it in Python with the py2neo package instead. It's much more straightforward. Here's the code in case you're interested:

#!/usr/bin/env python

from py2neo import authenticate, Graph

# py2neo v3 API (authenticate() no longer exists in py2neo v4+)
authenticate("localhost:7474", "neo4j", "neo3j")
# default uri for local Neo4j instance
graphdb = Graph('http://localhost:7474/db/data')

def set_NEXT_rels(person_id):
    # This person's concepts in chronological order; ties broken by concept name
    concepts = graphdb.run(
        "MATCH (p:Person {person_id: $person_id})-[d:DIAGNOSED_WITH]->(c:Concept) "
        "RETURN c.concept_id, d.start_date ORDER BY d.start_date, c.concept_name",
        person_id=person_id).data()
    # Link each concept to the one that follows it with a :NEXT relationship.
    # Values are passed as query parameters instead of being concatenated into the string.
    for i in range(0, len(concepts)-1):
        graphdb.run(
            "MATCH (p:Person {person_id: $person_id})"
            "-[d1:DIAGNOSED_WITH {start_date: $date1}]->(c1:Concept {concept_id: $id1}) "
            "MATCH (p)-[d2:DIAGNOSED_WITH {start_date: $date2}]->(c2:Concept {concept_id: $id2}) "
            "MERGE (c1)-[:NEXT {person_id: $person_id, start_date_d1: d1.start_date, "
            "start_date_d2: d2.start_date}]->(c2)",
            person_id=person_id,
            date1=concepts[i]['d.start_date'], id1=concepts[i]['c.concept_id'],
            date2=concepts[i+1]['d.start_date'], id2=concepts[i+1]['c.concept_id']).data()

def process_conditions_by_person():
    people = graphdb.run("MATCH (p:Person) RETURN p.person_id").data()
    for person in people:
        set_NEXT_rels(person['p.person_id'])

def main():
    process_conditions_by_person()

if __name__ == "__main__":
    main()
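A possible refinement, sketched here and not part of the original answer: instead of one query per pair, build all pairs in Python and send them in a single parameterized query per person that UNWINDs them. The query text, the helper name `build_next_rows` and the row keys are assumptions; the helper takes the ordered rows returned by the answer's first query (keys `'c.concept_id'` and `'d.start_date'`). Only the row-building part is exercised below, not the database call.

```python
from typing import Dict, List

# Hypothetical batched query: one round trip per person instead of one per pair
NEXT_QUERY = """
UNWIND $rows AS row
MATCH (p:Person {person_id: $person_id})-[d1:DIAGNOSED_WITH {start_date: row.date1}]->(c1:Concept {concept_id: row.id1})
MATCH (p)-[d2:DIAGNOSED_WITH {start_date: row.date2}]->(c2:Concept {concept_id: row.id2})
MERGE (c1)-[:NEXT {person_id: $person_id, start_date_d1: d1.start_date, start_date_d2: d2.start_date}]->(c2)
"""

def build_next_rows(concepts: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Turn the ordered query result into one parameter row per consecutive pair."""
    return [
        {
            "id1": first["c.concept_id"], "date1": first["d.start_date"],
            "id2": second["c.concept_id"], "date2": second["d.start_date"],
        }
        for first, second in zip(concepts, concepts[1:])
    ]

concepts = [
    {"c.concept_id": "111", "d.start_date": "2015/01/01"},
    {"c.concept_id": "222", "d.start_date": "2015/06/01"},
]
print(build_next_rows(concepts))
# [{'id1': '111', 'date1': '2015/01/01', 'id2': '222', 'date2': '2015/06/01'}]
```

With py2neo this would be called as something like `graphdb.run(NEXT_QUERY, person_id=person_id, rows=build_next_rows(concepts))`; the `$param` syntax assumes a Neo4j version that supports it.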
