1
votes

I have a graph in Neo4j that looks like this. I want to turn it into something like this.

Generalized problem:

How do you traverse children in a certain order (e.g. order by date) and create relationships between the children in that given order?

Specific problem:

Each (:Person) may have multiple (:Diagnosis) nodes, and multiple (:Diagnosis) nodes can share the same (:Concept). The nodes marked as "Condition" are (:Concept) nodes. (:Diagnosis) nodes represent the occurrence of a diagnosis for a person, thus no two people share (:Diagnosis) nodes. However, multiple people can be diagnosed with the same kind of diagnosis, and the type of diagnosis (e.g. Type II Diabetes, aneurysm, etc.) is described by (:Concept) nodes.

I want to create a path of relationships between (:Concept) nodes based on the chronological order of (:Diagnosis) nodes, and I only want to include the first time each (:Concept) is diagnosed.
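To pin down the intended result independently of Cypher, here is a minimal Python sketch of the logic on in-memory data. The function name `first_diagnosis_path` and the sample records are made up for illustration; it assumes each diagnosis is a `(concept_id, start_date)` pair for a single person.

```python
from datetime import date

def first_diagnosis_path(diagnoses):
    """Return (ordered concepts, consecutive NEXT pairs) for ONE person.

    diagnoses: iterable of (concept_id, start_date) tuples.
    """
    # Keep only the earliest date on which each concept was diagnosed.
    earliest = {}
    for concept_id, start_date in diagnoses:
        if concept_id not in earliest or start_date < earliest[concept_id]:
            earliest[concept_id] = start_date
    # Order by first diagnosis date; the concept id breaks ties deterministically.
    ordered = sorted(earliest, key=lambda c: (earliest[c], c))
    # Each concept is linked to the one that follows it.
    next_edges = list(zip(ordered, ordered[1:]))
    return ordered, next_edges

# Hypothetical sample data: "diabetes" occurs twice, only 2010 counts.
sample = [
    ("diabetes", date(2010, 5, 1)),
    ("aneurysm", date(2012, 3, 9)),
    ("diabetes", date(2013, 1, 2)),
    ("asthma", date(2008, 7, 4)),
]
ordered, next_edges = first_diagnosis_path(sample)
print(ordered)     # ['asthma', 'diabetes', 'aneurysm']
print(next_edges)  # [('asthma', 'diabetes'), ('diabetes', 'aneurysm')]
```

Each pair in `next_edges` corresponds to one [:NEXT] relationship between two (:Concept) nodes.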

So far, I've made new relationships between (:Person) and (:Concept) like this:

(:Person {person_id: <some_number>})-[:DIAGNOSED_WITH {start_date: yyyy/mm/dd}]->(:Concept)

I've been testing things out with one (:Person). I did this with the following Cypher query:

MATCH (p:Person {person_id: "12345"})--(c:ConditionOccurrence)--(con:Concept)
WITH p.person_id AS people, con.concept_id AS concepts,
     min(c.condition_start_date) AS start_date
ORDER BY start_date, concepts
MATCH (p1:Person {person_id: people})
MATCH (c2:Concept {concept_id: concepts})
MERGE (p1)-[:DIAGNOSED_WITH {start_date: start_date}]->(c2)

Now I want to create relationships between (:Concept) nodes based on the start date in [:DIAGNOSED_WITH] relationships. It should look something like this:

(concept 1)-[:NEXT {person_id: #}]->(concept 2)-[:NEXT {person_id: #}]->(concept 3)...

I tried using UNWIND on a collection of all [:DIAGNOSED_WITH] relationships for a given (:Person) but I don't think I quite understand how UNWIND works with WITH.

The following query seems to just draw relationships between all (:Concept) nodes where the diagnosis was made on the same start date:

MATCH (p:Person {person_id: "12345"})-[d:DIAGNOSED_WITH]->(c:Concept)
WITH p.person_id AS person_id, d AS diagnoses ORDER BY d.start_date
WITH collect(diagnoses) AS ordered_diagnoses, person_id AS person_id
UNWIND ordered_diagnoses AS diagnosis
MATCH (:Person {person_id: person_id})-[diagnosis]->(c1:Concept)
MATCH (:Person {person_id: person_id})-[d2:DIAGNOSED_WITH]->(c2:Concept)
WHERE d2.start_date >= diagnosis.start_date AND d2 <> diagnosis
WITH min(d2.start_date) AS min_start_date2, diagnosis, person_id, c1
MATCH (:Person {person_id: person_id})-[:DIAGNOSED_WITH {start_date: min_start_date2}]->(c2:Concept)
MERGE (c1)-[:NEXT {person_id: person_id, start_date1: diagnosis.start_date,
                   start_date2: min_start_date2}]->(c2)

I also tried a "touch" approach, where I go through the relationships and mark the ones I have already encountered, but that code isn't working the way I want either, again because I don't fully understand UNWIND and WITH:

MATCH (p:Person {person_id: "2851389"})-[d:DIAGNOSED_WITH]->(c:Concept)
WITH p.person_id AS person_id, d AS diagnoses ORDER BY d.start_date
WITH collect(diagnoses) AS ordered_diagnoses, person_id AS person_id
UNWIND ordered_diagnoses AS diagnosis
MATCH (:Person {person_id: person_id})-[diagnosis]->(c1:Concept)
SET diagnosis.touched = TRUE
WITH person_id, c1, diagnosis
MATCH (:Person {person_id: person_id})-[d2:DIAGNOSED_WITH {touched: FALSE}]->(c2:Concept)
WHERE d2.start_date >= diagnosis.start_date
SET d2.touched = TRUE
WITH min(d2.start_date) AS min_start_date2, person_id, c1, diagnosis
MATCH (:Person {person_id: person_id})-[:DIAGNOSED_WITH {start_date: min_start_date2}]->(c2:Concept)
MERGE (c1)-[:NEXT {person_id: person_id, start_date1: diagnosis.start_date,
                   start_date2: min_start_date2}]->(c2)

Please help! Thanks!

2 Answers

1
votes

I decided to stop hacking away in Cypher and just did it in Python with the py2neo package, which was much more straightforward. Here's the code in case you're interested:

#!/usr/bin/env python

from py2neo import authenticate, Graph

authenticate("localhost:7474", "neo4j", "neo3j")
# default uri for local Neo4j instance
graphdb = Graph('http://localhost:7474/db/data')

# $name is Cypher's query parameter syntax; very old servers expect {name} instead.
# One person's diagnoses in chronological order.
ORDERED_QUERY = (
    "MATCH (p:Person {person_id: $person_id})-[d:DIAGNOSED_WITH]->(c:Concept) "
    "RETURN c.concept_id, d.start_date ORDER BY d.start_date, c.concept_name"
)

# Link the concepts of two consecutive diagnoses with a :NEXT relationship.
NEXT_QUERY = (
    "MATCH (p:Person {person_id: $person_id})"
    "-[d1:DIAGNOSED_WITH {start_date: $date1}]->(c1:Concept {concept_id: $concept1}) "
    "MATCH (p)-[d2:DIAGNOSED_WITH {start_date: $date2}]->(c2:Concept {concept_id: $concept2}) "
    "MERGE (c1)-[:NEXT {person_id: $person_id, start_date_d1: d1.start_date, "
    "start_date_d2: d2.start_date}]->(c2)"
)

def set_NEXT_rels(person_id):
    # Parameters replace string concatenation: no manual quoting, and it
    # also works when ids or dates are not stored as strings.
    concepts = graphdb.run(ORDERED_QUERY, {"person_id": person_id}).data()
    # Walk consecutive pairs of the ordered diagnoses.
    for i in range(0, len(concepts)-1):
        graphdb.run(NEXT_QUERY, {
            "person_id": person_id,
            "date1": concepts[i]['d.start_date'],
            "concept1": concepts[i]['c.concept_id'],
            "date2": concepts[i+1]['d.start_date'],
            "concept2": concepts[i+1]['c.concept_id'],
        }).data()

def process_conditions_by_person():
    # Build the :NEXT chain for every person in the graph.
    people = graphdb.run("MATCH (p:Person) RETURN p.person_id").data()
    for person in people:
        set_NEXT_rels(person['p.person_id'])

def main():
    process_conditions_by_person()

if __name__ == "__main__":
    main()
1
votes

The APOC Procedures library has something to help you out here. Specifically, under the helpers section, Collection Functions subsection, there's a procedure apoc.coll.pairs([list]), which will take in a list and output a list of sublist pairs. The last pair will be the last element in the list paired with null, so we should drop that if our goal is to connect nodes.

Here's an example of usage:

WITH [1, 2, 3, 4, 5] AS stuff
CALL apoc.coll.pairs(stuff) YIELD value
WITH value[0..size(value)-1] AS numbers
RETURN numbers

This will output: [[1, 2], [2, 3], [3, 4], [4, 5]]

So in terms of using this to connect nodes, you would make your query search for the nodes you're interested in, sort them as needed, COLLECT() them into a list, call the pairs() APOC procedure, then use FOREACH to create the relationship between each pair.
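If the pairing behaviour is hard to picture, here is a rough Python imitation of what is described above: every element is paired with its successor, and the last element is paired with null (`None` here). The `pairs` function is my own stand-in written for illustration, not APOC code.

```python
def pairs(items):
    """Pair each element with its successor; the last one is paired with None."""
    return [
        [items[i], items[i + 1] if i + 1 < len(items) else None]
        for i in range(len(items))
    ]

stuff = [1, 2, 3, 4, 5]
all_pairs = pairs(stuff)
# Drop the trailing [5, None] pair, since there is nothing to connect it to.
linkable = all_pairs[:-1]

print(all_pairs)  # [[1, 2], [2, 3], [3, 4], [4, 5], [5, None]]
print(linkable)   # [[1, 2], [2, 3], [3, 4], [4, 5]]
```

Each entry in `linkable` is one (from, to) pair that would get a relationship in the FOREACH step.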

EDIT

Some changes to APOC since my answer:

1) apoc.coll.pairs() is a function now, not a procedure (no need to use CALL or YIELD, you can use it inline).

2) apoc.nodes.link() is a procedure that takes a collection of nodes and creates relationships of the given type between them (so you don't have to create the relationships yourself in a foreach) and is generally the preferred way to link up your nodes.
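For the question's data, a sketch of how apoc.nodes.link() might be driven from Python could look like the following. Treat it as an untested outline under assumptions: the labels and properties are taken from the question, the `$person_id` parameter syntax needs a reasonably recent Neo4j, and `link_concepts` and `RecordingGraph` are hypothetical names. As far as I know, the relationships created this way carry no person_id property, so chains belonging to different people would not be distinguishable.

```python
# Cypher that sorts one person's concepts by first diagnosis date,
# collects them into a list, and lets APOC chain them with :NEXT.
LINK_QUERY = """
MATCH (p:Person {person_id: $person_id})-[d:DIAGNOSED_WITH]->(c:Concept)
WITH c, d ORDER BY d.start_date, c.concept_id
WITH collect(c) AS concepts
CALL apoc.nodes.link(concepts, 'NEXT')
RETURN size(concepts) AS concept_count
"""

def link_concepts(graph, person_id):
    """Run LINK_QUERY for one person.

    graph: any object with a run(query, parameters) method,
    for example a py2neo Graph (assumption).
    """
    return graph.run(LINK_QUERY, {"person_id": person_id})

class RecordingGraph:
    """Stand-in for a real graph object so the sketch runs without a database."""
    def __init__(self):
        self.calls = []

    def run(self, query, parameters=None):
        # Only record what would have been sent to the server.
        self.calls.append((query, parameters))
        return []

fake = RecordingGraph()
link_concepts(fake, "12345")
print(fake.calls[0][1])  # {'person_id': '12345'}
```

With a real connection you would pass the py2neo Graph object instead of the `RecordingGraph` stub.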