
In genealogy we use DNA to find matches. Y-DNA finds patrilineal matches. A Neo4j query that does this (where RN is a unique identifier for a person) is:

`MATCH (n{RN:1}) match p=n-[r:father*..22]->m return m.RN as RN,m.fullname as FullName,m.sex as Sex,m.bd as BD,m.dd as DD,length(p) as generation,case when left(m.bd,4)>'1930' and rtrim(m.dd)='' then 'Y' else 'N' end as mtDNA_Candidate, reduce(srt2 ='|', q IN nodes(p)| srt2 + q.RN + '|') AS PathOrder order by generation desc,PathOrder desc`

or we use mitochondrial DNA for matrilineal matches:

`MATCH (n{RN:1}) match p=n-[r:mother*..22]->m return m.RN as RN,m.fullname as FullName,m.sex as Sex,m.bd as BD,m.dd as DD,length(p) as generation,case when left(m.bd,4)>'1930' and rtrim(m.dd)='' then 'Y' else 'N' end as mtDNA_Candidate, reduce(srt2 ='|', q IN nodes(p)| srt2 + q.RN + '|') AS PathOrder order by generation desc,PathOrder desc`

My question is about X-chromosome DNA. A father passes an X chromosome only to his daughters, while a mother passes one to all of her children. Thus, I need a Cypher query that returns all mothers, but fathers only when the child in the next (more recent) generation on the path is a daughter. If that child is a son, the father must be excluded. Each node has a property 'sex' with a value of either M or F. The birth date is not always known, so it cannot be used to determine directionality.

I tried this, but get an error:

`MATCH (n{RN:1}) match p=n-[r:mother*..22|father*..1]->m return m.RN as RN,m.fullname as FullName,m.sex as Sex,m.bd as BD,m.dd as DD,length(p) as generation,case when left(m.bd,4)>'1930' and rtrim(m.dd)='' then 'Y' else 'N' end as mtDNA_Candidate, reduce(srt2 ='|', q IN nodes(p)| srt2 + q.RN + '|') AS PathOrder order by generation desc,PathOrder desc`
What is the error message? - William Lyon
Could you describe your data model in more detail? What exactly are you looking for in terms of your graph? - Martin Preusse
The error message with the 3rd query is: Invalid input '|': expected an identifier character, a property map or ']' (line 1, column 41 (offset: 40)) "MATCH (n{RN:1}) match p=n-[r:mother*..22|father*..1]->m return m.RN as RN,m.fullname as FullName,m.sex as Sex,m.bd as BD,m.dd as DD,length(p) as generation,case when left(m.bd,4)>'1930' and rtrim(m.dd)='' then 'Y' else 'N' end as mtDNA_Candidate, reduce(srt2 ='|', q IN nodes(p)| srt2 + q.RN + '|') AS PathOrder order by generation desc,PathOrder desc" - David A Stumpf
More on the data model. The Person nodes have relationships of father and mother pointing to the nodes for their biological parents. Each node has properties such as sex, BD (birth date), and DD. There is another set of nodes for unions, which are not used in the query but have a union_id and an identifier for each partner (e.g. parent), with U1 being a husband and U2 a wife. Every Person node has a relationship to one union (e.g. their parents). - David A Stumpf

1 Answer


[UPDATED]

The `[r:mother*..22|father*..1]` syntax is illegal. A relationship in a Cypher query can have at most a single variable length specification, and it must come after the relationship type(s). (Aside: note also that `[:father*..1]` is the same as `[:father]`.)

Does this query, which seems to be logically equivalent, work for you?

MATCH pf=(n { RN:1 })-[:father]->()
MATCH pm=(n)-[:mother*..22]->()
WITH [pf] + COLLECT(pm) AS paths
UNWIND paths AS p
WITH LENGTH(p) AS generation, NODES(p) AS ancestors
WITH generation, ancestors, LAST(ancestors) AS m
RETURN m.RN AS RN, m.fullname AS FullName, m.sex AS Sex, m.bd AS BD, m.dd AS DD, generation,
  CASE WHEN left(m.bd,4)>'1930' AND rtrim(m.dd)='' THEN 'Y' ELSE 'N' END AS mtDNA_Candidate,
  reduce(srt2 ='|', q IN ancestors | srt2 + q.RN + '|' ) AS PathOrder
ORDER BY generation DESC, PathOrder DESC;
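
For the X-chromosome rule described in the question, one possible approach is to list both relationship types with a single variable-length specification and then filter the resulting paths. The following is an untested sketch, not a verified solution. It assumes that each `father` and `mother` relationship points from the child to the parent, that `sex` holds 'M' or 'F' as described, and that your Neo4j version accepts the `:father|mother` form (older versions may need `:father|:mother`). A path is kept only if every `father` step starts at a female child, since a father passes his X chromosome only to daughters.

```cypher
// Sketch: X-chromosome ancestors of the person with RN = 1.
// Assumes relationships point child -> parent and sex is 'M' or 'F'.
MATCH p = (n {RN: 1})-[:father|mother*..22]->(m)
// Keep a path only if each step is a mother step, or a father step
// whose child (the start node of that relationship) is female.
WHERE ALL(r IN relationships(p)
          WHERE type(r) = 'mother' OR startNode(r).sex = 'F')
RETURN m.RN AS RN, m.fullname AS FullName, m.sex AS Sex,
       m.bd AS BD, m.dd AS DD, length(p) AS generation,
       reduce(srt2 = '|', q IN nodes(p) | srt2 + q.RN + '|') AS PathOrder
ORDER BY generation DESC, PathOrder DESC;
```

With this filter, a male starting person loses his entire paternal line, because the first `father` step would start at a node whose `sex` is 'M'. The check relies only on the `sex` property and the relationship direction, so missing birth dates should not matter.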