
I'm working on a project that aims to build a system for retrieving biomedical information (e.g. biomedical entities such as drugs, diseases and genes, and the relationships between them). When I query the database for a specific disease using this Cypher statement:

    "MATCH (m:Disease) WHERE m.disease_name =~ '(?i)"+disease+"' RETURN m";

If the disease name is 2'-benzoyloxycinnamaldehyde or 4-[1-ALLYL-7-(TRIFLUOROMETHYL)-1H-INDAZOL-3-YL]BENZENE-1,3-DIOL, an exception occurs with the following message:

Invalid input '"': expected 0..9, '.', 'e', 'E', an identifier character, whitespace, node labels, '[', "=~", IN, STARTS, ENDS, CONTAINS, IS, '^', '*', '/', '%', '+', '-', '=', "<>", "!=", '<', '>', "<=", ">=", AND, XOR, OR, LOAD CSV, START, MATCH, UNWIND, MERGE, CREATE, SET, DELETE, REMOVE, FOREACH, WITH, RETURN, UNION, ';' or end of input (line 1, column 53 (offset: 52)) "MATCH (n1:Drug)-[x]-(n2:Disease) RETURN n1 LIMIT 25""

How could I fix this problem? Thank you so much!

I would suggest using parameters: MATCH (m:Disease) WHERE m.disease_name =~ {diseaseName} RETURN m with $params['diseaseName'] = '(?i)'.$diseaseName; and then passing this parameter to the Cypher query, because when we use parameters the Cypher parser never sees their values. – miit2011081
This is just a hunch: you could also try replacing disease with tostring(disease) in your Cypher query. – manonthemat
It seems like a stray quote is getting into the query... By the way, I'd suggest using Cypher parameters if there is any possibility that the "disease" variable might come from an untrusted source. Parameters can also improve query performance. neo4j.com/docs/stable/cypher-parameters.html – Brian Underwood
Thank you all very much! – Amber

1 Answer


You actually have 2 different queries, and they have different problems:

  1. The first query is generated by this statement:

    "MATCH (m:Disease) WHERE m.disease_name =~ '(?i)"+disease+"' RETURN m";
    

    Since you are using single-quotes as the regexp string delimiter, a single-quote in the disease value (as in 2'-benzoyloxycinnamaldehyde) ends the regexp string too early, and the rest of the name is then parsed as Cypher. You have several choices, listed in order of increasing preference:

    • Escape (using a preceding backslash) all single-quote characters in disease.
    • Use double-quotes as the regexp string delimiter. This requires escaping those double-quotes inside the Java string literal, but disease itself does not need to change. I am presuming that disease (which actually seems to be a chemical name) will never include a double-quote:

      "MATCH (m:Disease) WHERE m.disease_name =~ \"(?i)" + disease + "\" RETURN m";
      
    • Pass the disease value as a parameter. No escaping is necessary, and the query can run faster when you issue essentially the same query many times, because the server can reuse its cached execution plan. However, the parameter value itself must then start with "(?i)".

      "MATCH (m:Disease) WHERE m.disease_name =~ {disease} RETURN m";

  2. Your second query has a simple typo. It has an extra double-quote at the end:

    "MATCH (n1:Drug)-[x]-(n2:Disease) RETURN n1 LIMIT 25""
    

    It should be:

    "MATCH (n1:Drug)-[x]-(n2:Disease) RETURN n1 LIMIT 25"
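A minimal Java sketch of the parameter approach, under stated assumptions: it uses the $disease parameter syntax of Neo4j 3.0 and later (older servers use {disease}, as above), and the driver call session.run(QUERY, params(name)) from the official Java driver is only mentioned in a comment. The class DiseaseQuery and the helper caseInsensitiveExact are hypothetical names. The sketch also goes one step beyond the answer: names like the second one contain regex metacharacters such as [ ] ( ), which would otherwise be read as a character class and groups rather than literal text. Because Cypher's =~ uses Java regular expression syntax, the sketch wraps the name with java.util.regex.Pattern.quote.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper class; the name is illustrative only.
class DiseaseQuery {

    // The disease name is sent as a parameter, so it never becomes part of
    // the Cypher text and a quote in it cannot end a string literal early.
    // Assumes Neo4j 3.0+ ($disease); older servers use {disease}.
    static final String QUERY =
            "MATCH (m:Disease) WHERE m.disease_name =~ $disease RETURN m";

    // Builds the regex value: a case-insensitive flag followed by the name
    // wrapped in \Q...\E, so [ ] ( ) and other metacharacters match literally.
    static String caseInsensitiveExact(String name) {
        return "(?i)" + Pattern.quote(name);
    }

    // Parameter map in the shape a driver call such as
    // session.run(QUERY, params(name)) expects (assumed, not shown running).
    static Map<String, Object> params(String name) {
        Map<String, Object> params = new HashMap<>();
        params.put("disease", caseInsensitiveExact(name));
        return params;
    }

    public static void main(String[] args) {
        String[] names = {
            "2'-benzoyloxycinnamaldehyde",
            "4-[1-ALLYL-7-(TRIFLUOROMETHYL)-1H-INDAZOL-3-YL]BENZENE-1,3-DIOL"
        };
        for (String name : names) {
            String regex = caseInsensitiveExact(name);
            // Local check with java.util.regex: the quoted pattern compiles
            // and matches the name regardless of letter case.
            boolean ok = Pattern.matches(regex, name.toLowerCase());
            System.out.println(regex + " matches lower-cased name: " + ok);
        }
    }
}
```

Each printed line should report true. Without Pattern.quote, the bracketed part of the second name would be compiled as a character class, and a range like L-7 inside it is not a valid range in Java regex.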