0
votes

I have a string that I need to split on the logical operators "and" and "or" (case-insensitive). However, these operators must be ignored when they appear inside single or double quotes. A sample input:

contains(field1,'sample') or contains(field2,'aaandbb') AND (field3 gt 5000)

The output of the split I am trying to achieve:

contains(field1,'sample')

contains(field2,'aaandbb')

(field3 gt 5000)

Note: Please ignore the brackets.

My code:

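    String so1 = "contains(field1,'sample') or contains(field2,'aaandbb') AND (field3 gt 5000)";
    // Group the alternation so that \s+ is required on both sides of either operator
    String soregex1 = "(?i)\\s+(?:and|or)\\s+";
    String[] splitStr = so1.split(soregex1);
    for (String str1 : splitStr) {
        System.out.println(str1);
    }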

This works, except when the operators themselves appear inside the quoted string values of a condition. For example:

contains(field1,'sam or ple') or contains(field2,'aa and bb') AND (field3 gt 5000)

The output for above string with my code is:

contains(field1,'sam

ple')

contains(field2,'aa

bb')

(field3 gt 5000)

instead of

contains(field1,'sam or ple')

contains(field2,'aa and bb')

(field3 gt 5000)

I also need to factor in escaped single or double quotes. I would appreciate any suggestions on how to skip pattern matches that appear inside single or double quotes.

2
Try a first pass that removes everything between quotes, then match against the resulting string. - Jeremy Grand
Not sure if you are just doing simple parsing or if your goal is a much more complex scenario that you are approaching in steps. If you are planning a more complex "language", I would recommend using something like JavaCC or a PEG parser instead, starting from a BNF-style definition of your language. If your goal is indeed this simple parsing, then carry on with regex. - Yepher
Have you tried using [^\']+ instead of \\s+? - bracco23
@bracco23 Yes, I have. I get the same output as with \\s+. - veebee
@Yepher The input I am trying to parse is the $filter portion of an OData 4.0 query string, for client-side validation of the $filter parameter. I looked at the Apache Olingo library, but I haven't decided to use it yet, because the server supports only a very limited number of operators (approximately 5) and I only need to validate $filter. - veebee
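The idea in the first comment (ignore whatever sits between quotes) can also be done in a single pass, without physically removing anything: let the regex consume quoted literals as one alternative and the operators as another, and only cut the string where the operator alternative matched. The following is a minimal sketch of that idea, not a definitive implementation. The class name `QuoteAwareSplit` and the method `split` are hypothetical, and the sketch assumes quotes inside a literal are escaped with a backslash.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
public class QuoteAwareSplit {
    // Alternatives are tried left to right at each position:
    //   1. a single-quoted literal (backslash escapes assumed),
    //   2. a double-quoted literal (backslash escapes assumed),
    //   3. an operator surrounded by whitespace (captured in group 1).
    // A quoted literal is consumed whole, so an "and"/"or" inside it
    // is never seen by the third alternative.
    private static final Pattern TOKEN = Pattern.compile(
        "'(?:\\\\.|[^'\\\\])*'"
        + "|\"(?:\\\\.|[^\"\\\\])*\""
        + "|\\s+(and|or)\\s+",
        Pattern.CASE_INSENSITIVE);

    static List<String> split(String input) {
        List<String> parts = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int start = 0;
        while (m.find()) {
            if (m.group(1) != null) {          // operator found outside quotes
                parts.add(input.substring(start, m.start()));
                start = m.end();
            }                                  // otherwise: quoted literal, skip it
        }
        parts.add(input.substring(start));     // trailing condition
        return parts;
    }

    public static void main(String[] args) {
        String s = "contains(field1,'sam or ple') or "
                 + "contains(field2,'aa and bb') AND (field3 gt 5000)";
        split(s).forEach(System.out::println);
    }
}
```

For the second sample string from the question, this prints the three conditions with the quoted values intact. An unterminated quote is not matched as a literal, so operators after it would still be treated as delimiters; a validator may want to reject such input separately.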

2 Answers

0
votes

Have you tried this:

(\\)\\s*(AND))|(\\)\\s*(OR))

demo

0
votes

This is a bit wild and crazy, but why not match the tokens themselves instead of splitting on the delimiter? (I'm assuming the parentheses (...) are mandatory, and that the match is case-insensitive.)

\w*\(.*?\)(?=\s*(?:and|or|$))

(demo)
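A minimal sketch of how this pattern might be applied in Java, collecting matches with `Matcher.find()` rather than calling `split`. The class name `TokenMatch` and the method `conditions` are hypothetical names chosen for illustration. The sketch relies on the assumption stated above that every condition ends with a closing parenthesis.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
public class TokenMatch {
    // Optional function name, then "(...)" matched lazily up to the first ")"
    // that is followed by an operator or by the end of the input.
    private static final Pattern CONDITION = Pattern.compile(
        "\\w*\\(.*?\\)(?=\\s*(?:and|or|$))",
        Pattern.CASE_INSENSITIVE);

    static List<String> conditions(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = CONDITION.matcher(input);
        while (m.find()) {
            out.add(m.group());   // one condition per match
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "contains(field1,'sam or ple') or "
                 + "contains(field2,'aa and bb') AND (field3 gt 5000)";
        conditions(s).forEach(System.out::println);
    }
}
```

Because a match can only end at a closing parenthesis, an "or" or "and" inside the quotes does not terminate a condition here. The pattern is not quote-aware, though: a quoted value that itself contains something like `) or` would still end the match early.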