Readers are perhaps looking for an elegant, or at least not hackish, solution to this problem. That was my objective as well but, alas, this is the best I've been able to come up with.
Code
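def convert(str)
  subs = []  # quoted strings, in order of appearance
  str.gsub(/"[^"]*"| *\| */) do |s|
    if s.match?(/\A *\| *\z/)
      '|'        # pipe: drop the surrounding spaces
    else
      subs << s  # quoted string: save it and leave a placeholder
      '*'
    end
  end.gsub(/ +/, ' AND ').
    gsub(/[*|]/) { |s| s == '|' ? ' OR ' : subs.shift }
end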
Examples
puts convert(%Q{(A | B) "C D" (E | "F G" | "H I J") ("K L" ("M N" | "O P")) Q R})
#-> (A OR B) AND "C D" AND (E OR "F G" OR "H I J") AND ("K L" AND ("M N" OR "O P")) AND Q AND R
puts convert(%Q{(A|B) C D (E| "F G" |"H I J") ("K L" ("M N" | "O P")) Q R})
#-> (A OR B) AND C AND D AND (E OR "F G" OR "H I J") AND ("K L" AND ("M N" OR "O P")) AND Q AND R
Notice that in this example there is no space before and/or after some pipes and in some places outside double-quoted strings there are multiple spaces.
puts convert(%Q{(Ant | Bat) Cat Dog (Emu | "Frog Gorilla" | "Hen Ibex Jackal") ("Koala Lynx" ("Magpie N" | "Ocelot Penguin")) Quail Rabbit})
#-> (Ant OR Bat) AND Cat AND Dog AND (Emu OR "Frog Gorilla" OR "Hen Ibex Jackal") AND ("Koala Lynx" AND ("Magpie N" OR "Ocelot Penguin")) AND Quail AND Rabbit
Here I've replaced the capital letters with words.
Explanation
To see how this works, let
str = %Q{(A | B) "C D" (E | "F G" | "H I J") ("K L" ("M N" | "O P")) Q R}
#=> "(A | B) \"C D\" (E | \"F G\" | \"H I J\") (\"K L\" (\"M N\" | \"O P\")) Q R"
then
subs = []
s1 = str.gsub(/"[^"]*"| *\| */) do |s|
  if s.match?(/\A *\| *\z/)
    '|'
  else
    subs << s
    '*'
  end
end
#=> "(A|B) * (E|*|*) (* (*|*)) Q R"
subs
#=> ["\"C D\"", "\"F G\"", "\"H I J\"", "\"K L\"", "\"M N\"", "\"O P\""]
As you see, I have removed the spaces around pipes and replaced all quoted strings with asterisks, saving those strings in the array subs so that I can later replace the asterisks with their original values. The choice of an asterisk is of course arbitrary.
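Because the placeholder is arbitrary, it can be made a parameter, which helps if the input may itself contain an asterisk outside quotes. The following is only a sketch: the method name convert_with_placeholder and the default placeholder "\x01" are my own assumptions, not part of the code above, and the placeholder must not be a space or a pipe.

```ruby
# Hypothetical variant of convert: the placeholder is configurable.
# Assumes the placeholder never occurs in the input and is neither ' ' nor '|'.
def convert_with_placeholder(str, placeholder = "\x01")
  subs = []
  masked = str.gsub(/"[^"]*"| *\| */) do |s|
    if s.start_with?('"')
      subs << s      # quoted string: save it, leave the placeholder behind
      placeholder
    else
      '|'            # pipe: drop the surrounding spaces
    end
  end
  masked.gsub(/ +/, ' AND ')
        .gsub(Regexp.union(placeholder, '|')) { |s| s == '|' ? ' OR ' : subs.shift }
end

puts convert_with_placeholder('A* "B C" | D')
```

With a control character as the placeholder, an unquoted asterisk in the input is left alone instead of being mistaken for a saved string.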
The regular expression reads, "match a double-quoted string of zero or more characters or a pipe ('|') optionally preceded and/or followed by spaces".
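To see what the regular expression picks out, one can pass it to String#scan on a short input. This is just an illustration; the sample string and variable names are made up for the purpose.

```ruby
pattern = /"[^"]*"| *\| */
sample  = '(A | B) "C D" (E|"F G")'

# Each element is either a quoted string or a pipe with its surrounding spaces.
matches = sample.scan(pattern)
p matches
```

Note that the spaces between operands, such as the one before "C D", are not matched. They are left for the later ' AND ' substitution.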
As a result of these substitutions, every remaining run of spaces lies outside quoted strings and away from pipes, so each is replaced by ' AND ':
s2 = s1.gsub(/ +/, ' AND ')
#=> "(A|B) AND * AND (E|*|*) AND (* AND (*|*)) AND Q AND R"
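One detail worth checking at this step is that gsub treats a String pattern literally, so "one or more spaces" has to be written as a Regexp. A small sketch, with a made-up masked string:

```ruby
masked = '(A|B) * (E|*|*) Q  R'   # note the two spaces before R

# A String pattern is matched literally: ' +' means a space followed by a plus sign.
literal = masked.gsub(' +', ' AND ')

# A Regexp pattern matches any run of one or more spaces.
by_regex = masked.gsub(/ +/, ' AND ')

puts literal
puts by_regex
```

The first call leaves the string unchanged because it contains no literal " +", while the second collapses each run of spaces into a single ' AND '.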
It remains to replace '|' with ' OR ' and each asterisk by its original value:
s2.gsub(/[*|]/) { |s| s == '|' ? ' OR ' : subs.shift }
#=> "(A OR B) AND \"C D\" AND (E OR \"F G\" OR \"H I J\") AND (\"K L\" AND (\"M N\" OR \"O P\")) AND Q AND R"