3
votes

I have the following string I'd like to match:

"Ambrosia,Restore Health, , , "

containing Unicode whitespace (don't ask me why). /,\s*,/u works just fine in regex101.

But #"(?u),\s*," does not work in Clojure:

(re-find #"(?u),\s*," "Ambrosia,Restore Health, , , ") ;nil, should be , ,

Why does this fail?

1
This returns ", ," for me. - Tim Pote
I believe \s matches six ASCII characters and those six ASCII characters only. See the documentation. - glts
@TimPote I believe that's a limitation of copy-pasting into the browser: copying it back over works, but the original string still fails. - Jared Smith
@glts That nailed it. Copy-pasting the horizontal whitespace grouping did the trick (Clojure doesn't appear to support the \h shorthand). Put it in an answer and I'll gladly accept. - Jared Smith

1 Answer

5
votes

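I believe \s matches six ASCII characters and those six ASCII characters only, namely [ \t\n\x0B\f\r]: see the documentation for Pattern. Note that the (?u) flag only makes case-insensitive matching Unicode-aware; it does not change what \s matches.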
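
To illustrate the point, here is a minimal sketch. It assumes the odd whitespace in the string is U+00A0 (no-break space), which the question does not actually confirm; sample is a made-up name. It compares the default \s, the (?u) flag from the question, and the (?U) flag (UNICODE_CHARACTER_CLASS), which should make \s follow the Unicode White_Space property on Java 7 or later.

```clojure
;; Assumption: the "Unicode whitespace" is U+00A0 (no-break space).
;; The real string may contain a different code point.
(def sample "Ambrosia,Restore Health,\u00A0,\u00A0,\u00A0")

;; Default \s covers only [ \t\n\x0B\f\r], so U+00A0 is not matched.
(def default-match (re-find #",\s*," sample))

;; (?u) is UNICODE_CASE: it only affects case-insensitive matching.
(def lower-u-match (re-find #"(?u),\s*," sample))

;; (?U) is UNICODE_CHARACTER_CLASS: \s is treated like \p{IsWhite_Space}.
(def upper-u-match (re-find #"(?U),\s*," sample))
```

Under that assumption, only the last call finds a match; the first two return nil, which is the behaviour described in the question.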

As you found out already, it may be worth trying some of the other whitespace character classes like \h or \v.

Also, the \p{...} construct can do actual Unicode property matching. White_Space (spelled \p{IsWhite_Space} in Java) seems the most promising.
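
As a rough sketch of both suggestions, again assuming the whitespace in question is U+00A0 (the question does not say which code point it is) and using sample as a made-up name:

```clojure
;; Assumption: the "Unicode whitespace" is U+00A0 (no-break space).
(def sample "Ambrosia,Restore Health,\u00A0,\u00A0,\u00A0")

;; Java spells binary Unicode properties with an "Is" prefix.
(def prop-match (re-find #",\p{IsWhite_Space}*," sample))

;; \h (horizontal whitespace) is available in java.util.regex from
;; Java 8 onward; on an older JVM this pattern fails to compile,
;; which may be why \h appeared unsupported.
(def horiz-match (re-find #",\h*," sample))
```

Both patterns should find the comma, no-break space, comma sequence here. The property form is the more portable of the two, since it also works on Java 7.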