Unicode specifies that \X should match an "extended grapheme cluster", for instance a base character followed by zero or more combining characters. (I believe this is a simplification, but it may suffice for my needs.)
I'm pretty sure that at least Perl supports \X in its regular expressions.
But Vim defines \X to match any character that is not a hexadecimal digit.
Does Vim have any equivalent to \X or any way to match a Unicode extended grapheme cluster?
Vim does have a concept of combining or "composing" characters, but its documentation does not cover whether or how they are supported in regular expressions.
It seems that Vim does not yet support this directly, but I am still interested in a workaround where a search will highlight all characters which include a combining character in at least the most basic range of U+0300 to U+0364.
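To make the simplified definition concrete outside of Vim, here is a rough Python sketch of what I mean by "a character which includes a combining character". It is only an illustration of the idea, not a Vim pattern: it uses the standard `re` module (which has no \X), the U+0300 to U+0364 range mentioned above, and the names `MARK`, `CLUSTER_WITH_MARK` and `clusters_with_marks` are made up for this example.

```python
import re

# Simplified cluster: one non-combining character followed by one or more
# combining marks in U+0300..U+0364 (the range mentioned in the question).
# This is NOT the full Unicode extended grapheme cluster definition.
MARK = "\u0300-\u0364"
CLUSTER_WITH_MARK = re.compile(f"[^{MARK}][{MARK}]+")


def clusters_with_marks(text):
    """Return every base character that carries at least one combining mark."""
    return [m.group(0) for m in CLUSTER_WITH_MARK.finditer(text)]


sample = "J\u030c and e\u0301 but not plain J"
found = clusters_with_marks(sample)

# Show each match as its code points, e.g. ['004a', '030c'] for J + caron.
print([[f"{ord(c):04x}" for c in s] for s in found])
# → [['004a', '030c'], ['0065', '0301']]
```

A Vim search that highlights the same matches is what I'm after.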
J̌ (004a 030c). But more generally I just want to know whether Vim has or plans to have support for this, as it's becoming more and more common that we programmers have to deal with such things. – hippietrail
/\%u004a\%u030c\Z. You'll have to come up with a seriously big pattern if you want to highlight every possible combination. The upside is that it will probably be portable to JS with "minimal" effort. Oh, and Kyle's answer is very informative. – romainl
\%u030c, but when I try to extend the pattern from just COMBINING CARON to the entire Combining Diacritical Marks range by using [\u0300-\u0364] nothing is matched any longer! – hippietrail