On Linux, I have a directory with lots of files. Some of the filenames contain non-ASCII characters, but they are all valid UTF-8. One program has a bug that prevents it from working with non-ASCII filenames, and I need to find out how many files are affected. I was going to do this with find, then grep to print the names that contain non-ASCII characters, and then wc -l to count them. It doesn't have to be grep; I can use any standard Unix tool that supports regular expressions, such as Perl, sed, or AWK.
However, is there a regular expression for 'any character that's not an ASCII character'?
/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F-\x9F]
– Tinmarino
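
For what it's worth, here is a minimal sketch of the counting step in Python rather than the find/grep/wc pipeline from the question. It assumes that "non-ASCII" means any code point above 0x7F, which the negated class [^\x00-\x7F] expresses. The names NON_ASCII, has_non_ascii, and count_non_ascii_names are made up for illustration, and counting only regular file entries (not directory names) is also an assumption.

```python
import os
import re

# Negated character class: matches any character outside the
# 7-bit ASCII range (code points 0x00 through 0x7F).
NON_ASCII = re.compile(r"[^\x00-\x7F]")


def has_non_ascii(name: str) -> bool:
    """Return True if the name contains at least one non-ASCII character."""
    return NON_ASCII.search(name) is not None


def count_non_ascii_names(root: str) -> int:
    """Walk root recursively and count file names with non-ASCII characters.

    Assumption: only the base name of each file is checked, and
    directory names are not counted.
    """
    count = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if has_non_ascii(name):
                count += 1
    return count


if __name__ == "__main__":
    # Count affected files under the current directory.
    print(count_non_ascii_names("."))
```

The same character class should carry over to any regex engine that understands \x escapes, so it may also be usable in Perl or with a PCRE-enabled grep, though that depends on the tool and locale in use.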