I have a parser (in Java) for ObjectScript which works quite well, except for one thing: I don't parse "Unicode variable names".
The problem is that the documentation is not very explanatory on this subject; what's more, it wrongly defines Unicode as "16 bits". This suggests that only characters within the BMP (Basic Multilingual Plane, U+0000 to U+FFFF) are allowed.
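A Java char is a 16-bit UTF-16 code unit, so "16 bits" effectively means BMP code points. Anything above U+FFFF needs a surrogate pair, i.e. two chars. A minimal sketch of a BMP-only check follows, assuming the documentation's "16 bits" really does mean "BMP only" (the class name BmpCheck is just illustrative):

```java
public class BmpCheck {
    // True if every code point in s lies in the BMP (U+0000..U+FFFF).
    // Assumption: "16 bits" in the ObjectScript docs means BMP-only.
    static boolean isBmpOnly(String s) {
        return s.codePoints().allMatch(Character::isBmpCodePoint);
    }

    public static void main(String[] args) {
        System.out.println(isBmpOnly("caf\u00E9"));      // true: all BMP
        // U+1D465 (MATHEMATICAL ITALIC SMALL X) is encoded as a surrogate pair
        System.out.println(isBmpOnly("x\uD835\uDC65"));  // false
    }
}
```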
But which ones? The number of Unicode blocks defined in the JDK (Character.UnicodeBlock) is frighteningly high, and the scripts (Character.UnicodeScript) aren't any better.
I could perhaps use Character.isLetter() (note that I chose the overload taking a char, not an int), but I suspect even that would accept too many characters...
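A rough sketch of how a char-based check could look. The identifier rule here (a letter first, then letters or digits, no surrogates) is a hypothetical assumption for illustration, not ObjectScript's documented grammar, and the class and method names are made up. countBmpLetters reports how many BMP characters Character.isLetter(char) accepts on the running JDK, which gives an idea of how broad that test is; the exact number depends on the JDK's Unicode version.

```java
public class IdentifierCheck {
    // HYPOTHETICAL rule, not taken from the ObjectScript docs:
    // first char must be a letter, following chars letters or digits,
    // and no char may be a surrogate (i.e. BMP characters only).
    static boolean isValidName(String name) {
        if (name.isEmpty()) return false;
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (Character.isSurrogate(c)) return false;
            boolean ok = (i == 0) ? Character.isLetter(c)
                                  : Character.isLetterOrDigit(c);
            if (!ok) return false;
        }
        return true;
    }

    // Counts the BMP chars for which Character.isLetter(char) is true.
    static int countBmpLetters() {
        int count = 0;
        for (int c = 0; c <= Character.MAX_VALUE; c++) {
            if (Character.isLetter((char) c)) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(isValidName("gr\u00F6\u00DFe"));  // true ("größe")
        System.out.println(isValidName("1abc"));             // false: starts with a digit
        System.out.println(isValidName("x\uD835\uDC65"));    // false: surrogate pair
        System.out.println(countBmpLetters());               // varies by JDK
    }
}
```

Character.isLetter(char) already returns false for surrogate code units (general category Cs), so the explicit isSurrogate check mainly documents the BMP-only intent.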