Basically, I need to check that a UTF-16 string does not contain any of these characters: /:*?<>|+. Apart from those, it can contain any other character (English, Latin, and so on).

For normal ASCII strings, we would write a regex something like ^[^\/:*?<>|+]*$. How does this expression change for UTF-16 formatted strings?

Can we represent this expression using ASCII characters in the regex, or should we use their equivalent Unicode code points to match these characters?

What happens if you try? Which language are you using? And do some reading here. - Martin Ender
What programming language? Handling of Unicode varies greatly. It should just work in most cases, though. - dan1111
I have tried the above expression in JavaScript and tested it with strings in different natural languages (en, jp, tw). It seems to pass them (not match) OK, and blocks (matches) when any of these special characters appears, but I was wondering if that's the right way. I need this expression for JavaScript, C++, and XML (XSD validation). The tricky part is the XSD, where there is no way to specify Unicode code points (that is, \u002F etc.), so if the above ASCII expression works, that would be great. I just wanted to confirm that I am not missing some detail about how UTF-16 strings should be matched with a regex. - Vijay

1 Answer

Since all of the special characters that you don't want to allow are plain ASCII characters, use the regex pattern

/^[^\/:*?<>|+]*$/
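
To illustrate why the pattern needs no change for UTF-16: each forbidden character is a single ASCII code unit, and the code units of a surrogate pair (0xD800 to 0xDFFF) never coincide with an ASCII value, so non-ASCII text simply falls through the negated class. Here is a minimal JavaScript sketch, assuming a helper named isAllowedName (a hypothetical name, not from the question) and a few made-up sample strings:

```javascript
// The pattern from the answer: the whole string must consist only of
// characters outside the forbidden set / : * ? < > | +
const ALLOWED = /^[^\/:*?<>|+]*$/;

// Hypothetical helper name, introduced here for illustration only.
function isAllowedName(s) {
  return ALLOWED.test(s);
}

console.log(isAllowedName("report-2024"));      // true  (plain ASCII, no forbidden characters)
console.log(isAllowedName("日本語のファイル"));   // true  (non-ASCII BMP characters)
console.log(isAllowedName("𝒳-file"));           // true  (surrogate pair: two code units, neither forbidden)
console.log(isAllowedName("a/b"));              // false (contains /)
console.log(isAllowedName("what?"));            // false (contains ?)
console.log(isAllowedName(""));                 // true  (the * quantifier also accepts the empty string)
```

If empty strings should be rejected as well, replacing the trailing * with + would be one way to do it.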