If you want to match letters, whether or not they're accented, Unicode property escapes such as \p{Letter} can help (they need the u flag):
/\p{Letter}/u.test('à'); // true
/\p{Letter}/u.test('œ'); // true
/\p{Letter}/u.test('a'); // true
/\p{Letter}/u.test('3'); // false
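For comparison, here is a small sketch contrasting \p{Letter} with the ASCII-only alternatives (the names asciiLetter, wordChar and anyLetter are just illustrative). It also shows one caveat that may be worth checking against your own data: a decomposed "à" (a plain a followed by the combining grave accent U+0300) is two code points, and the accent is a Mark rather than a Letter, so normalizing to NFC first is one way to handle it.

```javascript
// ASCII-only approaches reject accented letters.
const asciiLetter = /^[a-zA-Z]+$/;
const wordChar = /^\w+$/u; // \w stays ASCII-only ([A-Za-z0-9_]) even with the u flag

// \p{Letter} (alias \p{L}) needs the u flag and covers letters from any script.
const anyLetter = /^\p{Letter}+$/u;

const precomposed = "\u00e0"; // "à" as a single code point
const decomposed = "a\u0300"; // "a" followed by a combining grave accent

console.log(asciiLetter.test(precomposed)); // false
console.log(wordChar.test(precomposed)); // false
console.log(anyLetter.test(precomposed)); // true
console.log(anyLetter.test(decomposed)); // false: U+0300 is \p{Mark}, not \p{Letter}
console.log(anyLetter.test(decomposed.normalize("NFC"))); // true
```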
Matching the start of a word is tricky, but (?<=(?:^|\s)) seems to do the trick. The (?<= ) is a positive lookbehind: it asserts that the text immediately before the main expression matches what's inside it, without making that text part of the match. The (?: ) is a non-capturing group, so you don't end up with a reference to this part in whatever match you use later. Inside it, ^ matches the start of the string (or the start of a line, if the multiline flag m is set), and \s matches a single whitespace character (space, tab, line break, and so on).
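Part of why this is tricky in JavaScript: \b is defined in terms of \w, which only counts ASCII letters, digits and underscore (even with the u flag), so accented letters look like non-word characters to it. A quick sketch comparing the two approaches on an arbitrary sample sentence:

```javascript
const text = "été à Paris";

// \b treats "é" and "à" as non-word characters, so it finds a boundary
// inside "été" (between "é" and "t") and none before "à".
const withWordBoundary = text.match(/\b\p{Letter}+/gu);

// The lookbehind only asks: is this the start of the string, or right after whitespace?
const withLookbehind = text.match(/(?<=(?:^|\s))\p{Letter}+/gu);

console.log(withWordBoundary); // [ 'té', 'Paris' ]
console.log(withLookbehind); // [ 'été', 'à', 'Paris' ]
```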
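Putting these pieces together (with + instead of *, so that at least one letter is required and an empty string doesn't count as a match) gives something like:
/(?<=(?:^|\s))\p{Letter}+/u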
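A possible usage sketch, assuming you want every word rather than just a yes/no answer: adding the g flag lets String.prototype.match return all of them. Since a lookbehind never captures on its own and can hold an alternation directly, (?<=^|\s) should behave the same as the version with the (?: ) group; the helper name words is hypothetical.

```javascript
const withGroup = /(?<=(?:^|\s))\p{Letter}+/gu;
const withoutGroup = /(?<=^|\s)\p{Letter}+/gu;

// Hypothetical helper: every run of letters that starts at the beginning
// of the string or right after whitespace.
function words(str, re) {
  // With a g-flagged regex, match returns an array of all matches, or null if none.
  return str.match(re) ?? [];
}

const sample = "Ça va?\tœuvre 3d naïve";
console.log(words(sample, withGroup)); // [ 'Ça', 'va', 'œuvre', 'naïve' ]
console.log(words(sample, withoutGroup)); // same result
```

Note that "3d" contributes nothing: "3" is not a letter, and "d" is preceded by "3" rather than whitespace.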
If you only want to match words that start with an accented (non-ASCII) character, you can put a negated character set for a-zA-Z in first position:
/(?<=(?:^|\s))[^a-zA-Z]\p{Letter}*/u.test("bœ"); // false
/(?<=(?:^|\s))[^a-zA-Z]\p{Letter}*/u.test("œb"); // true
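One thing to watch: [^a-zA-Z] matches any character that isn't an ASCII letter, including digits, punctuation and whitespace, so strings like "3b" or "-b" pass as well. Here is a sketch of a tighter variant, offered as an alternative rather than as the approach above: with the u flag, [^\P{Letter}a-zA-Z] means "a letter that is not a-z or A-Z".

```javascript
// Pattern from above: the first character is anything except an ASCII letter.
const loose = /(?<=(?:^|\s))[^a-zA-Z]\p{Letter}*/u;

// Tighter variant: the first character must be a letter, but not an ASCII one.
// [^\P{Letter}a-zA-Z] = NOT (non-letter OR ASCII letter); requires the u flag.
const strict = /(?<=(?:^|\s))[^\P{Letter}a-zA-Z]\p{Letter}*/u;

for (const s of ["œb", "bœ", "3b", "-b"]) {
  console.log(s, loose.test(s), strict.test(s));
}
// œb true true
// bœ false false
// 3b true false
// -b true false
```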
// Match a run of letters (accented or not) at the end of the string
let regex = /\p{Letter}+$/u;
console.log(regex.test("œb")); // true
console.log(regex.test("bœb")); // true
console.log(regex.test("àbby")); // true
console.log(regex.test("à3")); // false
console.log(regex.test("16 tons")); // true
console.log(regex.test("3 œ")); // true
console.log('-----');
// Same, but the letters must also start a word (start of string or after whitespace)
regex = /(?<=(?:^|\s))\p{Letter}+$/u;
console.log(regex.test("œb")); // true
console.log(regex.test("bœb")); // true
console.log(regex.test("àbby")); // true
console.log(regex.test("à3")); // false
console.log('----');
// Same, but the word's first character must not be an ASCII letter (a-z, A-Z)
regex = /(?<=(?:^|\s))[^a-zA-Z]\p{Letter}+$/u;
console.log(regex.test("œb")); // true
console.log(regex.test("bœb")); // false
console.log(regex.test("àbby")); // true
console.log(regex.test("à3")); // false
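A small edge case in the last pattern: [^a-zA-Z] consumes the first character and \p{Letter}+ then requires at least one more letter, so a one-letter word such as the French "à" is rejected. A sketch using * instead of + keeps the four results above and accepts it (the names regexPlus and regexStar are only for illustration):

```javascript
const regexPlus = /(?<=(?:^|\s))[^a-zA-Z]\p{Letter}+$/u; // as in the listing above
const regexStar = /(?<=(?:^|\s))[^a-zA-Z]\p{Letter}*$/u; // zero or more letters after the first

console.log(regexPlus.test("à")); // false
console.log(regexStar.test("à")); // true

for (const s of ["œb", "bœb", "àbby", "à3"]) {
  console.log(s, regexStar.test(s));
}
// œb true
// bœb false
// àbby true
// à3 false
```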
"à" certainly matches the pattern /\bà/. They’re much better for Unicode work. – tchrist