UTF-8 string size in bytes

Question

I need to determine the length of a UTF-8 string in bytes in C. How do I do this correctly? As far as I know, the terminating character in UTF-8 is 1 byte in size. Can I use strlen function for this?

UTF-8 doesn't define how strings are terminated. The use of the null character '\0' to terminate a string is a C convention. — Keith Thompson
The whole point of UTF-8 is that you don't have to change any of your string-processing practices. Only code that interprets the characters of a string potentially needs changing, and even then, usually only if it's applying special interpretation to characters outside of the ASCII range. Things like strlen, strstr, strchr (for searching for single-byte characters), snprintf, etc. just work. — R.. GitHub STOP HELPING ICE

Daniel Fischer · Accepted Answer · 2013-05-02T14:34:03

Can I use strlen function for this?

Yes, strlen gives you the number of bytes before the first '\0' character, so

strlen(utf8) + 1

is the number of bytes in utf8 including the 0-terminator, since no character other than '\0' contains a 0 byte in UTF-8.

Of course, that only works if utf8 is actually UTF-8 encoded; otherwise you need to convert it to UTF-8 first.
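
As a rough illustration of the point about strlen, here is a minimal sketch. The helper names utf8_size_bytes and utf8_count_code_points are made up for this example, and it assumes the input is valid, NUL-terminated UTF-8. The second helper is only there to show that the byte count and the number of code points can differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: bytes occupied by a NUL-terminated UTF-8 string,
 * including the terminating 0 byte. Assumes s is valid UTF-8. */
size_t utf8_size_bytes(const char *s)
{
    assert(s != NULL);
    return strlen(s) + 1;
}

/* Hypothetical helper, for contrast only: number of code points.
 * Counts every byte that is not a continuation byte (bit pattern 10xxxxxx). */
size_t utf8_count_code_points(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++n;
    }
    return n;
}
```

For example, with the string "h\xC3\xA9llo" (the text "héllo", where é is encoded as the two bytes C3 A9), utf8_size_bytes returns 7 (6 bytes of text plus the terminator), while utf8_count_code_points returns 5.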
