20
votes

I'm trying to create a UTF-8 encoded file in Qt.

#include <QtCore>

int main()
{
    QString unicodeString = "Some Unicode string";
    QFile fileOut("D:\\Temp\\qt_unicode.txt");
    if (!fileOut.open(QIODevice::WriteOnly | QIODevice::Text))
    {
        return -1;
    }

    QTextStream streamFileOut(&fileOut);
    streamFileOut.setCodec("UTF-8");
    streamFileOut << unicodeString;
    streamFileOut.flush();

    fileOut.close();

    return 0;
}

I thought that since QString is Unicode by default, setting the codec of the output stream to UTF-8 would give me a UTF-8 file. But it doesn't; the file is ANSI. What am I doing wrong? Is something wrong with my strings? Can you correct my code so that it creates a UTF-8 file? My next step will be to read an ANSI file and save it as a UTF-8 file, so I'll have to convert each string I read, but for now I just want to start with a file. Thank you.

3
You should convert the string literal to a string with QString::fromUtf8(). Also, some compilers (MSVC) have problems with non-ASCII encodings in source files, so maybe also check whether it works when the string is entered via e.g. QInputDialog. I also suggest defining QT_NO_CAST_FROM_ASCII and QT_NO_CAST_TO_ASCII when you run into issues like this. They disable implicit conversions and thus make it clearer what's going on. - Frank Osterfeld

3 Answers

17
votes

2022 edit: what follows was true for Qt 4. Qt 5 and later use UTF-8 by default, so this answer doesn’t apply to the latest Qt versions.

Your code is absolutely correct. The only part that looks suspicious to me is this:

QString unicodeString = "Some Unicode string";

The reason it looks suspicious is that QString uses the Latin-1 encoding by default when it is constructed from a C-style string literal. If you only intend to use accented Latin characters, you're probably fine, but use anything else (Cyrillic, Chinese, Japanese, Hebrew...) and it no longer works correctly. The best way to deal with this issue is to have your source file encoded in UTF-8 and do this instead:

QString unicodeString = QString::fromUtf8("Some Unicode string");

This will work for any imaginable language. Using QObject::trUtf8() is even better as it gives you a lot of i18n capabilities.
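To illustrate why the Latin-1 default matters, here is a minimal sketch in plain standard C++ (no Qt, so it only imitates what happens; the helper name latin1ToUtf8 is made up for this example). It treats every input byte as one Latin-1 character and re-encodes it as UTF-8, which is roughly what you get when a UTF-8 encoded literal is read as Latin-1 and then written through a UTF-8 stream:

```cpp
#include <string>

// Hypothetical helper: interpret each input byte as one Latin-1 character
// (U+0000..U+00FF) and re-encode it as UTF-8.
std::string latin1ToUtf8(const std::string& latin1)
{
    std::string utf8;
    for (unsigned char byte : latin1) {
        if (byte < 0x80) {
            // ASCII range: identical in Latin-1 and UTF-8.
            utf8 += static_cast<char>(byte);
        } else {
            // U+0080..U+00FF: two bytes in UTF-8.
            utf8 += static_cast<char>(0xC0 | (byte >> 6));
            utf8 += static_cast<char>(0x80 | (byte & 0x3F));
        }
    }
    return utf8;
}
```

For example, "é" is stored as the two bytes C3 A9 in a UTF-8 source file. Passing those two bytes to latin1ToUtf8 yields the four bytes C3 83 C2 A9, which a UTF-8 aware editor displays as "Ã©". The same function also does what the question's next step needs when "ANSI" really means Latin-1 (that is an assumption; Windows code pages such as 1252 differ from Latin-1 in the 0x80..0x9F range).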

Edit

While it's true that you generate a correct UTF-8 file, if you want Notepad to recognize your file as UTF-8, it's a different story. You need to put a BOM in there. It can be done either as suggested in another answer, or here is another way:

streamFileOut.setGenerateByteOrderMark(true);

11
votes

In my experience, you can create a UTF-8 encoded text file without a BOM in Qt like this:

file.open(QIODevice::WriteOnly | QIODevice::Text);
QTextStream out(&file);
out.setCodec("UTF-8"); // ...
vcfline = ctn; //assign some utf-8 characters
out.setGenerateByteOrderMark(false);
out << vcfline; //.....
file.close();

The resulting file is encoded as UTF-8 without a BOM.

7
votes

Don't forget that UTF-8 encodes each ASCII character as a single byte. Only non-ASCII characters, such as accented letters, are encoded with more bytes (2 to 4 in standard UTF-8; the original design allowed up to 6).

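This means that as long as you only have ASCII characters (which is the case for your unicodeString), the file will contain only single-byte characters and will be byte-for-byte identical to an ASCII file, so an editor has no way to tell it apart from ANSI. That is the backward compatibility with ASCII: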

UTF-8 can represent every character in the Unicode character set, but unlike them, possesses the advantages of being backward-compatible with ASCII
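As a rough illustration of those byte counts, here is a small sketch in standard C++ without Qt. The function name encodeUtf8 is an assumption made for this example, not a Qt API; it encodes one code point following the 4-byte limit of standard UTF-8:

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8.
// Returns an empty string for surrogates and values above U+10FFFF.
std::string encodeUtf8(std::uint32_t cp)
{
    std::string out;
    if (cp < 0x80) {
        // ASCII: one byte, unchanged.
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // Two bytes, e.g. accented Latin letters.
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return out; // surrogates are not valid code points to encode
        // Three bytes, e.g. the euro sign or most CJK characters.
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {
        // Four bytes, e.g. emoji.
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

With this, encodeUtf8('A') is the single byte 41, exactly as in ASCII, while encodeUtf8(0xE9) ("é") is the two bytes C3 A9 and encodeUtf8(0x20AC) ("€") is the three bytes E2 82 AC.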

To check whether your code is working, you should put some accented characters, for instance, in your unicodeString.

I tested your code with accented characters, and it works fine.

If you want a BOM at the beginning of your file, you can write the BOM character (QChar(QChar::ByteOrderMark)) to the stream before anything else.
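For reference, the BOM character U+FEFF becomes the three bytes EF BB BF when encoded as UTF-8. The following sketch, again in plain standard C++ with made-up helper names (startsWithUtf8Bom and withUtf8Bom are not Qt functions), shows how you might check the first bytes of a generated file, or prepend the BOM to an already encoded buffer:

```cpp
#include <string>

// The UTF-8 encoding of U+FEFF, the byte order mark.
const std::string kUtf8Bom = "\xEF\xBB\xBF";

// Hypothetical helper: true if the buffer begins with the UTF-8 BOM.
bool startsWithUtf8Bom(const std::string& bytes)
{
    return bytes.compare(0, kUtf8Bom.size(), kUtf8Bom) == 0;
}

// Hypothetical helper: prepend the BOM unless it is already present.
std::string withUtf8Bom(const std::string& utf8Text)
{
    return startsWithUtf8Bom(utf8Text) ? utf8Text : kUtf8Bom + utf8Text;
}
```

Reading the first three bytes of the output file and passing them to startsWithUtf8Bom is a quick way to confirm whether the BOM was actually written.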