3
votes

I'm working on a cross-platform project using Qt. On Windows, I want to pass Unicode characters (for instance, a file path that contains Chinese characters) as command-line arguments when launching the application, and then use these arguments to create a QCoreApplication.

For some reason, I need to use CommandLineToArgvW to get the argument list, like this:

LPWSTR * argvW = CommandLineToArgvW( GetCommandLineW(), &argc );

I understand that on modern Windows, LPWSTR is actually wchar_t*, which is 16 bits wide and holds UTF-16.

However, the QCoreApplication constructor only takes char*, not wchar_t*.

So the question is: how can I safely convert the LPWSTR returned by CommandLineToArgvW() to char* without losing the Unicode content (i.e., so that the Chinese characters are still Chinese characters)?

I've tried many different ways without success:

1:

    std::string const argvString = boost::locale::conv::utf_to_utf<char>( argvW[0] );

2:

    int res;
    char buf[0x400];
    char* pbuf = buf;
    boost::shared_ptr<char[]> shared_pbuf;

    res = WideCharToMultiByte(CP_UTF8, 0, pcs, -1, buf, sizeof(buf), NULL, NULL);

3: Convert to QString first, then convert to UTF-8.

EDIT: Problem solved. The conversion from UTF-16 wide characters to UTF-8 char works fine with all three approaches. In Visual Studio, to view a UTF-8 string correctly in the debugger, you need to append the s8 format specifier to the watched variable name, for example buf,s8 (see: https://msdn.microsoft.com/en-us/library/75w45ekt.aspx). I overlooked this, which made me think that my string conversion was wrong.

The real issue is that the QStrings returned by QCoreApplication::arguments() are constructed with QString::fromLocal8Bit(), which causes encoding problems on Windows when the command line arguments contain Unicode characters. The workaround is to call the Windows API CommandLineToArgvW() whenever the command line arguments are needed on Windows, and to convert the 16-bit UTF-16 wchar_t * (or LPWSTR) strings to 8-bit UTF-8 char * (using one of the three approaches mentioned above).
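To make concrete what the UTF-16 to UTF-8 conversion does to the bytes, here is a minimal portable sketch. The helper name utf16ToUtf8 is hypothetical (it is not a Qt, Boost, or Windows function), and it takes a std::u16string rather than an LPWSTR so that it compiles anywhere. It is only an illustration of the encoding step, not a replacement for WideCharToMultiByte or QString.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of Qt, Boost, or the Windows API):
// encodes a sequence of UTF-16 code units as UTF-8 bytes.
// Unpaired surrogates are replaced with U+FFFD.
std::string utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: combine with the following low surrogate, if any.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            // Low surrogate without a preceding high surrogate.
            cp = 0xFFFD;
        }

        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Each Chinese character in the Basic Multilingual Plane becomes three UTF-8 bytes, which is why a debugger that shows the char buffer in the ANSI code page displays garbage even though the conversion itself is correct.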

2
How do you later determine if your call to QCoreApplication is successful? That is, you say that you want "the Chinese characters are still Chinese characters". So how do you tell that they no longer are. Show us the code that, given an appropriate conversion function, you would expect to work. - Nicol Bolas
According to the documentation Qt will automatically use CommandLineToArgvW for you, unless you pass modified arguments to the QCoreApplication constructor. It does not state what exactly "modified" means, but presumably the intent is to just work for ordinary code that just blindly forwards the main arguments, but honor the client code's wish if there is any difference. See doc.qt.io/qt-5/qcoreapplication.html#arguments - Cheers and hth. - Alf
Possible duplicate of Windows unicode commandline argv - Dan Korn
WideCharToMultiByte(CP_UTF8, ... is the canonical way under Windows. You say it "fails". What's the return value, and what's the GetLastError() after that? - dxiv
See also stackoverflow.com/questions/148403/…. I'm not voting to close this because Qt might make a difference in the answer. - Mark Ransom

2 Answers

2
votes

You should be able to use QString's functions. For example

QString str = QString::fromUtf16((const ushort*)argvW[0]);
::MessageBoxW(0, (const wchar_t*)str.utf16(), 0, 0);

When using WideCharToMultiByte, pass zero for the output buffer and for the output buffer's length. The return value then tells you how many bytes the output buffer needs (including the terminating null, because the input length is given as -1). For example:

const wchar_t* wbuf = argvW[0];
int len = WideCharToMultiByte(CP_UTF8, 0, wbuf, -1, 0, 0, 0, 0);

std::string buf(len, 0);

WideCharToMultiByte(CP_UTF8, 0, wbuf, -1, &buf[0], len,0,0);
QString utf8;
utf8 = QString::fromUtf8(buf.c_str());
::MessageBoxW(0, (const wchar_t*)utf8.utf16(), 0, 0);

The same information should be available in QCoreApplication::arguments. For example, run this code with Unicode argument and see the output:

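#include <QCoreApplication>
#include <QFile>
#include <QStringList>
#include <QTextStream>

int main(int argc, char *argv[])
{
    QCoreApplication a(argc, argv);
    // Assumes this source file is saved as UTF-8.
    QString filename = QString::fromUtf8("ελληνική.txt");
    QFile fout(filename);
    if (fout.open(QIODevice::WriteOnly | QIODevice::Text))
    {
        QTextStream oss(&fout);
        oss.setCodec("UTF-8");
        oss << filename << "\n";
        // Write every command line argument to the file, one per line.
        QStringList list = a.arguments();
        for (int i = 0; i < list.count(); i++)
            oss << list[i] << "\n";
    }
    fout.close();
    return a.exec();
}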

Note that in the example above, Qt internally converts the filename to UTF-16, because WinAPI uses UTF-16, not UTF-8.

2
votes

Qt internally wraps int main() on Windows, extracting and parsing the Unicode command line arguments (via CommandLineToArgvW) before any of your code is executed. The parsed data is then converted to the local 8-bit encoding (on Windows this is the ANSI code page, which is usually not UTF-8) and passed as char **argv, via the equivalent of QString::toLocal8Bit().

Use QCoreApplication::arguments() to retrieve the Unicode args. Also, a helpful note from the docs:

On Windows, the list is built from the argc and argv parameters only if modified argv/argc parameters are passed to the constructor. In that case, encoding problems might occur.
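The kind of encoding problem the note warns about can be illustrated with a small portable sketch. It assumes, purely for illustration, that the local 8-bit codec behaves like Latin-1 (one byte per character); real Windows code pages such as CP1252 or CP936 differ in detail. The helper name misreadAsLatin1 is hypothetical. It shows what happens when UTF-8 bytes are decoded with the wrong 8-bit codec.

```cpp
#include <string>

// Hypothetical helper for illustration only: decodes each input byte as one
// Latin-1 character (an assumed stand-in for the local 8-bit codec) and
// re-encodes the result as UTF-8 so it can be printed or compared.
std::string misreadAsLatin1(const std::string& bytes)
{
    std::string out;
    for (unsigned char b : bytes) {
        if (b < 0x80) {
            // ASCII survives unchanged.
            out += static_cast<char>(b);
        } else {
            // Each byte >= 0x80 is treated as a separate character
            // U+0080..U+00FF and becomes a two-byte UTF-8 sequence.
            out += static_cast<char>(0xC0 | (b >> 6));
            out += static_cast<char>(0x80 | (b & 0x3F));
        }
    }
    return out;
}
```

Under that assumption, ASCII arguments pass through intact while every multi-byte UTF-8 character is split into several unrelated characters, which matches the symptom of arguments that look fine until they contain non-ASCII text.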