0
votes

I'm currently struggling to open up a utf-16 encoded XML file with VBA in Excel.

My current String variable titled EntireFile currently begins like this:

ÿþ<?xml version="1.0" encoding="utf-16"?>
<Test xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"> 

As you can see there are some characters in the beginning that seem off.

I get the String variable by doing:

Open PathToFile For Input As #1
    Do Until EOF(1)
              Line Input #1, textline
              EntireFile = EntireFile & textline

File is formatted with UCS-2 Little Endian according to Notepad++ but a quick search trough the internet revealed that this is the Microsoft equivalent to UTF-16?

I tried the bruteforce method of just removing the first two characters but that leaves me with an empty string.

All of the google results cover saving a XML file without the BOM but thats kind of the opposite of what I'm looking for.

Thanks already for your time

1

1 Answers

1
votes

You can use Win32 API Functions to convert encodings.

Private Declare Function WideCharToMultiByte Lib "kernel32.dll" ( _
                         ByVal CodePage As Long, _
                         ByVal dwFlags As Long, _
                         ByVal lpWideCharStr As Long, _
                         ByVal cchWideChar As Long, _
                         ByVal lpMultiByteStr As Long, _
                         ByVal cbMultiByte As Long, _
                         ByVal lpDefaultChar As Long, _
                         ByVal lpUsedDefaultChar As Long) As Long

Private Declare Function MultiByteToWideChar Lib "kernel32.dll" ( _
                         ByVal CodePage As Long, _
                         ByVal dwFlags As Long, _
                         ByVal lpMultiByteStr As Long, _
                         ByVal cbMultiByte As Long, _
                         ByVal lpWideCharStr As Long, _
                         ByVal cchWideChar As Long) As Long

Private Const CP_UTF16 As Long = 1200&

Private Function ConvertToUTF16(ByRef Source As String) As Byte()

    Dim Length As Long
    Dim Pointer As Long
    Dim Size As Long
    Dim Buffer() As Byte

    Length = Len(Source)
    Pointer = StrPtr(Source)
    Size = WideCharToMultiByte(CP_UTF16, 0, Pointer, Length, 0, 0, 0, 0)
    ReDim Buffer(0 To Size - 1)

    WideCharToMultiByte CP_UTF16, 0, Pointer, Length, VarPtr(Buffer(0)), _
        Size, 0, 0

    ConvertToUTF16 = Buffer

End Function

Private Function ConvertFromUTF16(ByRef Source() As Byte) As String

    Dim Size As Long
    Dim Pointer As Long
    Dim Length As Long
    Dim Buffer As String

    Size = UBound(Source) - LBound(Source) + 1
    Pointer = VarPtr(Source(LBound(Source)))
    Length = MultiByteToWideChar(CP_UTF16, 0, Pointer, Size, 0, 0)
    Buffer = Space$(Length)
    MultiByteToWideChar CP_UTF16, 0, Pointer, Size, StrPtr(Buffer), Length
    ConvertFromUTF16 = Buffer

End Function

Private Const CP_UTF16 As Long = 1200& means codepage 1200 which is UTF-16 little andian.

You can see a list of all codepages here https://msdn.microsoft.com/de-de/library/windows/desktop/dd317756(v=vs.85).aspx