
I am using the Unicode version of NSIS to make an installer. I will be appending lines to both ANSI and Unicode files. Before I write a line to a file, I need to know whether the file is ANSI-encoded or Unicode, so that I know whether to use FileWrite or FileWriteUTF16LE.

How can I find out the encoding type of a file?

The Unicode plugin, which can report the encoding of a file, does not work with NSIS Unicode: the function unicode::UnicodeType always returns 6.

Any advice would be extremely helpful.


2 Answers

0
votes

If you want to continue using that plugin, you could recompile it yourself as Unicode or try the CallAnsiPlugin plugin.

You can also perform the check yourself:

!include LogicLib.nsh

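; LogicLib test operator: ${If} ${ByHandleIsFileUTF16LE} $handle
; Reads the first two bytes from an open file handle and checks for the UTF-16LE BOM (0xFF 0xFE).
!define ByHandleIsFileUTF16LE "'' ByHandleIsFileUTF16LE "
!macro _ByHandleIsFileUTF16LE a b t f
!insertmacro _LOGICLIB_TEMP
FileReadByte ${b} $_LOGICLIB_TEMP
IntCmpU $_LOGICLIB_TEMP 0xFF "" `${f}`
FileReadByte ${b} $_LOGICLIB_TEMP
; All three outcomes get a target so that a second byte of 0xFF cannot fall through.
IntCmpU $_LOGICLIB_TEMP 0xFE `${t}` `${f}` `${f}`
!macroend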
!define IsFileUTF16LE "'' IsFileUTF16LE "
!macro _IsFileUTF16LE a b t f
!insertmacro _LOGICLIB_TEMP
Push $0
FileOpen $0 "${b}" r
!define _IsFileUTF16LE _IsFileUTF16LE${__LINE__}
!insertmacro _ByHandleIsFileUTF16LE '' $0 ${_IsFileUTF16LE}t ${_IsFileUTF16LE}f
${_IsFileUTF16LE}f:
    StrCpy $_LOGICLIB_TEMP ""
${_IsFileUTF16LE}t:
!undef _IsFileUTF16LE
FileClose $0
Pop $0
StrCmp "" $_LOGICLIB_TEMP `${f}` `${t}`
!macroend



section

!macro testutf16detection file
${If} ${IsFileUTF16LE} "${file}"
    DetailPrint "${file} is UTF16LE"
${Else}
    DetailPrint "${file} is NOT UTF16LE"
${EndIf}
!macroend

!insertmacro testutf16detection "$temp\test1.txt"
!insertmacro testutf16detection "$temp\test2.txt"

sectionend
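
To connect this back to the question, here is a minimal sketch of appending a line with whichever write function matches the detected encoding. The macro name AppendLine is made up for illustration, and it assumes the ${IsFileUTF16LE} test defined above is available. Any file without a UTF-16LE BOM is treated as ANSI, which is an assumption rather than a real detection.

```nsis
!include LogicLib.nsh

; Hypothetical helper (not part of NSIS): appends "${text}" plus CRLF to "${file}".
; Relies on the ${IsFileUTF16LE} LogicLib test defined earlier in this answer.
; Do not pass $0 as part of ${file} or ${text}; it is used as the file handle.
!macro AppendLine file text
  Push $0
  ${If} ${IsFileUTF16LE} "${file}"
    FileOpen $0 "${file}" a          ; "a" keeps the existing contents
    FileSeek $0 0 END                ; move to the end before writing
    FileWriteUTF16LE $0 "${text}$\r$\n"
  ${Else}
    ; Assumption: no UTF-16LE BOM means the file is ANSI.
    FileOpen $0 "${file}" a
    FileSeek $0 0 END
    FileWrite $0 "${text}$\r$\n"
  ${EndIf}
  FileClose $0
  Pop $0
!macroend
```

Inside a section it could be used as `!insertmacro AppendLine "$TEMP\test1.txt" "new line"`.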
0
votes

One potential solution is to check for the BOM. Here's how you could check if a file uses the UTF16LE encoding:

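!define fileIsUTF16LE "!insertmacro FileIsUTF16LE"
!macro FileIsUTF16LE file result
  ; ${result} must not be $0 or $1; both are restored from the stack at the end.
  ; The labels get a per-insertion suffix so the macro can be used more than once
  ; in the same section or function without "label already declared" errors.
  !define FileIsUTF16LE_ID ${__LINE__}
  Push $0
  Push $1
  FileOpen $0 "${file}" r
  FileReadByte $0 $1
  IntCmpU $1 0xFF "" FileIsUTF16LE_ItsNot_${FileIsUTF16LE_ID} FileIsUTF16LE_ItsNot_${FileIsUTF16LE_ID}
  FileReadByte $0 $1
  IntCmpU $1 0xFE FileIsUTF16LE_ItIs_${FileIsUTF16LE_ID} FileIsUTF16LE_ItsNot_${FileIsUTF16LE_ID} FileIsUTF16LE_ItsNot_${FileIsUTF16LE_ID}
  FileIsUTF16LE_ItIs_${FileIsUTF16LE_ID}:
    StrCpy ${result} 1
    Goto FileIsUTF16LE_Done_${FileIsUTF16LE_ID}
  FileIsUTF16LE_ItsNot_${FileIsUTF16LE_ID}:
    StrCpy ${result} 0
  FileIsUTF16LE_Done_${FileIsUTF16LE_ID}:
    FileClose $0
    Pop $1
    Pop $0
  !undef FileIsUTF16LE_ID
!macroend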

Usage:

${fileIsUTF16LE} "$R0" $3
${If} $3 == 1
  ; the file starts with the UTF-16LE BOM
${EndIf}

Note that this will not work in all cases, since not every Unicode encoding requires a BOM. You could easily modify this macro to check for other BOMs, but definitively determining a file's encoding is nontrivial. One approach is to check for each of the known BOMs and, if the file has none, assume it is not Unicode.
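
As a rough sketch of that last approach, the macro below checks for the UTF-16LE (FF FE), UTF-16BE (FE FF) and UTF-8 (EF BB BF) BOMs. The name GetFileBOM and the result strings are made up for illustration. It does not distinguish UTF-32LE, whose BOM (FF FE 00 00) also begins with FF FE.

```nsis
!include LogicLib.nsh

; Hypothetical helper (not part of NSIS): sets ${result} to "UTF16LE", "UTF16BE",
; "UTF8", or "" when no known BOM is found or the file cannot be opened.
; ${result} must not be $0, $1, $2 or $3; those are restored from the stack.
!macro GetFileBOM file result
  Push $0
  Push $1
  Push $2
  Push $3
  StrCpy ${result} ""
  FileOpen $0 "${file}" r
  ${If} $0 != ""                     ; FileOpen leaves the handle empty on failure
    FileReadByte $0 $1               ; a read past the end of file yields an empty value
    FileReadByte $0 $2
    FileReadByte $0 $3
    FileClose $0
    ${If} $1 = 0xFF
    ${AndIf} $2 = 0xFE
      StrCpy ${result} "UTF16LE"     ; could also be UTF-32LE, which is not checked here
    ${ElseIf} $1 = 0xFE
    ${AndIf} $2 = 0xFF
      StrCpy ${result} "UTF16BE"
    ${ElseIf} $1 = 0xEF
    ${AndIf} $2 = 0xBB
    ${AndIf} $3 = 0xBF
      StrCpy ${result} "UTF8"
    ${EndIf}
  ${EndIf}
  Pop $3
  Pop $2
  Pop $1
  Pop $0
!macroend
```

For example, `!insertmacro GetFileBOM "$R0" $R1` followed by `${If} $R1 == "UTF16LE"` would select the FileWriteUTF16LE path, and an empty $R1 would fall back to FileWrite.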