0
votes

I have a file called test.txt that contains a single Chinese character, 中.

In a hex editor, the character looks like this:

[image: hex-editor view of test.txt]

If I run Get-Content test.txt | Out-File test_output.txt, the content of test_output.txt is different from that of test.txt. Why is this happening?

I've tried all the encoding parameters listed here ("Unicode", "UTF7", "UTF8", "UTF32", "ASCII", "BigEndianUnicode", "Default", and "OEM"), but none of them correctly converts the Chinese character.

How can I correctly convert Chinese characters using Get-Content and Out-File?

The encoding, e4 b8 ad, looks like the URL encoding of 中 (%E4%B8%AD). Is this why none of the encoding parameters is compatible with this Chinese character?

I use Notepad++ and Notepad++'s hex-editor plugin as my text-editor and hex-editor, respectively.

2
What encoding is used in the file itself? That is, which BOM is present, if any? – vonPryz
The encoding is UTF-8 without BOM. – Brian

2 Answers

0
votes

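I tried Get-Content test.txt -Encoding UTF8 | Out-File test_output.txt -Encoding UTF8

My test.txt is "e4 b8 ad 0a", and the output is "ef bb bf e4 b8 ad 0d 0a". The character bytes e4 b8 ad survive; the only differences are the UTF-8 BOM (ef bb bf) added at the start and the line ending changing from LF to CRLF.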

test.txt is in UTF-8.

In Windows PowerShell, Get-Content doesn't recognize UTF-8 unless the file has a BOM, and Out-File writes UTF-16 by default.

So it is necessary to specify the encoding for both commands.
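For anyone who wants to reproduce this, here is a minimal sketch of the round trip. It is only an illustration: the temp-folder paths and variable names are my own assumptions, not something from the question. It writes the exact four bytes from above, pipes them through both cmdlets with -Encoding UTF8, and dumps the result as hex. Whether a BOM or CRLF shows up depends on the PowerShell version and platform, so the script only checks that the three character bytes come through intact.

```powershell
# Hypothetical scratch files in the temp folder (names are assumptions).
$inPath  = Join-Path ([System.IO.Path]::GetTempPath()) 'test.txt'
$outPath = Join-Path ([System.IO.Path]::GetTempPath()) 'test_output.txt'

# Write the exact bytes from the answer: UTF-8 for U+4E2D plus LF, no BOM.
[System.IO.File]::WriteAllBytes($inPath, [byte[]](0xE4, 0xB8, 0xAD, 0x0A))

# Name the encoding on BOTH sides of the pipeline.
Get-Content -Path $inPath -Encoding UTF8 |
    Out-File -FilePath $outPath -Encoding UTF8

# Dump the output file as lowercase hex for comparison with the input.
$hex = ([System.IO.File]::ReadAllBytes($outPath) |
    ForEach-Object { $_.ToString('x2') }) -join ' '
$hex
```

If the printed hex still contains e4 b8 ad, the character was read and written correctly; any extra leading ef bb bf is just the BOM.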

0
votes

In my case, the Unicode encoding solved my problem with the Chinese characters. The file I was modifying contained C# code on a TFS server.

$path = "test.cs"
# The parentheses make Get-Content read the whole file before Set-Content overwrites it.
(Get-Content -Path $path -Encoding Unicode) | Set-Content -Path $path -Encoding Unicode

It might help somebody else.
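One caveat worth spelling out: in PowerShell, -Encoding Unicode means UTF-16 little-endian, so this only helps when the file really is stored that way. The sketch below (variable names are mine, purely for illustration) uses the .NET System.Text.Encoding classes to show how the same character, U+4E2D, is laid out under each encoding.

```powershell
# U+4E2D is the character whose UTF-8 bytes are e4 b8 ad.
$ch = [string][char]0x4E2D

# Encode the same one-character string two ways (GetBytes adds no BOM).
$utf8Hex  = ([System.Text.Encoding]::UTF8.GetBytes($ch) |
    ForEach-Object { $_.ToString('x2') }) -join ' '
$utf16Hex = ([System.Text.Encoding]::Unicode.GetBytes($ch) |
    ForEach-Object { $_.ToString('x2') }) -join ' '

"UTF-8:     $utf8Hex"    # e4 b8 ad
"UTF-16 LE: $utf16Hex"   # 2d 4e
```

So a file containing e4 b8 ad is UTF-8, and reading it with -Encoding Unicode would misinterpret the bytes; a file containing 2d 4e (usually after an ff fe BOM) is the case where the Unicode setting is the right one.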