1
votes

I built an XML object of type System.Xml.XmlDocument.

$scheme.gettype()
IsPublic IsSerial Name        BaseType
-------- -------- ----        --------
True     False    XmlDocument System.Xml.XmlNode

I use the method save() to save it to a file.

$scheme.save()

This saves the file in format UTF-8 with BOM. The BOM causes issues with other scripts down the line.

When we open the XML file in Notepad++ and save it as UTF-8 (without the BOM), other scripts down the line don't have a problem. So I've been asked to save the file without the BOM.

The MS documentation for the save method states:

The value of the encoding attribute is taken from the XmlDeclaration.Encoding property. If the XmlDocument does not have an XmlDeclaration, or if the XmlDeclaration does not have an encoding attribute, the saved document will not have one either.

The MS documentation on XmlDeclaration lists encoding properties of UTF-8, UTF-16 and others. It does not mention a BOM.

Does the XmlDeclaration have an encoding property that leaves out the BOM?

PS. This behavior is identical in PowerShell 5 and PowerShell 7.

2
Which overload of Save() are you calling? The latter half of the question deals with the encoding attribute of the <?xml declaration, yet the part about fixing the problem by resaving the file in Notepad++ suggests the real problem is the text encoding of the file itself. For that, you can create a UTF-8 non-BOM StreamWriter with $encoding = New-Object -TypeName 'System.Text.UTF8Encoding' -ArgumentList $false; $writer = New-Object -TypeName 'System.IO.StreamWriter' -ArgumentList $outputPath, $shouldAppend, $encoding and pass that to Save(). - Lance U. Matthews

2 Answers

2
votes

As Lance U. Matthews explains in the comments, the string value of the encoding attribute in the XML declaration doesn't have any bearing on how the file containing the document is encoded.

You can control this by creating either a StreamWriter or an XmlWriter with a non-BOM UTF8Encoding, and then passing that to Save($writer):

# Note: Resolve-Path only succeeds if the path already exists
$filename = Resolve-Path path\to\output.xml

# Create UTF8Encoding instance, sans BOM
$encoding = [System.Text.UTF8Encoding]::new($false)

# Create StreamWriter instance
$writer = [System.IO.StreamWriter]::new($filename, $false, $encoding)

# Save using (either) writer
$scheme.Save($writer)

# Dispose of writer
$writer.Dispose()
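To confirm that the saved file really has no BOM, you can inspect its first three bytes. The following is a minimal sketch, not part of the original answer: the sample document and the temp-file name are assumptions made up for illustration.

```powershell
# Hypothetical sample document and temp path, for illustration only
$doc  = [xml] '<?xml version="1.0" encoding="utf-8"?><foo>bar</foo>'
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'no-bom-check.xml'

# Save through a StreamWriter whose UTF8Encoding emits no BOM
$writer = [System.IO.StreamWriter]::new($path, $false, [System.Text.UTF8Encoding]::new($false))
try     { $doc.Save($writer) }
finally { $writer.Dispose() }

# A UTF-8 BOM is the byte sequence EF BB BF at the very start of the file
$bytes  = [System.IO.File]::ReadAllBytes($path)
$hasBom = $bytes.Length -ge 3 -and $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF
$hasBom   # False when the file starts directly with the XML content
```

Wrapping Save() in try/finally ensures the writer is disposed, and the file handle released, even if saving throws.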

Alternatively, use an [XmlWriter]:

# XmlWriter Example
$writer = [System.Xml.XmlWriter]::Create($filename, @{ Encoding = $encoding })

The second argument is an [XmlWriterSettings] object, through which we can exercise greater control over formatting options in addition to explicitly setting the encoding:

$settings = [System.Xml.XmlWriterSettings]@{
  Encoding = $encoding
  Indent = $true
  NewLineOnAttributes = $true
}
$writer = [System.Xml.XmlWriter]::Create($filename, $settings)

#  <?xml version="1.0" encoding="utf-8"?>
#  <Config>
#    <Group
#      name="PropertyGroup">
#      <Property
#        id="1"
#        value="Foo" />
#      <Property
#        id="2"
#        value="Bar"
#        exclude="false" />
#    </Group>
#  </Config>
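For completeness, here is a sketch of the XmlWriter variant end to end, including the Save() and Dispose() calls that the snippets above leave out. The sample document and the temp-file name are hypothetical, chosen only so the example runs on its own.

```powershell
# Hypothetical sample document and temp path, for illustration only
$doc  = [xml] '<Config><Group name="PropertyGroup"><Property id="1" value="Foo" /></Group></Config>'
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'xmlwriter-no-bom.xml'

# Settings: UTF-8 without BOM, indented output
$settings = [System.Xml.XmlWriterSettings]::new()
$settings.Encoding = [System.Text.UTF8Encoding]::new($false)
$settings.Indent   = $true

# Create the writer, save the document through it, and always release the file
$writer = [System.Xml.XmlWriter]::Create($path, $settings)
try     { $doc.Save($writer) }
finally { $writer.Dispose() }
```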
2
votes

Unfortunately, the explicit presence of an encoding="utf-8" attribute in the declaration of an XML document causes .NET to .Save() the document to a UTF-8-encoded file with BOM if a target file path is given, which can indeed cause problems.

A request to change this was rejected for fear of breaking backward compatibility; a separate request asks to at least document the behavior.

Somewhat ironically, the absence of an encoding attribute causes .Save() to create UTF-8-encoded files without a BOM.

A simple solution is therefore to remove the encoding attribute[1]; e.g.:

# Create a sample XML document:
$xmlDoc = [xml] '<?xml version="1.0" encoding="utf-8"?><foo>bar</foo>'

# Remove the 'encoding' attribute from the declaration.
# Without this, the .Save() method below would create a UTF-8 file *with* BOM.
$xmlDoc.ChildNodes[0].Encoding = $null

# Now, saving produces a UTF-8 file *without* a BOM.
$xmlDoc.Save("$PWD/out.xml")

[1] This is safe to do, because the XML W3C Recommendation effectively mandates UTF-8 as the default in the absence of both a BOM and an encoding attribute.
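One way to convince yourself that dropping the attribute is safe is a round trip with non-ASCII content: save without the encoding attribute, load the file again, and compare the text. This is only a sketch; the sample text and the temp-file name are assumptions for illustration.

```powershell
# Hypothetical sample content and temp path, for illustration only.
# The accented character is built via [char] so the script file's own encoding does not matter.
$text = "caf$([char]0xE9)"
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'roundtrip.xml'

$xmlDoc = [xml] '<?xml version="1.0" encoding="utf-8"?><foo />'
$xmlDoc.DocumentElement.InnerText = $text

# Remove the 'encoding' attribute, then save to a file path
$xmlDoc.ChildNodes[0].Encoding = $null
$xmlDoc.Save($path)

# Reload: with no BOM and no encoding attribute, the parser falls back to UTF-8
$reloaded = [System.Xml.XmlDocument]::new()
$reloaded.Load($path)
$reloaded.DocumentElement.InnerText -eq $text   # True if the text survived the round trip
```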