I am parsing a large XML file (~500 MB) that contains an invalid XML character, 0x07. As you can imagine, the XmlReader throws an "invalid XML character" exception. To handle this, we currently read the Stream into a StreamReader, strip the character with Regex.Replace, write the result to memory using a StreamWriter, and stream the clean version back to the XmlReader.
Now I would like to avoid this extra pass and skip the offending tag from the XmlReader directly. Is there any way to achieve that? Below is the code snippet where I try to do this, but it throws the exception at this line:
var node = (XElement)XNode.ReadFrom(xr);
protected override IEnumerable<XElement> StreamReader(Stream stream, string elementName)
{
    var arrTag = elementName.Split('|').ToList();
    using (var xr = XmlReader.Create(stream, new XmlReaderSettings { CheckCharacters = false }))
    {
        while (xr.Read())
        {
            if (xr.NodeType == XmlNodeType.Element && arrTag.Contains(xr.Name))
            {
                var node = (XElement)XNode.ReadFrom(xr);
                node.ReplaceWith(node.Elements().Where(e => e.Name != "DaylightSaveInfo"));
                yield return node;
            }
        }
        xr.Close();
    }
}
XML sample (the invalid attribute is DaylightSaveInfo):
<?xml version="1.0" encoding="ISO-8859-1"?>
<LATree>
  <LA className="BTT00NE" fdn="NE=9739">
    <attr name="fdn">NE=9739</attr>
    <attr name="IP">10.157.144.100</attr>
    <attr name="realLatitude">0D0'0"S</attr>
    <attr name="realLongitude">0D0'0"W</attr>
    <attr name="DaylightSaveInfo">NO</attr>
  </LA>
</LATree>
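A side note on the filter in the snippet, separate from the invalid-character exception: in the sample above every child element is named attr, and DaylightSaveInfo is only the value of its name attribute, so e.Name != "DaylightSaveInfo" matches every child. Also, as far as I can tell, ReplaceWith throws an InvalidOperationException on a node that has no parent, which is the case for an element freshly returned by XNode.ReadFrom. A minimal sketch of removing that entry in place, assuming the structure shown in the sample (AttrFilter and RemoveAttr are hypothetical names, not part of the original code):

```csharp
using System.Linq;
using System.Xml.Linq;

public static class AttrFilter
{
    // Hypothetical helper: removes <attr name="attrName"> children in place
    // and returns the same element for convenience.
    public static XElement RemoveAttr(XElement node, string attrName)
    {
        node.Elements("attr")
            .Where(e => (string)e.Attribute("name") == attrName)
            .Remove(); // Extensions.Remove works on a snapshot of the sequence
        return node;
    }
}
```

With this, the body of the if block could become yield return AttrFilter.RemoveAttr(node, "DaylightSaveInfo");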
var xml = "<?xml version=\"1.0\"?><root>\a</root>"
reproduces the issue. You need to clean up your stream, there is no way around it. – Leonardo Herrera
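
To expand on the comment: as far as I know, CheckCharacters = false only relaxes checking of character entity references; a raw 0x07 in the text is still rejected by the parser. Cleaning the stream does not have to mean buffering the whole file in memory, though. Below is a minimal sketch of a filtering TextReader that drops disallowed characters while XmlReader pulls data through it. XmlSanitizingReader and Demo.StreamElements are hypothetical names, the ISO-8859-1 encoding is taken from the sample's XML declaration (a TextReader is already decoded, so the declaration itself is not used for decoding), and lone surrogates are not handled. The read loop also differs from the question's: XNode.ReadFrom already leaves the reader on the node after the element, so Read is only called when nothing was consumed.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper: a TextReader that silently drops characters
// XML 1.0 does not allow, so XmlReader never sees them.
public sealed class XmlSanitizingReader : TextReader
{
    private readonly TextReader _inner;

    public XmlSanitizingReader(TextReader inner) => _inner = inner;

    // XML 1.0 allows tab, LF, CR, and everything from 0x20 up except
    // U+FFFE and U+FFFF. Surrogates are passed through unchanged.
    private static bool IsAllowed(char c) =>
        c == '\t' || c == '\n' || c == '\r' ||
        (c >= ' ' && c != '\uFFFE' && c != '\uFFFF');

    public override int Read(char[] buffer, int index, int count)
    {
        int kept = 0;
        // Loop until something is kept or the input ends, because
        // returning 0 would tell the caller the stream is finished.
        while (kept == 0)
        {
            int read = _inner.Read(buffer, index, count);
            if (read <= 0) return 0;
            for (int i = 0; i < read; i++)
            {
                char c = buffer[index + i];
                if (IsAllowed(c)) buffer[index + kept++] = c;
            }
        }
        return kept;
    }

    public override int Read()
    {
        int c;
        while ((c = _inner.Read()) != -1)
            if (IsAllowed((char)c)) return c;
        return -1;
    }

    public override int Peek()
    {
        int c;
        // Discard disallowed characters so Peek and Read stay in sync.
        while ((c = _inner.Peek()) != -1 && !IsAllowed((char)c))
            _inner.Read();
        return c;
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing) _inner.Dispose();
        base.Dispose(disposing);
    }
}

public static class Demo
{
    public static IEnumerable<XElement> StreamElements(Stream stream, string elementName)
    {
        var tags = new HashSet<string>(elementName.Split('|'));
        // Assumption: the file really is ISO-8859-1, as its declaration says.
        var encoding = Encoding.GetEncoding("ISO-8859-1");
        using (var text = new XmlSanitizingReader(new StreamReader(stream, encoding)))
        using (var xr = XmlReader.Create(text))
        {
            xr.MoveToContent();
            while (!xr.EOF)
            {
                if (xr.NodeType == XmlNodeType.Element && tags.Contains(xr.Name))
                    yield return (XElement)XNode.ReadFrom(xr); // advances the reader itself
                else
                    xr.Read();
            }
        }
    }
}
```

Usage would look like foreach (var la in Demo.StreamElements(File.OpenRead(path), "LA")) { ... }, where path is whatever file you are reading. Only one buffer's worth of characters is held at a time, so the 500 MB file is never materialized in memory.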