3
votes

I am using the code below to read a ~2.5 GB XML file as fast as I can (thanks to MemoryMappedFile). However, I am getting the following exception: "'.', hexadecimal value 0x00, is an invalid character. Line 9778, position 73249406." I believe it is due to an encoding problem. How do I make sure that the MemoryMappedViewStream reads the file as UTF-8?

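using System.IO;
using System.IO.MemoryMappedFiles;
using System.Xml;

static void Main(string[] args)
{
    // Map the whole file and expose the mapping as a stream.
    using (var file = MemoryMappedFile.CreateFromFile(@"d:\temp\temp.xml", FileMode.Open, "MyMemMapFile"))
    {
        using (MemoryMappedViewStream stream = file.CreateViewStream())
        {
            Read(stream);
        }
    }
}

static void Read(Stream stream)
{
    using (XmlReader reader = XmlReader.Create(stream))
    {
        reader.MoveToContent();

        // Walk every node; the exception is thrown somewhere in this loop.
        while (reader.Read())
        {
        }
    }
}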
2
No, it is because you ran off the end of the mapping. You can only ever hope to map the full 2.5 gigabytes on a 64-bit operating system. This code doesn't accomplish anything; you are actually making it slower by copying the data twice: first to the file system cache, then again to the view. Memory-mapped files are only useful if you read the same data from them repeatedly. You don't. - Hans Passant
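
To illustrate the point about running off the end of the mapping: the documentation for CreateViewStream notes that a view created with the default size is rounded up to the next system page, so the stream can be longer than the file on disk. A minimal sketch, assuming that this padding is what produces the 0x00 characters, passes the real file length as the view size. The names MappedXmlSketch and CountElements and the temp-file contents are made up for illustration, and the map name is left out.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Xml;

static class MappedXmlSketch
{
    // Counts element nodes while reading through a view that is
    // limited to the file's actual length (hypothetical helper).
    public static int CountElements(string path)
    {
        long length = new FileInfo(path).Length;

        using (var file = MemoryMappedFile.CreateFromFile(path, FileMode.Open))
        // Explicit size: the stream ends where the file ends,
        // not at the next page boundary.
        using (var stream = file.CreateViewStream(0, length))
        using (var reader = XmlReader.Create(stream))
        {
            int count = 0;
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    count++;
                }
            }
            return count;
        }
    }

    static void Main()
    {
        // Small stand-in for the real 2.5 GB file.
        string path = Path.GetTempFileName();
        File.WriteAllText(path,
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?><root><item/><item/></root>");
        try
        {
            Console.WriteLine(CountElements(path));
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

This only addresses the exception. The comment's other point still stands: for a single sequential pass, handing a plain FileStream to XmlReader.Create is likely to be at least as fast as going through a mapping.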

2 Answers

4
votes

You could use the StreamReader class to set the encoding:

static void Main(string[] args)
{
    using (var file = MemoryMappedFile.CreateFromFile(@"d:\temp\temp.xml", FileMode.Open, "MyMemMapFile"))
    {
        using (MemoryMappedViewStream stream = file.CreateViewStream())
        {
            Read(stream);
        }
    }
}

static void Read(Stream stream)
{
    using (XmlReader reader = XmlReader.Create(new StreamReader(stream, Encoding.UTF8)))
    {
        reader.MoveToContent();

        while (reader.Read())
        {
        }
    }
}

Hope this helps.

0
votes

MSDN says the following:

"The XmlReader scans the first bytes of the stream looking for a byte order mark or other sign of encoding"

Does your XML file specify an encoding?

<?xml version="1.0" encoding="UTF-8"?>
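
A small sketch of what that detection looks like in practice, assuming nothing beyond the standard XmlReader behaviour quoted above: the same document is encoded once as UTF-8 (no BOM, encoding named in the declaration) and once as UTF-16 with a BOM, and both are handed to XmlReader as raw bytes with no encoding specified. EncodingDetectionSketch and ReadRootText are made-up names for illustration.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml;

static class EncodingDetectionSketch
{
    // Returns the text of the root element, leaving encoding
    // detection entirely to XmlReader (hypothetical helper).
    public static string ReadRootText(byte[] bytes)
    {
        using (var stream = new MemoryStream(bytes))
        using (var reader = XmlReader.Create(stream))
        {
            reader.MoveToContent();
            return reader.ReadElementContentAsString();
        }
    }

    static void Main()
    {
        // UTF-8 without a BOM; the declaration names the encoding.
        byte[] utf8 = new UTF8Encoding(false).GetBytes(
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?><root>caf\u00e9</root>");

        // UTF-16 little-endian, prefixed with its byte order mark.
        byte[] utf16 = Encoding.Unicode.GetPreamble()
            .Concat(Encoding.Unicode.GetBytes(
                "<?xml version=\"1.0\" encoding=\"utf-16\"?><root>caf\u00e9</root>"))
            .ToArray();

        // Both byte arrays should decode to the same text.
        Console.WriteLine(ReadRootText(utf8) == ReadRootText(utf16));
    }
}
```

If the file already carries a correct declaration or BOM, the reader should not need any help choosing UTF-8.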