3
votes

I have the following code:

using (BinaryReader br = new BinaryReader(
       File.Open(FILE_PATH, FileMode.Open, FileAccess.ReadWrite)))
{
    int pos = 0;
    int length = (int) br.BaseStream.Length;
    byte[] b = new byte[length]; // buffer for the raw file contents

    while (pos < length)
    {
        b[pos] = br.ReadByte();
        pos++;
    }

    pos = 0;
    while (pos < length)
    {
        Console.WriteLine(Convert.ToString(b[pos]));
        pos++;
    }
}

FILE_PATH is a const string containing the path to the binary file being read. The binary file is a mixture of integers and characters. Each integer is 1 byte, and each character is written to the file as 2 bytes.

For example, the file has the following data :

1HELLO HOW ARE YOU45YOU ARE LOOKING GREAT //and so on

Please note: Each integer is associated with the string of characters following it. So 1 is associated with "HELLO HOW ARE YOU" and 45 with "YOU ARE LOOKING GREAT" and so on.

Now the binary is written (I do not know why but I have to live with this) such that '1' will take only 1 byte while 'H' (and other characters) take 2 bytes each.

So here is what the file actually contains:

0100480045.. and so on. Here's the breakdown:

01 is the first byte, for the integer 1
0048 are the 2 bytes for 'H' (H is 48 in hex)
0045 are the 2 bytes for 'E' (E = 0x45)

and so on. I want my console to print this file in a human-readable format: that is, I want it to print "1 HELLO HOW ARE YOU" and then "45 YOU ARE LOOKING GREAT" and so on.

Is what I am doing correct? Is there an easier or more efficient way? My line Console.WriteLine(Convert.ToString(b[pos])); only prints the integer value, not the actual character I want. That is OK for the integers in the file, but how do I read out the characters?

Any help would be much appreciated. Thanks

3
I deleted my answer - what was the person thinking who decided on that format? :boggled: - Sam Harwell
How is the integer field differentiated from the string? Can the characters be above code point U+00FF? Can the integer be "0"? Is the integer signed or unsigned? - outis
It looks like C#. Is it? - Alfred Myers

3 Answers

8
votes

I think what you are looking for is Encoding.GetString.

Since your string data is composed of 2-byte characters, you can extract the string like this:

for (int i = 0; i < b.Length; i++)
{
  byte curByte = b[i];

  // Assuming that the first byte of a 2-byte character sequence will be 0
  if (curByte != 0)
  { 
    // This is a 1 byte number
    Console.WriteLine(Convert.ToString(curByte));
  }
  else
  { 
    // This is a 2-byte character stored high byte first (00 48 for 'H'),
    // so decode it as big-endian UTF-16; Encoding.Unicode is little-endian.
    Console.WriteLine(Encoding.BigEndianUnicode.GetString(b, i, 2));

    // We consumed the next byte as well, so skip it
    // in the next round of the loop.
    i++;
  }
}
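
To get the output the question asks for (one line per record, such as "1 HELLO HOW ARE YOU"), the same idea can be wrapped in a helper that collects characters until the next integer byte. This is only a sketch: RecordDecoder and Decode are hypothetical names, and it assumes that every integer is non-zero, that every character has a high byte of 0x00, and that the characters are stored high byte first, as in the question's sample (00 48 for 'H').

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class RecordDecoder
{
    // Hypothetical helper: turns the raw bytes into "<number> <text>" lines.
    // Assumes integers are never 0 and every character's high byte is 0x00.
    public static List<string> Decode(byte[] data)
    {
        var records = new List<string>();
        StringBuilder current = null;

        for (int i = 0; i < data.Length; i++)
        {
            if (data[i] != 0)
            {
                // Non-zero byte: a 1-byte integer that starts a new record.
                if (current != null) records.Add(current.ToString());
                current = new StringBuilder().Append(data[i]).Append(' ');
            }
            else if (i + 1 < data.Length)
            {
                // Zero byte: high byte of a 2-byte big-endian character.
                if (current == null) current = new StringBuilder();
                current.Append(Encoding.BigEndianUnicode.GetString(data, i, 2));
                i++; // skip the low byte that was just decoded
            }
        }

        if (current != null) records.Add(current.ToString());
        return records;
    }
}

static class Program
{
    static void Main()
    {
        // 1 "HI" 45 "YO" in the layout described in the question.
        byte[] sample = { 0x01, 0x00, 0x48, 0x00, 0x49, 0x2D, 0x00, 0x59, 0x00, 0x4F };
        foreach (string line in RecordDecoder.Decode(sample))
        {
            Console.WriteLine(line); // prints "1 HI", then "45 YO"
        }
    }
}
```

For a real file, the byte array could come from File.ReadAllBytes(FILE_PATH). If an integer can be 0 or a character can lie above U+00FF, this heuristic cannot tell the two apart, and the format would need a length or delimiter to be decoded reliably.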
2
votes

You can use System.Text.UnicodeEncoding.GetString(), which takes a byte[] and returns a string.

I found this link very useful

Note that this is not the same as just blindly copying the bytes from the byte[] array into a hunk of memory and calling it a string. The GetString() method must validate the bytes and forbid invalid surrogates, for example.
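
One detail worth checking is byte order. Going by the question's sample, where 'H' is stored as 00 48, the characters appear to be big-endian, while Encoding.Unicode decodes little-endian UTF-16. The sketch below decodes the same two bytes both ways; ByteOrderDemo and FirstCodePoint are hypothetical names used only for illustration.

```csharp
using System;
using System.Text;

static class ByteOrderDemo
{
    // Returns the first decoded UTF-16 code unit as four hex digits.
    public static string FirstCodePoint(Encoding encoding, byte[] bytes)
    {
        string decoded = encoding.GetString(bytes);
        return ((int)decoded[0]).ToString("X4");
    }

    public static void Main()
    {
        byte[] h = { 0x00, 0x48 }; // 'H' as stored in the question's file

        Console.WriteLine(FirstCodePoint(Encoding.BigEndianUnicode, h)); // 0048, i.e. 'H'
        Console.WriteLine(FirstCodePoint(Encoding.Unicode, h));          // 4800, not 'H'
    }
}
```

If the file really is laid out as in the sample, Encoding.BigEndianUnicode is the encoding to pass the character bytes to.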

0
votes
using (BinaryReader br = new BinaryReader(File.Open(FILE_PATH, FileMode.Open, FileAccess.Read)))
{
   long length = br.BaseStream.Length;
   long pos = 0;
   var text = new System.Text.StringBuilder();

   while (pos < length)
   {
       byte b = br.ReadByte();
       pos++;
       if (b != 0)
       {
          // A non-zero byte is a 1-byte integer: it starts a new record.
          if (text.Length > 0)
          {
             text.AppendLine();
          }
          text.Append(b).Append(' ');
       }
       else if (pos < length)
       {
          // A zero byte is the high byte of a 2-byte character stored
          // high byte first (assumes code points up to U+00FF).
          text.Append((char)br.ReadByte());
          pos++;
       }
   }

   Console.WriteLine(text.ToString());
}