
Apologies if this has been answered somewhere, but if it has, I couldn't find it.

I am doing some manipulation of byte arrays, and have noticed that when I convert a byte that isn't valid ASCII (for example, 0x9C) to a string, it gets interpreted as a "?". As a result, when I convert it back into a byte, it comes out as 0x3F.

My issue is that parts of the byte arrays are ASCII, but other parts are checksums that may contain invalid bytes like this one. I would like to be able to convert the entire array into a string for convenience. Is there an encoding that yields ASCII for normal characters and guarantees that converting an invalid byte to a string and back will yield the same byte?
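To illustrate, here is a minimal sketch of the lossy round trip I'm seeing (assuming the conversion is done with Encoding.ASCII; the buffer contents are made up):

    using System;
    using System.Text;

    class AsciiRoundTripDemo
    {
        static void Main()
        {
            // A buffer that mixes ASCII text with a non-ASCII checksum byte (0x9C).
            byte[] original = { 0x48, 0x49, 0x9C };        // "HI" followed by 0x9C

            // ASCII has no mapping for 0x9C, so it decodes to '?'...
            string text = Encoding.ASCII.GetString(original);
            Console.WriteLine(text);                        // "HI?"

            // ...and the information is lost when converting back.
            byte[] roundTripped = Encoding.ASCII.GetBytes(text);
            Console.WriteLine(roundTripped[2] == 0x3F);     // True: 0x9C became 0x3F
        }
    }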


2 Answers


May not work for others, but I found that using

System.Text.Encoding.Default.GetString(...)

and

System.Text.Encoding.Default.GetBytes(...)

instead of other encodings prevented the values in the byte arrays from being changed to "?", while ASCII characters were still interpreted correctly.
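A rough sketch of what I mean (note that Encoding.Default is platform-dependent: on .NET Framework it is the system ANSI code page, e.g. Windows-1252, where 0x9C round-trips; on .NET Core / .NET 5+ it is UTF-8, so the round trip is not guaranteed there):

    using System;
    using System.Text;

    class DefaultEncodingRoundTrip
    {
        static void Main()
        {
            byte[] original = { 0x48, 0x49, 0x9C };    // "HI" plus a non-ASCII byte

            // Under a single-byte ANSI code page such as Windows-1252 (where 0x9C is 'œ'),
            // the byte survives the trip through a string. Under UTF-8 it would not.
            string text = Encoding.Default.GetString(original);
            byte[] roundTripped = Encoding.Default.GetBytes(text);

            Console.WriteLine(BitConverter.ToString(roundTripped)); // "48-49-9C" on Windows-1252 systems
        }
    }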


Not sure what you mean by "normal characters", but you are asking for an encoding that can decode arbitrary sequences of arbitrary byte values 0-255. It would need to be for a character set with 256 codepoints, have 1-byte code units, encode every codepoint in a single code unit, be in the .NET Base Class Library, and the character set would have to be a subset of Unicode.

ISO 8859-1 and CP437 are two that meet these requirements. You can check whether they map your "normal characters" to "normal characters" in Unicode. (Hint: ISO 8859-1 has all the same characters as the C0 Controls and Basic Latin and the C1 Controls and Latin-1 Supplement blocks.)
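For example, a round trip through ISO 8859-1 could look like the sketch below (CP437 would use Encoding.GetEncoding(437), which on .NET Core also requires registering CodePagesEncodingProvider first):

    using System;
    using System.Linq;
    using System.Text;

    class Latin1RoundTrip
    {
        static void Main()
        {
            // ISO 8859-1 maps every byte 0x00-0xFF to a single Unicode code point,
            // so decoding and re-encoding returns the original bytes unchanged.
            Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");

            byte[] allBytes = Enumerable.Range(0, 256).Select(i => (byte)i).ToArray();
            string text = latin1.GetString(allBytes);
            byte[] roundTripped = latin1.GetBytes(text);

            Console.WriteLine(allBytes.SequenceEqual(roundTripped)); // True
        }
    }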

BTW—are you sure that regions of your data format are text encoded in ASCII and not some other character encoding?