
I am trying to create a flat file for a legacy system that mandates the data be presented in the MS-DOS text encoding (Text Document - MS-DOS Format, CP_OEM). I am a bit confused about files generated with the UTF8Encoding class in C# (.NET Framework 4.0); I think it produces a file in the default .txt encoding (CP_ACP).

I think the encoding names CP_ACP, Windows, and ANSI refer to the same thing, that the Windows default is ANSI, and that it will omit any Unicode character information.

If I use the UTF8Encoding class in the C# library to create a text file (as below), is it going to be in the MS-DOS .txt file format?

byte[] title = new UTF8Encoding(true).GetBytes("New Text File");

As per the answer supplied, it is evident that UTF-8 is NOT equivalent to the MS-DOS .txt format, and that I should use the Encoding.GetEncoding(850) method to get the encoding.

I read the following posts to check my understanding, but found nothing conclusive: https://blogs.msdn.microsoft.com/oldnewthing/20120220-00?p=8273

https://blog.mh-nexus.de/2015/01/character-encoding-confusion

https://blogs.msdn.microsoft.com/oldnewthing/20090115-00?p=19483

Finally, the conclusion is to go with Encoding.GetEncoding(850) when creating a byte array to be converted back to the actual file (note: I am using a byte array so that I can leverage existing middleware).
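To make the difference concrete, here is a minimal sketch of the byte-array approach. The class and method names (OemBytesDemo, ToOemBytes, ToUtf8Bytes) are hypothetical, and code page 850 is only the assumption discussed here; the legacy system may expect a different OEM code page. The RegisterProvider call is needed on .NET Core / .NET 5+ and is not needed (and not available) on .NET Framework 4.0, where GetEncoding(850) works out of the box.

```csharp
using System.Text;

public static class OemBytesDemo
{
    // Encodes text with an MS-DOS (OEM) code page.
    // 850 is an assumption; confirm the code page with the legacy system.
    public static byte[] ToOemBytes(string text, int codePage = 850)
    {
        // Makes legacy code pages available on .NET Core / .NET 5+.
        // Remove this line when targeting .NET Framework 4.0.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(codePage).GetBytes(text);
    }

    // The approach from the question, for comparison.
    // Note: GetBytes never emits a BOM; the 'true' flag only affects GetPreamble().
    public static byte[] ToUtf8Bytes(string text)
    {
        return new UTF8Encoding(true).GetBytes(text);
    }
}
```

For example, ToOemBytes("é") yields the single byte 0x82, while ToUtf8Bytes("é") yields the two bytes 0xC3 0xA9, which a system expecting CP 850 would show as two unrelated characters.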

Yes, it's useful to know the code page for AUS. Another quick, slightly unrelated question: is the tab character generally the delimiter used with these flat file types (I mean, to allow legacy systems to process them)? - Lin
Why is there a negative vote on this? - Lin

1 Answer


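You can use the File.ReadXY(String, Encoding) methods, where XY is AllLines, Lines, or AllText (returning string[], IEnumerable<string>, and string respectively), together with the matching File.WriteAllLines(String, String[], Encoding) and File.WriteAllText(String, String, Encoding) methods. Note that there is no File.WriteLines; WriteAllLines also has an overload that accepts IEnumerable<string>.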

MS-DOS uses different code pages. Code page 850 "Western European / Latin-1" or code page 437 "OEM-US / OEM / PC-8 / DOS Latin US" (as @HansPassant suggests) will probably be okay. If you are not sure which code page you need, create example files with the legacy system containing letters like ä, ö, ü, é, è, ê, ç, à or Greek letters, and see whether they work. If you don't use such letters or other special characters, then the choice of code page is not very critical.

File.WriteAllText(path, "Hello World", Encoding.GetEncoding(850));
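As a rough sketch of how the write and read calls fit together, the helper below writes a string with an OEM code page and reads it back with the same encoding. The names OemFileDemo and RoundTrip are hypothetical, and 850 is again only an assumed code page. The RegisterProvider line applies to .NET Core / .NET 5+ and should be dropped on .NET Framework 4.0.

```csharp
using System.IO;
using System.Text;

public static class OemFileDemo
{
    // Writes text to 'path' using an OEM code page, then reads it back
    // with the same encoding. 850 is an assumed default, not a requirement.
    public static string RoundTrip(string path, string text, int codePage = 850)
    {
        // Makes legacy code pages available on .NET Core / .NET 5+.
        // Remove this line when targeting .NET Framework 4.0.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding oem = Encoding.GetEncoding(codePage);

        File.WriteAllText(path, text, oem);   // one byte per character, no BOM
        return File.ReadAllText(path, oem);   // decode with the same code page
    }
}
```

Reading the file back with a different encoding than the one used for writing is the usual cause of garbled accented letters, so it helps to keep the Encoding instance in one place.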

The character codes from 0 to 127 (7-bit) are the same for all MS-DOS code pages, for ANSI, and for UTF-8. UTF files are sometimes prefixed with a BOM (byte order mark).

MS-DOS knows only 8-bit characters. The codes 128 to 255 differ for the different national code pages.
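A small sketch can illustrate this: decoding the same byte with two code pages gives the same character below 128 and can give different characters above it. The names CodePageCompare and Decode are hypothetical; the RegisterProvider line is only needed on .NET Core / .NET 5+ and should be removed on .NET Framework 4.0.

```csharp
using System.Text;

public static class CodePageCompare
{
    // Decodes a single byte using the given code page.
    public static string Decode(byte value, int codePage)
    {
        // Makes legacy code pages available on .NET Core / .NET 5+.
        // Remove this line when targeting .NET Framework 4.0.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(codePage).GetString(new[] { value });
    }
}
```

Decode(0x41, 437) and Decode(0x41, 850) both return "A", whereas Decode(0x9B, 437) returns "¢" and Decode(0x9B, 850) returns "ø".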

See: File Class, Encoding Class and Wikipedia: Code Page.