
We have an old application in Turbo Pascal which can save its internal state into a file, and we need to be able to read/write this file in a C# application.

The old application generates the file by dumping various in-memory data structures. In one place, the application just dumps a range of memory, and this memory range contains some arrays. I am trying to noodle out the purpose of the bytes immediately preceding the actual array elements. In particular, the first two items in the block can be represented as:

type
  string2 = string[2];
  stringarr2 = array[0..64] of string2;
  string4 = string[4];
  stringarr4 = array[0..64] of string4;

In the data file, I see the following byte sequence:

25 00 02 02 41 42 02 43 44 ...

The 25 is the number of elements in the array. The 02 41 42 is the first string element, "AB"; the 02 43 44 is the second string element, "CD", and so on. I don't know what the 00 02 between the array element count and the first array element refers to. It's possible the array element count is 25 00 and the element size is 02, but each array element is actually 3 bytes in size.
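To test the hypothesis in the last sentence against the sample bytes, here is a minimal Python sketch. It assumes the values are hex (0x41 is "A", so 0x25 is 37 decimal), that bytes 0-1 are a little-endian 16-bit count, and that byte 2 is the declared string length. The helper names `read_header` and `read_shortstrings` are hypothetical, and the layout is a guess rather than a confirmed format. The same offsets translate directly to a C# reader.

```python
import struct

def read_header(data, offset=0):
    # Hypothesis: uint16 little-endian element count, then one byte holding
    # the declared string length (2 for string[2]).
    count, declared_len = struct.unpack_from("<HB", data, offset)
    return count, declared_len, offset + 3

def read_shortstrings(data, offset, declared_len, n):
    # Each string[N] slot is N + 1 bytes: a length byte, then N character bytes.
    slot = declared_len + 1
    out = []
    for i in range(n):
        start = offset + i * slot
        length = data[start]
        out.append(data[start + 1:start + 1 + length].decode("latin-1"))
    return out

sample = bytes.fromhex("25 00 02 02 41 42 02 43 44")
count, declared_len, pos = read_header(sample)
print(count, declared_len)                               # 37 2
print(read_shortstrings(sample, pos, declared_len, 2))   # ['AB', 'CD']
```

The sample only contains the first two elements, so the sketch reads two rather than all 37.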

In the place in the file where the array of 4-character strings starts, I see the following:

25 00 04 00 00 04 41 42 43 44 04 45 46 47 48

Again, there's the 25 which is the number of elements in the array; 04 41 42 43 44 is the first element in the array, "ABCD", and so on. In between there are the bytes 00 04 00 00. Maybe they are flags. Maybe they are some kind of indication of the shape of the array (but I don't see how 02 and 04 both indicate a one-dimensional array).
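Without a compiler to experiment with, one option is to search the dump itself. The hypothetical helper below tries every candidate header size and keeps the ones where the following fixed-size slots look like valid string[N] values: a length byte of 1..N followed by printable ASCII. This is a sketch under those assumptions. Empty strings are deliberately rejected so that runs of zero bytes do not produce false matches.

```python
def plausible_header_sizes(data, declared_len, max_header=8, slots_to_check=2):
    # For each candidate header size, check whether the next few slots of
    # declared_len + 1 bytes parse as non-empty, printable short strings.
    slot = declared_len + 1
    hits = []
    for header in range(max_header + 1):
        ok = True
        for i in range(slots_to_check):
            start = header + i * slot
            if start + slot > len(data):
                ok = False
                break
            length = data[start]
            chars = data[start + 1:start + 1 + length]
            if length == 0 or length > declared_len or not all(0x20 <= c < 0x7F for c in chars):
                ok = False
                break
        if ok:
            hits.append(header)
    return hits

print(plausible_header_sizes(bytes.fromhex("25 00 04 00 00 04 41 42 43 44 04 45 46 47 48"), 4))  # [5]
print(plausible_header_sizes(bytes.fromhex("25 00 02 02 41 42 02 43 44"), 2))                     # [3]
```

For both samples, the only alignment that works puts the header at exactly one slot: 5 bytes for string[4] and 3 bytes for string[2].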

I don't have access to Turbo Pascal to try writing different kinds of arrays to a file, and don't have authorization to install something like Free Pascal, so my opportunities for experimentation along those lines are very limited.

These arrays are not dynamic, since Turbo Pascal didn't have them.

Thanks in advance for any dusty memories.

Can you post the piece of code that writes the data to the file? - GolezTrol
I think those bytes may just be a custom header, by which the reader knows it is reading an array of 25 items of type '4', which apparently is an array of string4. The zeros before and after can only be guessed. - GolezTrol
The application allocates a block of memory and points pointers of these types (stringarr2, stringarr4 and others) into that block. It doesn't write the arrays individually to the file; it writes the whole block at once (using BlockWrite). That's why I am curious what the in-memory layout of an array is. - prprcupofcoffee
25 00 is probably a 2-byte integer. Why don't you read the source code that writes the files? - David Heffernan

3 Answers


Pascal arrays have no bookkeeping data. You have an array of five-byte data structures (string[4]), so an array of 65 of them occupies 65*5=325 bytes. If the program wrote more than that, then it's because the program took special measures to write more. The "extra" values weren't just sitting in memory that the program happened to write to disk when it naively wrote the whole data structure with SizeOf. Thus, the only way to know what those bytes mean is to find the source code or the documentation. Merely knowing that it's Turbo Pascal is no help.
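The size arithmetic above can be written out directly. This sketch only encodes the two rules stated in the answer: a string[N] takes N + 1 bytes, and a static array is its elements stored back to back with no header. The function names are illustrative.

```python
def shortstring_size(max_len):
    # string[N]: one length byte plus N character bytes.
    return max_len + 1

def array_size(low, high, elem_size):
    # A static Pascal array is just its elements, back to back, no header.
    return (high - low + 1) * elem_size

print(array_size(0, 64, shortstring_size(4)))  # 325 (stringarr4)
print(array_size(0, 64, shortstring_size(2)))  # 195 (stringarr2)
```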

It's possible that the first section of the file is intentionally the same size as all the other array elements. For the two-character strings, the "header" is three bytes, and for the four-character strings, the "header" is five bytes, the same as the size of the strings. That would have allowed the program to declare the file as a file of string4 and then simply skip the file's first record. The zero between the record count and the string length in the header might belong to either of those fields, and the remaining two zero bytes might just be filler.
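A reader built on this theory treats the block as fixed-size records and skips the first one. The sketch below assumes, as one reading of the answer, that the count is a little-endian 16-bit value in the first two header bytes and that the third header byte is the declared string length. Any remaining header bytes are ignored as filler. `read_as_records` is a hypothetical name.

```python
def read_as_records(data, declared_len):
    # Treat the block like a "file of string[N]": records of N + 1 bytes each.
    slot = declared_len + 1
    records = [data[i:i + slot] for i in range(0, len(data) - slot + 1, slot)]
    header, body = records[0], records[1:]
    # Assumed header layout: uint16 LE count, then the declared length byte.
    count = int.from_bytes(header[0:2], "little")
    header_len = header[2]
    strings = [r[1:1 + r[0]].decode("latin-1") for r in body]
    return count, header_len, strings

print(read_as_records(bytes.fromhex("25 00 04 00 00 04 41 42 43 44 04 45 46 47 48"), 4))
# (37, 4, ['ABCD', 'EFGH'])
```

The same function handles the string[2] sample: its 3-byte header `25 00 02` is one record of that size.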


Besides the layout of the individual strings in the file, you will also need to consider which code page those single-byte characters come from. C# chars are 2-byte UTF-16 code units, not bytes.

If you're lucky, the original file data contains only 7-bit ASCII characters, which cover the letters of the English alphabet. If the original data contains "European" letters such as umlauts or accented characters, these will be "high ASCII" byte values in the range 128..255. You'll need to perform an encoding conversion to see these characters correctly in C#. Code page 1252 (Windows Latin 1) would be a good starting point.
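The conversion can be seen in a small sketch. It decodes one length-prefixed string with an explicit code page. cp1252 follows the suggestion above. cp437 is added as an assumption: it is the usual code page for a DOS-era program. The same text produces different bytes in the two code pages, so sampling a few known strings is the way to confirm which one the file uses. `decode_shortstring` is a hypothetical helper.

```python
def decode_shortstring(slot: bytes, encoding: str = "cp1252") -> str:
    # First byte is the length; only that many following bytes are meaningful.
    length = slot[0]
    return slot[1:1 + length].decode(encoding)

win = bytes([6]) + "Müller".encode("cp1252")            # ü is 0xFC in cp1252
dos = bytes([6, 0x4D, 0x81, 0x6C, 0x6C, 0x65, 0x72])    # ü is 0x81 in cp437
print(decode_shortstring(win))            # Müller
print(decode_shortstring(dos, "cp437"))   # Müller
```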

If the original file data contains Japanese, Chinese, Korean, Thai, or characters from other "Eastern" scripts, you have a lot of work ahead of you.


Turbo Pascal strings are prefixed with a length byte. So a string[2] is actually 3 bytes: length, char1 and char2. An array of string[2] will hold all the strings one by one directly after each other in memory. If you do a blockwrite with the array as a parameter it will immediately start with the first string, it will not write any headers etc. So if you have the source you should be able to see what it writes before the array.
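Because the C# side also has to write the file, it helps to reproduce that in-memory layout when serializing. This sketch packs values into consecutive string[N] slots. Zero-filling the unused character bytes and the unused trailing slots is an assumption: the original program may leave whatever was already in memory. Raising on an over-long value is a design choice; Turbo Pascal itself would silently truncate on assignment. `pack_shortstring_array` is a hypothetical name.

```python
def pack_shortstring_array(values, declared_len, slots, encoding="cp1252"):
    # Each slot: one length byte + declared_len character bytes, back to back.
    out = bytearray()
    for i in range(slots):
        text = values[i] if i < len(values) else ""
        raw = text.encode(encoding)
        if len(raw) > declared_len:
            raise ValueError(f"{text!r} does not fit in string[{declared_len}]")
        # Assumption: unused bytes are zero-filled.
        out += bytes([len(raw)]) + raw + bytes(declared_len - len(raw))
    return bytes(out)

print(pack_shortstring_array(["AB", "CD"], 2, 3).hex(" "))  # 02 41 42 02 43 44 00 00 00
print(len(pack_shortstring_array([], 4, 65)))               # 325
```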