
Using 32-bit MASM assembly with the MASM version 11 SDK, I got an error while assembling. The error pointed to the line where I declared a variable with the double-word size (dd), and the message said the variable was too small for the string I tried to assign to it. When I declared the variable as bytes instead (db), the program assembled with no error. This seemed to imply that a variable declared with the db directive could hold more than one declared as a double word. Below is the double-word declaration that the error message pointed to:

.data
msg_run dd "Ran a function.", 0

I changed the data size of msg_run to a byte:

.data
msg_run db "Ran a function.", 0

With the second version, the program assembled and ran with no problems. Why did the error imply that a byte-sized variable has more capacity than a double-word-sized one? Does the trailing ", 0" have any effect?

Sources I reviewed:

https://www.cs.virginia.edu/~evans/cs216/guides/x86.html
https://www.shsu.edu/~csc_tjm/fall2003/cs272/intro_to_asm.html

A "string" is really just an array of characters terminated by a zero. Each character is a single byte (for narrow characters, char in C). With dd you make each element of the array a double word, i.e. each element is 32 bits, which isn't really correct. - Some programmer dude
MASM treats strings (things between the quotes) in a special way when you use db. db is a single character (byte), so MASM will take each character and store it in a byte. This type of processing doesn't occur the same way with types larger than a byte (dw and dd). In those situations MASM tries to stuff your string into a single DWORD (32-bit value). Look what happens if you use dd and make your string <=4 characters in length. The error should disappear, but the characters are placed in memory in reverse order. - Michael Petch
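A rough model of the behaviour this comment describes, written in Python so the byte layout is easy to inspect. It assumes MASM packs a dd string initializer with the first character in the most significant byte, which matches the in-memory reversal described here; masm_dd_string and store_dword are hypothetical helpers, not part of MASM or any library. Strings longer than four characters are rejected, much like the error in the question.

```python
def masm_dd_string(s: str) -> int:
    """Hypothetical model (not MASM source) of `dd "..."`: pack the
    characters into one 32-bit value, first character most significant."""
    data = s.encode("ascii")
    if len(data) > 4:
        # Stand-in for the assembler error: the string cannot fit in a DWORD.
        raise ValueError(f"{s!r} does not fit in a single dd")
    return int.from_bytes(data, "big")


def store_dword(value: int) -> bytes:
    """Bytes x86 writes to memory for a 32-bit value (little-endian)."""
    return value.to_bytes(4, "little")


value = masm_dd_string("ABCD")
print(hex(value), store_dword(value))  # 0x41424344 b'DCBA'
```

Calling masm_dd_string("Ran a function.") raises ValueError, the model's counterpart of the error the question ran into.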

Answer


Having a strict data definition syntax that requires the programmer to write each element separated by a comma would make declaring a string tedious:

myString db 'M', 'y', ' ', 's', 't', 'r', 'i', 'n', 'g', 0

so MASM (like other mainstream assemblers) relaxes the syntax and accepts

myString db "My string", 0
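A minimal Python sketch of the two forms, assuming a db line simply emits one byte per numeric item and one byte per character of each quoted string, in source order. The db function is a hypothetical helper for illustration. It also shows what the trailing , 0 contributes: a single zero byte that terminates the string.

```python
def db(*items) -> bytes:
    """Hypothetical model of a `db` line: an int emits one byte, a quoted
    string emits one byte per character, in source order."""
    out = bytearray()
    for item in items:
        if isinstance(item, str):
            out += item.encode("ascii")
        else:
            out.append(item)  # raises ValueError if outside 0..255
    return bytes(out)


long_form = db('M', 'y', ' ', 's', 't', 'r', 'i', 'n', 'g', 0)
short_form = db("My string", 0)
print(long_form == short_form, len(short_form))  # True 10

# The trailing 0 adds one byte: the terminator that C-style string code expects.
msg_run = db("Ran a function.", 0)
print(len(msg_run), msg_run[-1])  # 16 0
```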

Note that I used single quotes ' for characters (i.e. numbers) and double quotes " for strings. I don't know the exact quoting rules MASM uses; it may treat a one-character string as a character.

The dd case looks very similar to the shorthand above, but it is not a syntax for declaring strings: it creates numbers.

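When a string like "ABCD" is used where a number is expected (as in a dd or as an immediate), the assembler converts it to a number. NASM converts "ABCD" to 0x44434241, the values of the characters D, C, B, A, so the first character ends up in the least significant byte.
NASM chooses this order because the syntax is mostly used for instruction immediates, like mov eax, "ABCD" or cmp eax, "ABCD".
This way, storing eax to memory writes the string "ABCD" in the correct order, thanks to x86's little-endian byte order.
This also works well for checking table signatures, since these signatures are designed to read correctly in memory but are, of course, reversed once loaded into a register.
MASM orders the characters the other way: the first character goes in the most significant byte, so "ABCD" becomes 0x41424344 and dd "ABCD" is stored in memory as "DCBA". This is the reversal Michael Petch describes in his comment.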
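A short Python sketch of the packing NASM documents for multi-character constants (first character in the least significant byte), and of why storing the register then gives the characters back in order. nasm_char_const and the table bytes are hypothetical; the x86 load and store are modelled with int.from_bytes and int.to_bytes in little-endian order.

```python
def nasm_char_const(s: str) -> int:
    """Sketch of NASM's rule for multi-character constants:
    the first character goes in the least significant byte."""
    return int.from_bytes(s.encode("ascii"), "little")


eax = nasm_char_const("ABCD")        # like `mov eax, "ABCD"`
print(hex(eax))                      # 0x44434241
print(eax.to_bytes(4, "little"))     # b'ABCD' (what storing eax writes)

# Signature check: load the first 4 bytes the way x86 does, then compare
# against the constant, like `cmp eax, "ABCD"`.
table = b"ABCD" + bytes(12)          # hypothetical table with a signature
loaded = int.from_bytes(table[:4], "little")
print(loaded == nasm_char_const("ABCD"))  # True
```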

In NASM you can even piss everybody off with things like mov eax, ("ABCD" + "EFGH") / 2, reinforcing the view of these strings as numbers. This should also apply to MASM.
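Evaluating that expression as plain integer arithmetic, under the same little-endian packing and with NASM's / taken as unsigned integer division. nasm_char_const is a hypothetical helper. The result happens to spell "CDEF" in memory: no byte sum overflows and every byte sum is even, so each byte ends up as the average of its two letters.

```python
def nasm_char_const(s: str) -> int:
    """Hypothetical helper: pack characters NASM-style, first char lowest."""
    return int.from_bytes(s.encode("ascii"), "little")


# ("ABCD" + "EFGH") / 2, with / as unsigned integer division
value = (nasm_char_const("ABCD") + nasm_char_const("EFGH")) // 2
print(hex(value), value.to_bytes(4, "little"))  # 0x46454443 b'CDEF'
```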

I don't remember a case where I've used myVar dd "ABCD", but it may be useful when a structure has a fixed string that is spelled in reverse in memory.


Michael Petch recapped MASM behaviour in a comment:

MASM treats strings (things between the quotes) in a special way when you use db. db is a single character (byte), so MASM will take each character and store it in a byte. This type of processing doesn't occur the same way with types larger than a byte (dw and dd). In those situations MASM tries to stuff your string into a single DWORD (32-bit value). Look what happens if you use dd and make your string <=4 characters in length. The error should disappear, but the characters are placed in memory in reverse order.