For datasets that contain a numeric variable, as @jaamor's example did, there is a difference related to the 8-byte size that does affect storage. It usually has no significant effect on dataset size, but for a very tall and narrow dataset it may be worth considering.
When a numeric variable is 8 bytes long (the default), SAS places it at the end of the data vector, starting at a multiple of 8 bytes, presumably to make access to those predictably placed numeric variables more efficient. Every variable other than an 8-byte numeric is placed at the start of the data vector, then any padding needed to bring that portion up to a multiple of 8 bytes is added, and then the 8-byte numeric variables are placed after that.
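As a rough model of that rule (a sketch of the arithmetic described above, not SAS's documented algorithm), the physical observation length can be estimated from two inputs. The names other_bytes, n8, and est_physical are made up for this illustration.

```sas
/* Sketch only: models the layout rule described above.
   other_bytes = combined length of all variables that are not 8-byte numerics
   n8          = number of 8-byte numeric variables
   (both names are hypothetical, chosen for this example) */
data _null_;
  input other_bytes n8;
  conceptual = other_bytes + 8 * n8;            /* simple sum of lengths */
  if n8 > 0 then
    est_physical = ceil(other_bytes / 8) * 8    /* pad up to a multiple of 8 */
                 + 8 * n8;                      /* then the 8-byte numerics */
  else
    est_physical = other_bytes;                 /* no 8-byte numerics: no padding */
  put other_bytes= n8= conceptual= est_physical=;
  datalines;
14 1
12 1
18 0
18 1
;
run;
```

For these four inputs the model gives estimates of 24, 24, 18, and 32 bytes in the log.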
This can be seen by looking at the PROC CONTENTS output from some example datasets.
data fourteen_eight;
  length x y $7; *14 total;
  length i 8;
run;

data twelve_eight;
  length x y $6; *12 total;
  length i 8;
run;

data twelve_six;
  length x y $6; *12 total;
  length i 6;
run;

data twelve_six_eight;
  length x y $6; *12 total;
  length z 6;    *18 total before the 8-byte numeric;
  length i 8;
run;
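One way to read the physical observation length for all four datasets at once is to query DICTIONARY.TABLES, whose obslen column holds the same figure that PROC CONTENTS prints as "Observation Length". This sketch assumes the datasets above were created in the WORK library.

```sas
/* Assumes the four example datasets live in WORK.
   Member names are stored in upper case in DICTIONARY.TABLES. */
proc sql;
  select memname, nvar, obslen
    from dictionary.tables
    where libname = 'WORK'
      and memname in ('FOURTEEN_EIGHT', 'TWELVE_EIGHT',
                      'TWELVE_SIX', 'TWELVE_SIX_EIGHT');
quit;

/* The same figure for a single dataset, in the PROC CONTENTS header */
proc contents data=work.twelve_six_eight;
run;
```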
fourteen_eight has a conceptual observation length of 22 but a physical observation length of 24 (as reported by PROC CONTENTS). twelve_eight has a conceptual length of 20 but also a physical observation length of 24. twelve_six has a conceptual length of 18 and a physical observation length of 18, meaning there is no padding when the numeric variable isn't 8 bytes long. twelve_six_eight has a conceptual length of 26 and a physical length of 32: the 18 bytes of non-8-byte variables are rounded up to 24, and then the 8-byte numeric goes at the end. (You can verify that SAS is not allocating 8 bytes for each numeric variable by simply adding several more 6-byte numerics; they never increase the total padding, and they fit neatly in a smaller space.)
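The check mentioned in the parenthetical could look something like this. The dataset name and the extra variables z2-z4 are invented for the test; everything else mirrors twelve_six_eight.

```sas
/* Hypothetical test dataset: twelve_six_eight plus three more
   6-byte numerics (z2-z4 are made-up names). */
data twelve_four_sixes_eight;
  length x y $6;          /* 12 bytes of character data */
  length z z2 z3 z4 6;    /* 24 bytes of 6-byte numerics, 36 so far */
  length i 8;             /* one 8-byte numeric */
run;

proc contents data=twelve_four_sixes_eight;
run;
```

If the rule described above holds, the 36 bytes would be padded to 40 and followed by the 8-byte numeric, for an observation length of 48. If every numeric were instead given a full 8 bytes, the length would come out as 12 + 4*8 + 8 = 52.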
Here's how it ends up looking. twelve_six_eight would fit like so:
[00000000011111111112222222222333333333344444444445]
[12345678901234567890123456789012345678901234567890]
[xxxxxxyyyyyyzzzzzz      iiiiiiii]
One side note: I'm not 100% sure that the layout isn't [iiiiiiiixxxxxxyyyyyyzzzzzz      ] instead. That would work just as well for predicting the location of the numeric variables. It doesn't really affect the conclusion, though: either way, if you have one or more 8-byte numeric variables, there will be a small amount of padding whenever your total non-8-byte-numeric storage is not a multiple of 8 bytes.
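A possible way to settle which of the two layouts is used: the OUT= dataset from PROC CONTENTS includes an NPOS variable giving each variable's physical position within the observation. This is only a sketch; the output dataset name layout is arbitrary, and the result may depend on SAS version and host.

```sas
/* Write variable metadata to a dataset instead of printing the report.
   "layout" is an arbitrary name chosen for this example. */
proc contents data=twelve_six_eight
              out=layout(keep=name type length varnum npos)
              noprint;
run;

/* Order by physical position to see where i sits relative to x, y and z */
proc sort data=layout;
  by npos;
run;

proc print data=layout noobs;
run;
```

If i sorts first, the 8-byte numerics come before the other variables; if it sorts last, they come after the padding.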