multiples of 8 - optimal length for SAS character variables?

Question

I heard that SAS stores character variables in chunks of 8 bytes.

Therefore, the thinking goes, we should always set the length of character variables to a multiple of 8.

I have searched and could not find any support for the initial assertion.

Is it true? Is this covered somewhere in the documentation?

@Joe, the question is not about variable names, but variable lengths, as set by the length statement on a data set. — jaamor
Then your question makes even less sense; but you can verify this easily on your own, can you not? — Joe
True, I can verify empirically. I was hoping that someone who knows would share their insight on how SAS stores data. — jaamor
If you wanted to know the answer to the above, you should've just ... tested it. If you want to know something about how SAS stores data, ask that question. — Joe
I did my own empirical test and added it below as an answer. Now people can look this question up. — jaamor

Joe · Accepted Answer · 2015-10-20T16:07:04

This answer applies to datasets that contain no 8-byte numeric variables. I will post separately for datasets that do.

No, there is nothing special about 8 byte character variable lengths.

See the examples below:

data length8;
  length char0001-char9999 $8;
  call missing(of _all_);
  do _i = 1 to 100; 
    output;
  end;
  drop _i;
run;

data length7;
  length char0001-char9999 $7;
  call missing(of _all_);
  do _i = 1 to 100; 
    output;
  end;
  drop _i;
run;

data length4;
  length char0001-char9999 $4;
  call missing(of _all_);
  do _i = 1 to 100; 
    output;
  end;
  drop _i;
run;

data length12;
  length char0001-char9999 $12;
  call missing(of _all_);
  do _i = 1 to 100; 
    output;
  end;
  drop _i;
run;

data length16;
  length char0001-char9999 $16;
  call missing(of _all_);
  do _i = 1 to 100; 
    output;
  end;
  drop _i;
run;

data length17;
  length char0001-char9999 $17;
  call missing(of _all_);
  do _i = 1 to 100; 
    output;
  end;
  drop _i;
run;

Each of these datasets is a different size, roughly proportional to the length of the character variables. Note that the length-4 dataset is proportionally a bit bigger (on my machine, anyway); in fact, lengths 4, 5, and 6 all produce the same size. This is because of the page size: the minimum page size on my installation is 64 KB (65,536 bytes), and at lengths 4, 5, and 6 only one row of data fits in a page (the rows are roughly 40, 50, and 60 KB). So the effect comes from the total length of the data record, not from any particular size being reserved for a character variable.
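The one-row-per-page effect can be checked with a little arithmetic and then compared against what SAS actually wrote. The sketch below assumes the test datasets above are in WORK, takes the 64 KB page size quoted above as a given (your installation may differ), and ignores per-page overhead, so the estimate is only approximate. The column names obslen, bufsize, npage, and filesize are the ones dictionary.tables exposes in recent SAS 9 releases as far as I know; PROC CONTENTS reports the same figures if any of them is missing on your version.

```sas
/* Rough estimate: how many rows fit in one page at each length?  */
/* Assumes a 64 KB page and ignores page overhead, so treat the   */
/* result as approximate.                                         */
data _null_;
  page_size = 65536;                /* assumed page size in bytes */
  n_vars    = 9999;                 /* char0001-char9999          */
  do len = 4, 5, 6, 7, 8;
    row_len       = n_vars * len;   /* record length in bytes     */
    rows_per_page = floor(page_size / row_len);
    put len= row_len= rows_per_page=;
  end;
run;

/* Actual figures for the test datasets created above. */
proc sql;
  select memname, nobs, obslen, bufsize, npage, filesize
    from dictionary.tables
    where libname = 'WORK' and memname like 'LENGTH%'
    order by obslen;
quit;
```

A rows_per_page of 0 in the estimate just means the row is wider than the assumed page, so SAS will have to use a larger page than the one assumed here; the bufsize column shows what it actually chose.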

That is where a small change in length could pay off: if the page size is just under double the row length, only one row fits per page, so making the row slightly smaller lets two rows fit and saves about half of the space. That is unlikely to occur except in a very small number of cases, though, because it requires a very wide row (many variables, or very long character variables). You can also alter the page size with options, which may be the better way to deal with edge cases like this.
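As a minimal sketch of the page-size route: the BUFSIZE= data set option requests a page size for an output dataset. The 128k value and the dataset name length4_bigpage are illustrative choices only, not a recommendation; the right value depends on your row length and host.

```sas
/* Same layout as length4, but with a larger page requested so that */
/* more than one ~40 KB row can share a page.                       */
data length4_bigpage (bufsize=128k);
  length char0001-char9999 $4;
  call missing(of _all_);
  do _i = 1 to 100;
    output;
  end;
  drop _i;
run;

/* Inspect page size and page count for the new dataset. */
proc contents data=length4_bigpage;
run;
```

The "Data Set Page Size" and "Number of Data Set Pages" lines in the PROC CONTENTS output show whether the change took effect; running the same step on length4 gives the baseline to compare against.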
