0
votes

Say that I have this MWE data:

data v;
input var1 $ var2 var3 $;
datalines;
cat 3 yes
sheep 2 no
sheep 3 maybe
pig 3 maybe
goat 3 maybe
cat 2 no
pig 1 no
cat 2 no
pig 1 no
goat 3 no
cat 3 no
cat 2 yes
cat 1 yes
sheep 3 no 
cat 2 no
cat 1 maybe
;
run;

I use proc tabulate to compute the number of observations for each value. I do this for each of the variables:

proc tabulate data=v;
    class var1;
    table (var1='' all="Total"),(N pctn);
quit;   


proc tabulate data=v;
    class var2;
    table (var2='' all="Total"),(N pctn);
quit;

proc tabulate data=v;
    class var3;
    table (var3='' all="Total"),(N pctn);
quit;

I get output that looks as follows:

       N        PctN
cat    8        50.00
goat   2        12.50
pig    3        18.75
sheep  3        18.75
Total  16       100.00

        N       PctN
1       4       25.00
2       5       31.25
3       7       43.75
Total   16      100.00

        N       PctN
maybe   4       25.00
no      9       56.25
yes     3       18.75
Total   16      100.00

My question is: How can I export this into Excel in the following format?:

Name    Cat 1   N1  N1% Cat 2   N2  N2%     Cat 3   N3  N3%     Cat 4   N4  N4%     Missing %   Total   Total%
var1    cat     8   50  goat    2   12.5    pig     3   18.75   sheep   3   18.75   0           16      100
var2    1       4   25  2       5   31.25   3       7   43.75                       0           16      100
var3    maybe   4   25  no      9   56.25   yes     3   18.75                       0           16      100

In other words, I want each different variable to have its own line. Each value of the variable will appear in this line, with the number of observations and % of total observations. The final three columns are a bonus, but not required: the % and number of missing observations and the total number of values for the variable. How can I do this?

Note that I am very novice at SAS. Any improvements in the code are also welcome, such as how to loop or condense the code to produce the tables.

1
That’s a bad structure for many reasons and you won’t find an easy way. For a single table I would recommend going down instead of across, them you can use a single PROC FREQ to get your data. Since the all column seems pointless, not sure why that’s included. If you want missing, proc freq will support that with the missing option. If you’re interested in that’s let me know and I’ll post the code.Reeza
If you really really need this structure, you could use my approach and transpose the data.Reeza
What do you want to do if you encounter a variable with more than 4 categories? What have you tried so far? Exporting to excel is simple, but getting the data into your desired shape is going to require lots of transposing and appending.user667489
If there were more than four categories, I would want additional columns. It would be no problem for the total columns could come at the first. I haven't really tried anything.bill999

1 Answers

1
votes

The desired data form is extremely messy and hard to consume as the number of variables and number of distinct values thereof increases.

These processing steps can be performed to achieve the output structure:

  1. Transpose each row
  2. Obtain frequency counts of each variable value combination
  3. Scan the counts by variable and construct a data set of name/value pairs for next step. Inject a row for the missing case as needed.
  4. Transpose the name/value pairs into a wide structure
  5. Export as desired

Example

Data has a fourth variable that has some missing values.

data have;
input var1 $ var2 var3 $ var4;
datalines;
cat 3 yes .
sheep 2 no .
sheep 3 maybe .
pig 3 maybe .
goat 3 maybe 1 
cat 2 no 1
pig 1 no 1
cat 2 no 1
pig 1 no 1
goat 3 no 1
cat 3 no 1
cat 2 yes 1
cat 1 yes 1
sheep 3 no 1
cat 2 no 2
cat 1 maybe 1
;
run;

options missing = ' ';
proc transpose data=have_v out=vector1(index=(_name_));
  by rowid;
  var var1 var2 var3 var4;
run;

proc freq noprint data=vector1;
  by _name_;
  table col1 / missing out=freqs;
run;
options missing = '.';

data freqs_0;
  set freqs;
  by _name_;

  retain nomiss;
  if first._name_ then nomiss = not missing(col1);

  if first._name_ then seq=1; else seq+1;

  seqc = cats(seq);

  if first._name_ and missing(col1) then do;
    seqc = 'missing';
    seq = 0;
  end;

  length widename $32;

  if seqc ne 'missing' then do;
    widename = cats("cat_",seqc);
    widevalue = col1;
    output;
  end;

  widename = cats("cat_",seqc,'_COUNT');
  widevalue = COUNT;
  output;

  widename = cats("cat_",seqc,'_PERCENT');
  widevalue = PERCENT;
  output;

  if last._name_ and nomiss then do;
    seqc = 'missing';

    widename = cats("cat_",seqc,'_COUNT');
    widevalue = 0;
    output;

    widename = cats("cat_",seqc,'_PERCENT');
    widevalue = 0;
    output;

  end;
  keep _name_ widename widevalue;
run;

proc transpose data=freqs_0 out=wide;
  by _name_;
  id widename;
  var widevalue;
run;