1
votes

I have a dataset as following

AGE GENDER
11   F
12   M
13
15

now I want to create a dataset as following

Basically I want to have the variable names in another column.

or may be in one column like

VAR Value
AGE  11
AGE  12   
AGE  13
AGE  15
GENDER F
GENDER M 

I have tried normal proc transpose, but looks like it doesnt give the desired result.

2

2 Answers

2
votes

This is not a strictly speaking a transpose. Transpose implies that you want to transform some columns into rows or vice-versa, which is not the case here. That sample data transposed would look like:

VAR    VALUE1 VALUE2 VALUE3 VALUE4
----------------------------------
AGE    11     12     13     14
GENDER F      M

What you're trying to do here instead is have all your variables in the same column and add a 'label' column.

You could have your desired result with a data step:

data have;
infile datalines missover
;
input age $ gender $;
datalines;
11 F
12 M
13
15
;
run;

data want;
length var $6;
set have(keep=age rename=(age=value) in=a)
    have(keep=gender rename=(gender=value) where=(value is not missing) in=b);
if b then var='GENDER';
else if a then var='AGE';
run;

Note the where= dataset option on the second part of the set statement since your desired result does not include the missing values that you have for gender in your sample data.

Alternatively, you could do it with two proc transpose:

proc transpose data=have out=temp name=VAR;
var age gender;
run;

proc transpose data=temp out=want(drop=_name_ rename=(col1=VALUE) where=(VALUE is not missing));
var col1 col2 col3 col4;
by var;
run;
1
votes

One solution is to introduce a new unique row identifier and use that in a BY statement. This will let TRANSPOSE pivot the data values in each row.

data have;
rownum + 1; * new variable for pivoting by row via BY statement;
input AGE GENDER $;
datalines;
11   F
12   M
13   . 
15   .
run;

proc transpose data=have out=want(drop=_name_ rename=(col1=value) where=(value ne ''));
by rownum;
var age gender;
run;

In Proc TRANPOSE the default new column names are prefixed with COL and indexed by the number of occurrences of a value 1..n in the incoming rows. The artificial rownum and BY statement ensure the pivoted data has only one data column. Note: the prefix can be specified with option PREFIX=, and additionally the pivoted data column names can come from the data itself if you use the ID statement.

Mixed data types can be a problem because the new column will use character representation of underlying data values. So dates will come out as numbers and numeric that were initially formatted will lose their format.

If you are trying to make a JSON transmission I would recommend researching the JSON library engine or the JSON package of Proc DS2.

If you are looking to create a report with the data in this transposed shape I would recommend Proc TABULATE.