3
votes

I have a large dataset in SAS that I need to change to wide from long.

Basically, the data is sorted with multiple observations for each property for about 10 different years with around 150 variables. I would like to make it so each variable is a different year and have one observation per property. Each variable is uniquely named and I want to keep the name in it.

What I have:

Property Year Var1 ... Var150

Prop 1 2010 100 ... ABC

Prop 1 2011 101 ... DEF

.

.

Prop 1 2017 138 ... XYZ

Prop2 ...

What I want:

Property Var1_2010 ... Var1_2017 ... Var150_2010 ... Var150_2017

Prop1 100 ... 138 ... ABC ... XYZ

Prop2 ...

I tried using Proc Transpose, but I cannot find a way to concatenate the year at the end of each original variable name.

I also tried:

proc sort data=hotels;
by propertyID year;
run;

proc transpose data=hotels out=hotels_wide;
by propertyID year;
var _all_;
run;

proc transpose data=hotels_wide delimiter=_ out=hotels_wide_full(drop=_name_);
by propertyID;
var col1;
id _name_ year;
run;

But the last block gave me an error telling me there are multiple occurrences of each variable for each propertyID. A shortcoming of this would also be losing the formats of each variable.

1
Your posted code will work. Can you please provide example data that does not work?Tom
The only reason the posted code would generate that error would be if there are existing variables with names that a so long that adding the extra characters to store the year value will not fit into the 32 character limit for variable names.Tom

1 Answers

1
votes

I think PROC TRANSPOSE does work here. This seems to work for me but it's possible I'm missing something about your code.

I first transpose it to a longer format so you have a single column of data with your year and property ID. Then I transpose it again to the wide format desired.

As long as you only have one row per year per property this works. If it's erroring out, then for some reason SAS thinks you have multiple years for variables which is problematic because you can't have Var4_2010 and var4_2010 twice so how should the second iteration be named? In that case you either summarize and then transpose or you need to add another variable to uniquely identify your rows.

data propertyData;
    array _var(*) var1-var5;

    do property=1 to 5;

        do year=2010 to 2015;

            do i=1 to dim(_var);
                _var(i)=rand('normal', 100, 5);
            end;
            output;
        end;
    end;
    keep property year var1-var5;
run;

proc transpose data=propertyData out=long;
by property year;
var var1-var5;
run;

proc transpose data=long out=wide delimiter = _;
by property;
id  _name_ year;
var col1;
run;