0
votes

I would like to know how I can get this results using SAS code. I have the following dataset:

id  2015  2016  2017
1   £10   £12   £11
2   £12   £14   £13
3   £11   £20   £10
4   £10   £13   £21
5   £15   £11   £18

and I want to compute the mean as follows:

id  2015  2016  2017
1   £10   £12   £11
2   £12   £14   £13
3   £11   £20   £10
4   £10   £13   £21
5   £15   £11   £18
Mean  £11.6   £14   £14.6

I thought of using proc means for do this:

proc means data=work.dataset1;
output out=work.dataout mean= /autoname;
var amt; 
class id;
run;

Now, I would need to compute the difference between periods. I am having difficulties in grouping by variable and getting the mean as in the table above.

Your help will be greatly appreciated. Thanks

2
Is that first picture really how you have the data? You cannot normally use numbers as variable names. It might be more flexible to store the data as a separate observation per id per year. So your dataset would have just three variables ID, YEAR and COST (or whatever name works for what you are measuring). Then it can work for any number of years.Tom
Thanks Tom. It was a sample of dataset to analyse, only to better explain the steps. My difficulties were in grouping by variables in order to get the mean for each time period. Right now I have multiple dates for single account (id). And I am computing the difference between the current value and the previous one using lagstill_learning

2 Answers

1
votes

You don't want to use class id. The procedure would compute the mean for each id group, and as such each id has only one row.

You show var amt, but the data set shown has an amount column for each year. Do you have columns amt2015, amt2016, amt2017 ?

You will need to use the proper syntax of Proc MEANS. Example:

data have; input 
id amt2015-amt2017; datalines;
1   10   12   11
2   12   14   13
3   11   20   10
4   10   13   21
5   15   11   18
run;

proc means noprint data=have;
  var amt2015-amt2017;
  output out=means mean=amt2015-amt2017 / autoname;
run;

data means;
  set means;
  meandiff_1_2 = amt2015-amt2016;
  meandiff_2_3 = amt2016-amt2017;
run;

proc print data=means; run;

-------- listing -------- 
                                                            meandiff_    meandiff_
Obs    _TYPE_    _FREQ_    amt2015    amt2016    amt2017       1_2          2_3

 1        0         5        11.6        14        14.6        -2.4         -0.6
1
votes

Do you really have the data in the format you first show? If so what are the names of the variables? 2016 is not a valid variable name.

Let's assume you have the data in more normalized form (or can convert it to such).

data have ;
  input id $ @;
  do year=2015 to 2017 ;
    input amt @ ;
    output;
  end;
cards;
1  10 12 11
2  12 14 13
3  11 20 10
4  10 13 21
5  15 11 18
;

You can then use PROC MEANS to find the mean amt per time period.

proc summary data=have nway ;
  class year;
  var amt ;
  output out=means mean=mean_amt ;
run;

And a simple data step to calculate the difference between the time periods:

data want;
  set means;
  mean_dif = dif(mean_amt);
run;

Results:

Obs    year    _TYPE_    _FREQ_    mean_amt    mean_dif

 1     2015       1         5        11.6          .
 2     2016       1         5        14.0         2.4
 3     2017       1         5        14.6         0.6