0
votes

I need to create a variable which combined three variables age , sexe and agios. All of them are located in a sas table and all of them are numeric.

In first step, I transform them into categorical variables then I suggest to use the compress function to create this new variable:

data new;
set new;

attrib sexe format=$15.;
if sexe=1 then sexe="HOMME";
else if sexe=0 then sexe="FEMME";

attrib agios format=$15.;
if agios=0 then agios="NON_AGIOS";
else AGIOS="AGIOS";

attrib age format=$15.;
if (age<=0) and (age=>25) then age="a25";
if (age<=26) and (age=>40) then age="a40";
if (age<=41) and (age=>65) then age="a60";
if (age=>65) then age="a65";
new_variable=compress(agios||sexe||age);
run; 

  But I have an warnings repeted to all the concerned variables:

WARNING: Variable agios has already been defined as numeric.

And there wasn't any transformation for the variables in the table. Also, values of the concerned variables age , sexe and agios disappear from the table. It begins empty.

How do I correct this please? Or is there any other suggestion to create the new variable without using ifelse?

Thank you

2

2 Answers

2
votes

This issue is in this line:

if agios=0 then agios="NON_AGIOS";

The compiler initialises agios to be a numeric variable. You cannot then assign it a character value. One way around this is to create a new variable (agios2) then rename it, eg:

data new;
set new(rename=(agios=agios2 sexe=sexe2 age=age2)); /* rename vars on input */

attrib sexe format=$15.;
if sexe2=1 then sexe="HOMME";
else if sexe2=0 then sexe="FEMME";

attrib agios format=$15.;
if agios2=0 then agios="NON_AGIOS"; /* reference the renamed variable */
else AGIOS="AGIOS";
drop agios2 age2 sexe2;             /* drop the renamed variables on output */

attrib age format=$15.;
if (age2<=0) and (age2=>25) then age="a25";
if (age2<=26) and (age2=>40) then age="a40";
if (age2<=41) and (age2=>65) then age="a60";
if (age2=>65) then age="a65";
new_variable=compress(agios||sexe||age);
run; 
1
votes

Another approach would be to create, for each variable, a sequenced format with the (NOTSORTED) option. The formats are used when reporting. For example in Proc TABULATE the FORMAT statement and CLASS statement with option order=formatted preloadfmt.

The age categorizations in your posted code appears incorrect

if (age<=0) and (age=>25) then age="a25";

The age value can not be simultaneously <= 0 and >= 25.

When labeling ranges I would recommend using values that are consistent with regards to the range they represent. For example sometimes you use the endpoint value in the label (a25, a40) and other times a middling point (a60). The scheme is inconsistent.

Here is sample code that applies formats to original data when being output in Proc TABULATE.

proc format;
  value sexe (notsorted)
    0 = 'HOMME'
    1 = 'FEMME'
  ;

  value agios (notsorted)
    0 = 'NON_AGIOS'
    1 = 'AGIOS'
  ;

  value age (notsorted)
     0< - 25 = 'a25'
    25< - 40 = 'a40'
    40< - 65 = 'a60' /* inconsistent labeling of range */
    65< - high = 'a65'
  ;

  * mock data;
  data have;
    do personid = 1 to 100;
      sexe = ranuni(123) < 0.40;
      agios = ranuni(123) > 0.65;
      age = floor(12 + 58 * ranuni(123));
      output;
    end;
  run;

  proc tabulate data=have;
    class sexe agios age / order=formatted preloadfmt;
    classlevel sexe agios age / style=[pretext='A0A0A0A0'x];
    format sexe sexe. agios agios. age age.;
    table
      ( sexe agios age ) 
    , N
    / nocellmerge
    ;
  run;

enter image description here

Sometimes creating new data with the formatted values becoming the actual data is appropriate, especially if handing off data to other users that may not have SAS and the formatting codes.

data ...;
  set ...;
  ...
  sexe_fv = put (sexe,sexe.);  * fv stands for formatted value;
  agios_fv = put (agios, agios.);
  age_fv = put (age, age.);
  ...
run;