0
votes

I don't know how to describe this question but here is an example. I have an initial dataset looks like this:

input  first second $3.;
cards;
1 A
1 B
1 C
1 D
2 E
2 F
3 S
3 A
4 C
5 Y
6 II
6 UU
6 OO
6 N
7 G
7 H
...
;

I want an output dataset like this:

input  first second $;
cards;
1 "A,B,C,D"
2 "E,F"
3 "S,A"
4 "C"
5 "Y"
6 "II,UU,OO,N"
7 "G,H"
...
;

Both tables will have two columns. Unique value of range of the column "first" could be 1 to any number.

Can someone help me ?

2
See an example here and you can use the QUOTE() function to add the quotes afterwards if you really need it. gist.github.com/statgeek/d583cfa992bf56da51d435165b07e96a - Reeza
I thought this would be a duplicate question for sure but I can't find any good dupes. - Robert Penridge

2 Answers

1
votes

something like below

 proc sort data=have;
 by first second;
 run;
data want(rename=(b=second));
 length new_second $50.;
do until(last.first);
set have;
by first second ;
new_second =catx(',', new_second, second);
b=quote(strip(new_second));
end;
drop second new_second;
run;

output is

 first  second
 1         "A,B,C,D"
 2         "E,F"
 3          "A,S"
 4           "C"
 5          "Y"
 6         "II,N,OO,UU"
 7         "G,H"
1
votes

You can use by-group processing and the retain function to achieve this.

Create a sample dataset:

data have;
input id value $3.;
cards;
1 A
1 B
1 C
1 D
2 E
2 F
3 S
3 A
4 C
5 Y
6 II
6 UU
6 OO
6 N
7 G
7 H
;
run;

First ensure that your dataset is sorted by your id variable:

proc sort data=have;
  by id;
run;

Then use the first. and last. notation to identify when the id variable is changing or about to change. The retain statement tells the datastep to keep the value within concatenated_value over observations rather than resetting it to a blank value. Use the quote() function to apply the " chars around the result before outputting the record. Use the cats() function to perform the actual concatenation and separate the records with a ,.

data want;
  length contatenated_value $500.;
  set have;
  by id;
  retain contatenated_value ;

  if first.id then do;
    contatenated_value = '';
  end;

  contatenated_value = catx(',', contatenated_value, value);

  if last.id then do;
    contatenated_value = quote(cats(contatenated_value));
    output;
  end;

  drop value;
run;

Output:

contatenated_
    value        id

"A,B,C,D"         1
"E,F"             2
"S,A"             3
"C"               4
"Y"               5
"II,UU,OO,N"      6
"G,H"             7