0
votes

Consider following exemplary SAS dataset with following layout.

Price  Num_items  
100    10   
120    15  
130    20  
140    25  
150    30  

I want to group them into 4 categories by defining a new variable called cat such that the new dataset looks as follows:

Price  Num_items  Cat  
100    10         1  
120    15         1  
130    20         2  
140    25         3  
150    30         4  

Also I want to group them so that they have about equal number of items (For example in above grouping Group 1 has 25, Group 2 has 20 ,Group 3 has 25 and Group 4 has 30 observations). Note that the price column is sorted in ascending order (that is required).

I am struggling to start with SAS for above. So any help would be appreciated. I am not looking for a complete solution but pointers towards preparing a solution would help.

1
You should consider adding the code that you have attempted to solve this problem with. Also, can you clarify what you mean by "I want to group them into 2 categories" and "I want 20 equally sized groups?" Regardless, you will probably want to use the automatic variable _N_ which is essentially a counter variable. - J_Lard
I changed the wording a little bit. The idea is to partition the sorted dataset such that each partition has similar sum(Num_items) in it. - user123884
Ah, OK. You should look into the retain statement. This will let you sum over portions of the Num_items column. Then you can say something like if groupTotal >= 25 then cat + 1. The logic is not perfect but it should get you started. - J_Lard
Thanks!. I will start with that. - user123884

1 Answers

1
votes

Cool problem, subtly complex. I agree with @J_Lard that a data step with some retainment would likely be the quickest way to accomplish this. If I understand your problem correctly, I think the code below would give you some ideas as to how you want to solve it. Note that depending on the num_items, and group_target, your mileage will vary.

Generate similar, but larger data set.

data have;
    do price=50 to 250 by 10;
        /*Seed is `_N_` so we'll see the same random item count.*/
        num_items = ceil(ranuni(_N_)*10)*5;
        output;
    end;
run;

Categorize.

/*Desired group size specification.*/
%let group_target = 50;

data want;
    set have;
    /*The first record, initialize `cat` and `cat_num_items` to 1 with implicit retainment*/
    if _N_=1 then do;
        cat + 1;
        cat_num_items + num_items;
    end;
    else do;
        /*If the item count for a new price puts the category count above the target, apply logic.*/
        if cat_num_items + num_items > &group_target. then do;
            /*If placing the item into a new category puts the current cat count closer to the `group_target` than would keeping it, then put into new category.*/
            if abs(&group_target. - cat_num_items) < abs(&group_target. - (cat_num_items+num_items)) then do;
                cat+1;
                cat_num_items = num_items;
            end;
            /*Otherwise keep it in the currnet category and increment category count.*/
            else cat_num_items + num_items;
        end;
        /*Otherwise keep the item count in the current category and increment category count.*/
        else cat_num_items + num_items;
    end;
    drop cat_num_items;
run;

Check.

proc sql;
    create table check_want as
    select  cat,
            sum(num_items) as cat_count
    from want
    group by cat;
quit;