0
votes

I have a question regarding SAS. We have a table crimes consisting of crimes (one crime per row with many more informations such as the article of the criminal law or the date).

Date        Article
——————————————-
2019-01-01  146
2019-01-01  122
2019-01-01  123
2019-01-01  123
2019-01-02  160
...

Every year we receive a questionnaire to fill out how many crimes have been committed in certain categories (e.g. fraud, homicides, etc.). Unfortunately, the categories in this questionnaire are not the same as our own categories (articles of the criminal law). So we have to define, which articles fall into which survey-category. These informations are stored in at table called definitions.

In the example below there are two articles (110, 111) that could be summarized as „theft“ and three articles that could be summarized as „assault“:

Category  Article  Query 
———————————————————————-
Theft     110      select count(*) from crimes where article = 110
Theft     111      select count(*) from crimes where article = 111
Robbery   130      select count(*) from crimes where article = 113
Assault   140      select count(*) from crimes where article = 140
Assault   141      select count(*) from crimes where article = 141
Assault   146      select count(*) from crimes where article = 146     

In our database crimes we also have a column „article“. The goal is now to read how many crimes there are.

As a result I would like to have a table like this, that gives me the number of crimes per category (the categories defined as in the table definitions):

Category    Count
———————————————————
Theft       10,038
Robbery        503
Assault      1,200

The question is now: What‘s the best way to get the desired result?

2

2 Answers

1
votes

I think, you can use this proc sql inner join:

proc sql;
   create table want as
   select d.category,count(*) as count
   from crimes c
   inner join definitions d
   on d.Article = c.Article
   group by 
   d.category;
quit;

It will work, if there is relation between article and category in definitions table and there is one row for one crime with defined article in crimes table.

1
votes

One way to do it is to convert your definitions table to a custom format that can be read into proc format, then sum your group using the custom format. This can make code much easier to work with later on. To make a custom and dynamic format, you only need the following variables: start, end, label, category, and fmtname.

If you forget what each variable represents, check out this paper, or simply make a dummy format and read the output table using the cntlout= option.

data fmt;
    length fmtname $32.
           label   $25.
    ;

    set definitions end=eof;

    fmtname = 'articlecat'; /* Format name */
    type    = 'C';          /* Format type. C = Character */
    start   = article;      /* Start lookup value */
    end     = article;      /* End lookup value */
    label   = category;     /* Custom format name to apply to start/end */
    output;

    /* Optional: Give a label for unknown articles using the special HLO variable */
    if(eof) then do;
        HLO = 'O'; /* Special variable representing 'other' */
        call missing(start, end);
        label = 'Unknown';
        output;
    end;

    keep start end fmtname type label hlo;
run;

/* Save the format */
proc format cntlin=fmt;
run;

Now you can use your format anywhere else in the program without having to do any joins to a definition file.

proc sql noprint;
    create table want as
        select put(category, $articlecat.) as Category
            , count(*) as Count
        from crimes
        group by calculated Category
    ;
quit;