0
votes

I would like to count parts of the dataset.

This is my dataset:

YEAR ㅣ FIRMCode ㅣ FIRMName
2000 ㅣ 10 ㅣ 1
2001 ㅣ 11 ㅣ 1
.
.
2020 ㅣ 17 ㅣ 1
2000 ㅣ 11 ㅣ 2
.
.
2020 ㅣ 16 ㅣ 2

I want to count the number of types of firm codes each year, regardless of the firm name. The firm codes are from 10 to 20. So my output would look like:

YEAR ㅣ FIRMCode(10) ㅣ FIRMCode(11) ... ㅣ FIRMCode(20)
2000 ㅣ #firms with code10 ㅣ #firms with code11
.
.
2020 ㅣ #firms with code10 ㅣ #firms(11)

Thank you so much in advance.

1

1 Answers

0
votes

One way is to use SQL to compute count (distinct FIRMName) within a group and then Proc REPORT or Proc TABULATE to present the distinct counts as columns.

Your data does not appear to model a scenario in which a firm can have multiple codes within a year; if so, the need for distinct is not specifically needed.

Example:

Compute the number of firms with a code within a year, maintaining a categorical data structure. Present the counts in a wide layout presentation. Also, show a data transformation from two level category into a wide data set (generally not recommended)

data have;
  call streaminit (20210425);

  do firmname = 'a', 'b', 'c', 'd';
    do year = 2000 to 2010;
      code = 10 + rand('integer', 0, 5);
      output;
    end;
  end;
run;

proc sql;
  create table counts as
  select year, code, count(distinct firmname) as firm_ucount
  from have
  group by year, code
  order by year, code;

proc tabulate data=counts;
  class year code;
  var firm_ucount;
  table year, code * firm_ucount='' * mean='' * f=4.;
run;

proc report data=counts;
  columns year firm_ucount,code;
  define year / group;
  define code / '' across;
  define firm_ucount / '#firms with code';
run;

proc transpose data=counts out=want prefix=n_code_;
  by year;
  id code;
  var firm_ucount;
run;

TABULATE, count class in column-dimension

enter image description here

REPORT, count variable ACROSS

enter image description here

TRANSPOSE, code as part of column name

enter image description here