0
votes

I'm using the following PROC SQL step to pull data:

PROC SQL;  

create table panel as  
select ID, Month, Var1, Var2, Var3  
from data  
order by ID, Month;  
quit;  

I want to use the data to build a balanced panel data set, but there will be IDs missing, which means the value for each variable for each month should equal zero.

I cannot figure out how I can write a query or any data steps that will insert the missing IDs into the data set for each month and then give zeros as values.

For example, my query will make the following table:
UNBALANCED PANEL

My problem is that there is an ID "A" that is not represented in the data that I'm pulling, but ID "A" does exist. Also, to add complexity, ID "C" appears in the PROC SQL intermittently rather than on a monthly basis, but I would like to show it as zeros for each month it does not appear in the database. Therefore, I'm trying to have any missing data for known IDs appear for each month and with zeroes for each Var.

For example:
BALANCED PANEL

This has been stumping me for a few weeks and if anyone has any insights then it would be greatly appreciated!

1
Instead of asking all of us to lookup what a balanced panel data set is, you might shortly define itDirk Horsten

1 Answers

0
votes

This will not be the most elegant solution but it uses basic code that's easy to understand:

1) Have a dataset will all known IDs and Months

data ids;
infile datalines;
input ID $;
month='Jan'; output;
month='Feb'; output;
month='Mar'; output;
month='Apr'; output;
month='May'; output;
month='Jun'; output;
month='Jul'; output;
month='Aug'; output;
month='Sep'; output;
month='Oct'; output;
month='Nov'; output;
month='Dec'; output;
datalines;
A
B
C
D
;
run;

(this example is static as I don't know your data but if you can pull them from somewhere, e.g. select distinct ID, Month from table, it's of course much better)

2) Do your proc sql as you did:

proc sql;
create table panel as  
select ID, Month, Var1, Var2, Var3  
from data  
order by ID, Month;  
quit;

3) Then right join your result with the "table of zeroes to get those records for the 'missing IDs'

proc sql;
create table panel_balanced as
select coalesce(t1.ID,t2.ID) as ID
      ,coalesce(t1.Month,t2.Month) as Month
      ,coalesce(t1.var1,0) as var1
      ,coalesce(t1.var2,0) as var2
      ,coalesce(t1.var3,0) as var3
from panel t1
right join ids t2
  on t1.ID=t2.ID
  and t1.Month=t2.Month
;
quit;

You can of course combine step 2 and 3 into one query or even do the whole thing in one SQL query if the table from step 1 can also be created with SQL.