0
votes

I have an issue with finding out the missing months from the data set in SAS. Since I am new to SAS, I need some help on working on it. I have a data set which is shown as below: In the below example I took the date range from 201810 to 201906 (which is 8 months sample data but need to have 15 months to check the missing data). I want to do this in SAS

+----+------------+
| ID | Elig Month |
+----+------------+
|  1 |     201810 |
|  1 |     201811 |
|  1 |     201901 |
|  1 |     201902 |
|  1 |     201903 |
|  1 |     201904 |
|  1 |     201905 |
|  1 |     201906 |
|  2 |     201811 |
|  2 |     201901 |
|  2 |     201903 |
|  2 |     201904 |
|  2 |     201905 |
|  2 |     201906 |
|  3 |     201901 |
|  3 |     201902 |
|  3 |     201903 |
|  3 |     201904 |
|  3 |     201905 |
|  3 |     201906 |
|  4 |     201810 |
|  4 |     201903 |
|  4 |     201904 |
|  4 |     201905 |
|  4 |     201906 |
|  5 |     201906 |
|  6 |     201810 |
+----+------------+

I want to see if that data is present for all the months between 15 months date range. I have date format as 201901 (yearmonth). I want to check if the data is missing and create groups based on the missing number of months say 1. if only one month is missing then I want to group as "1 month missing" 2. if two months missing consecutively then name the group as " 2 month missing" 3. if 3 months then "3 month missing" 4. if 4 - 6 months missing then "4-6 months missing" 5. if missing months alternatively like available in one month and not available in next month and then available in next two months then I want to group them as "Chaos" 6. if missing more than 7-12 months then "7-12 months missing" 7. if missing more than 12 months then "12+ months missing" 8. if can be seen only once in ending periods name as "reborn" 9. If seen in the start of the period and never see any data set f or 15 moths then "dead"

The expected result is show as below:

+----+-------+--------------------+
| ID | Group | Group description  |
+----+-------+--------------------+
|  1 |     1 | 1  months missing  |
|  2 |     5 | choas              |
|  3 |     2 | 2 months missing   |
|  4 |     4 | 4-6 months missing |
|  5 |     8 | Reborn             |
|  6 |     9 | Dead               |
+----+-------+--------------------+
2
Does your month variable have actual dates, like '01OCT2018'd that are just formatted to look like 201810? Or are they strings like '201810'? Or numbers like 201,810 that you are displaying without the thousands separator? Is the 15 month window fixed? Does it vary from run to run? From ID to ID within a run?Tom
The date in the data set itself is formatted as string '201901'. I have two separate string fields as month and year as "01" and "2019". The 15 months window is fixed and need to run every time with 15 months range.Sujatha

2 Answers

0
votes

First to replicate your data

/***********************************************************************/
/*              ORIGINAL DATA                                          */
/***********************************************************************/
Data have;
    Input ID date yymmn6.;
    datalines;
1 201810
1 201811
1 201901
1 201902
1 201903
1 201904
1 201905
1 201906
2 201811
2 201901
2 201903
2 201904
2 201905
2 201906
3 201901
3 201902
3 201903
3 201904
3 201905
3 201906
4 201810
4 201903
4 201904
4 201905
4 201906
5 201906
6 201810
;
Run;

Data Have;
    Set Have;
    format date yymmn6.;
Run;

Then the following macro you can use to create a master list of all the possible year months between the dates you want:

/***********************************************************************/
/*  How to Find out IF Your Data is Missing a Date in sequence         */
/***********************************************************************/

%let start_date=01OCT2018; /*Change this to your starting date*/ 
%let end_date=01jun2019;   /*Change this to your Ending date*/
data month;
want=1;
date="&start_date"d;
do while (date<="&end_date"d);
    output;
    date=intnx('month', date, 1, 's');
end;
format date yymmn6.;
run;

Lastly you just merge the two, any column that has a null/missing ID is what you will group by for your categorization logic.

Proc sort data= have;
    by date;
Proc sort data=month;
    by date;
Run;

Data Want;
    merge Have month;
    by date;
Run;
0
votes

The dataset you have provided does not let you create all the categories - so assuming 9 months instead of 18 here. You will have to make some changes to make it work for 18 months. Here is one way of doing this:

I am reading the months as just numbers:

data have;
  input id month;
1 201810
1 201811
1 201901
;
run;

If you read month as a date field, then you need to do so when creating the allmonths dataset below also.

/* Create a dataset that contains all months for all IDs */

proc sort data=have(keep=id) nodupkey out=ids;
  by id;
run;

/* Very lazy way of populating the months. There are elegant ways to do this */

data allmonths;
  set ids;
  do month = 201810 to 201812;
    output;
  end;
  do month = 201901 to 201906;
    output;
  end;
run;

/* Merge the full combination with what you have and put a marker to indicate if a particular month is present for the ID or not */

data merged;
  merge allmonths have (in=a);
  by id month;
  if a then present=1;
  else present =0;
run;

/* Form a bit pattern and use that to categorize your cases */

data want;
  set merged;
  by id;
  retain pattern counter cnt0;
  if first.id then do;
     pattern = repeat('0',8);
     counter = 1;
     cnt0 = 0;
  end;

  if present then substr(pattern,counter,1) = '1';
  else cnt0 + 1;
  counter + 1;

  /* You could use a macro to auto generate these combinations if you expect to have very many categories */

  length desc $50;
  if last.id then do;
    if pattern = "000000001" then desc = "Reborn";
    else if pattern = "100000000" then desc = "Dead";
    else if cnt0 = 1 then desc = "1 months missing";
    else if cnt0 = 2 and index(pattern, '00') then desc = "2 months missing";
    else if cnt0 = 2 then desc = "chaos";
    else if cnt0 = 3 and index(pattern, '000') then desc = "3 months missing";
    else if cnt0 = 3 then desc = "chaos";
    else if cnt0 = 4 and index(pattern, '0000') then desc = "4 - 6 months missing";
    else if cnt0 = 4 then desc = "chaos";
    else if cnt0 = 5 and index(pattern, '00000') then desc = "4 - 6 months missing";
    else if cnt0 = 5 then desc = "chaos";
    else if cnt0 = 6 and index(pattern, '000000') then desc = "4 - 6 months missing";
    else if cnt0 = 6 then desc = "chaos";
    output;

  end;

  keep id desc;
run;