0
votes

I have imported an Excel sheet in which date1 is 4/1/16, date2 is 5/29/14, and date3 is 5/2/14. However, when I import the sheet into SAS and run PROC PRINT, the first two variables show up as "42461" and "41788", while date3 shows as 05/02/2014.

I need these date formats consistent b/c I am doing a Cox regression with PROC PHREG.

Any thoughts about how to make these dates consistent?

Thanks!


2 Answers

2
votes

This probably depends on how the data is represented in Excel and how it is imported into SAS. First, are the formats the same in Excel? The first two are being imported as numbers, the third as a string.

In Excel, you can format the column using a date format. Perhaps your import method will recognize this. You can also add another column that holds the date as a string, using TEXT(<whatever>, "YYYY-MM-DD") to convert it to that format.

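Alternatively, you can import everything as numbers and then add each value to a base date. Excel counts "1" as 1900-01-01, which suggests a base of 1899-12-31, but because Excel also counts a nonexistent 1900-02-29, the base that works for any date after February 1900 is 1899-12-30.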
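As a quick check of that arithmetic, here is a minimal sketch that converts the two serial numbers from the question. It assumes the values are already numeric, and it uses 1899-12-30 as the offset because Excel's phantom 1900-02-29 shifts every later date by one day. The variable names are just for illustration.

```sas
data _null_;
  /* Excel serial numbers quoted in the question */
  do excel_num = 42461, 41788;
    /* '30DEC1899'd is the effective day zero for dates after Feb 1900 */
    sas_date = excel_num + '30DEC1899'd;
    put excel_num= sas_date= yymmdd10.;
  end;
run;
/* Log: excel_num=42461 sas_date=2016-04-01
        excel_num=41788 sas_date=2014-05-29 */
```

Those match the 4/1/16 and 5/29/14 shown in the spreadsheet.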

1
votes

Because your column had mixed numeric (date) and character values, SAS imported the field as character. So the actual dates got imported as the text version of the number that Excel stores for dates. The ones that look like date strings in SAS are the cells that were strings in Excel too.

Or if in your case one of the three columns was all valid dates then SAS imported it as a number and assigned a date format to it so there is nothing to fix for that column.

The best way to fix it is to make sure that all of the values in the date column are either real dates or empty cells. Then PROC IMPORT will be able to make the right guess at how to import it.
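Once the column is cleaned up in Excel, re-importing and checking the result might look roughly like this. It is only a sketch: the file path and the dataset name `have` are placeholders, and DBMS=XLSX assumes SAS/ACCESS Interface to PC Files is licensed at your site.

```sas
/* Re-import the cleaned workbook; the path is a placeholder */
proc import datafile="/path/to/dates.xlsx"
            out=have
            dbms=xlsx
            replace;
  getnames=yes;   /* take variable names from the first row */
run;

/* Check that the date columns are now numeric with a date format */
proc contents data=have;
run;
```

If a date column still shows up as character in the PROC CONTENTS output, there is probably at least one non-date cell left in it.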

Once you have the strings in SAS and you want to try to fix them then you need to decide which strings look like integers and which should be treated as date strings.

So you might just check if they have any non-digit characters and assume those are the ones that are date strings instead of numbers. For the ones that look like integers just adjust the number to account for the fact that Excel numbers dates from 1900 and SAS numbers them from 1960.

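data want ;
  set have ;
  /* empty cell: leave the date missing */
  if missing(excel_string) then date=.;
  /* any non-digit character: treat the value as a date string */
  else if notdigit(trim(excel_string)) then date=input(excel_string,anydtdte32.);
  /* digits only: Excel serial number, shift it to the SAS epoch */
  else date=input(excel_string,32.) + '01JAN1900'd -2 ;
  format date yymmdd10. ;
run;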

You might wonder why the minus 2? It is because Excel starts from 1 instead of 0 and also because Excel thinks 1900 was a leap year. Here are the Excel date numbers for some key dates and a little SAS program to convert them. Try it.

data excel_dates;
  input datestr :$10. excel_num :comma32. @1 sas_num :yymmdd10. ;
  diff = sas_num - excel_num ;
  format _numeric_ comma14. ;
  sasdate1 = excel_num - 21916; 
  sasdate2 = excel_num + '01JAN1900'd -2 ;
  format sasdate: yymmdd10.;
cards;
1900-01-01  1 
1900-02-28  59 
1900-03-01  61 
1960-01-01  21,916 
2018-01-01  43,101 
;
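To look at the result, a plain PROC PRINT is enough. This assumes the data step above has just run in the same session, so WORK.EXCEL_DATES exists.

```sas
/* List the test rows with their converted dates */
proc print data=excel_dates;
run;
```

From 1900-03-01 onward, diff is -21,916 and both sasdate columns agree with datestr. The first two rows have a diff of -21,915 and convert to one day early, because those serial numbers come before Excel's phantom 1900-02-29.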