1
votes

I'm attempting to import data into SAS Enterprise Guide. The file is a csv originally generated by saving out an Excel doc on a Mac. There are three columns, Date, DayOfYear, and MonthOfYear.

My import code looks like this:

DATA indata;
    INFILE '/sasdata/{path_to_file}' TERMSTR=cr DSD DLM=',';
    INPUT Date YYMMDD10. DayOfYear MonthOfYear;
    FORMAT Date YYMMDD10.;
RUN;

The problem is that this results in everything after the date column being shifted over by one column. Example output:

Date       | DOY | MOY
2017-01-01 |     |  1
2017-01-02 |     |  2
2017-01-03 |     |  3

My hacky solution has been to add an extra column named junk to get all the empty values, but I'd like to solve this for real if possible. Hack below.

DATA indata;
    INFILE '/sasdata/{path_to_file}' TERMSTR=cr DSD DLM=',';
    INPUT Date YYMMDD10. junk DayOfYear MonthOfYear;
    FORMAT Date YYMMDD10.;
RUN;

Date       | junk | DOY | MOY
2017-01-01 |      |  1  |  1
2017-01-02 |      |  2  |  1
2017-01-03 |      |  3  |  1

I've tried messing with line feeds and carriage returns to no avail. I put in both DSD and DLM, but that doesn't change anything. It seems like the date has an extra invisible character at the end that's ending up in it's own column, but when I look at the data in a hex editor there's nothing there. For example

ef bb bf 32-30 31 37 2d-30 31 2d 30-31 2c 31 ...

It goes straight from 2017-01-01 to a comma to the 1. So there are no hidden characters. I'm not sure how else this could be happening. This is the second file I've had this issue with so I know it's not just a one off thing.

To summarize my question, how do I avoid creating a extra column just for junk after date variables?

2
Just a thought, but what if you try using the colon (:) to apply formatted input - eg INPUT Date: YYMMDD10. ?Allan Bowe
@Allan Bowe That fixed it! Feel free to add as an answer and I'll accept it.Joey Harwood

2 Answers

2
votes

You should use a colon (:) to apply an informat in your input statement, as follows:

DATA indata;
    INFILE '/sasdata/{path_to_file}' TERMSTR=cr DSD DLM=',';
    INPUT Date : YYMMDD10. DayOfYear MonthOfYear;
    FORMAT Date YYMMDD10.;
RUN;

As per documentation:

The : (colon) format modifier enables you to use list input but also to specify an informat after a variable name, whether character or numeric. SAS reads until it encounters a blank column, the defined length of the variable (character only), or the end of the data line, whichever comes first.

1
votes

If you are reading delimited data then you should use list mode input. You can either remove all of the informats from the INPUT statement or add the : modifier before the informat in the INPUT statement so that you are still using modifed list mode. If you remove the informats from the input statement you can instead use an INFORMAT statement to tell SAS how to read the date variable.

data indata;
  infile '/sasdata/{path_to_file}' termstr=cr dsd truncover ;
  length Date DayOfYear MonthOfYear 8;
  input Date DayOfYear MonthOfYear;
  informat date yymmdd. ;
  format Date yymmdd10.;
run;

When you tell SAS to use formatted input when reading a delimited data line three things could happen, all of which are bad.

(1) you could read too few characters and so get the wrong result. This will also cause the rest of the date value to be read as the next column's value.

(2) Read too many characters and so try to read the next delimiter as part of the date or even part of the next value. And possible leave only part of the next value to be read for the next column.

(3) Read exactly the right number of characters so that the date is right, but then you leave the cursor position right before the next delimiter. Then if you are using the DSD option to allow for null values this will make SAS think the next value is null and essentially shift all the rest of the line into the wrong columns.