1
votes

I have a large data file with data in the following format: country, datatype, year1month1 to year2018month7.

Reading the data using proc import did not work for all data fields. I ended up modifying the SAS datastep code to ensure data format was correct.

However, I am having trouble simplifying the code, namely I would like a do loop to go through all the years and month. This way, I could use current date to figure out the range of dates for the file and the code to create Year/Month variable does not have to repeat 100 times in the file.

data test;
infile 'abc.csv' delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;

informat Country_Name $34. ;
do i = 1940 to 2018; 
    do j = 1 to 12; 
        informat _(i)M(j) best32.;
    end;
end;
informat Base_Year $1. ;

format Country_Name $34. ;
do i = 1940 to 2018; 
    do j = 1 to 12; 
        format _(i)M(j) best12.;
    end;
end;
format Base_Year $1. ;

input
Country_Name  $
do i = 1940 to 2018; 
    do j = 1 to 12; 
        _(i)M(j) $;
    end;
end;
Base_Year  $;
run;
2

2 Answers

0
votes

There are a few approaches here that could work. The most directly translatable to your approach is to use the macro language.

You need to translate those two loops to something like this:

%do i = 1940 %to 2018; 
    %do j = 1 %to 12; 
        informat _&i.M&j. best32.;
    %end;
%end;

Notice the % there. This also has to be in a macro; you can't do this in normal datastep code.

I would rewrite it to use a macro like so:

%macro make_ym(startyear=, endyear=, separator=);
%local i j;
%do i = &startyear. %to &endyear.; 
    %do j = 1 %to 12; 
        _&i.&separator.&j.
    %end;
%end;
%mend make_ym;

data test;
infile 'abc.csv' delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;

informat Country_Name $34. ;

informat %make_ym(startyear=1940,endyear=2018,separator=M) best32.;

informat Base_Year $1. ;

format %make_ym(startyear=1940,endyear=2018,separator=M) best12.;

format Base_Year $1. ;

input
Country_Name  $
%make_ym(startyear=1940,endyear=2018,separator=M)
Base_Year  $;
run;

I took out the $ after the yMm bits in the input since you declared them as numeric.

0
votes

Don't model your data step after the code generated by PROC IMPORT. It does a lot of useless things, like attaching formats and informats to variables that don't need them.

For your problem you just need a simple program like this:

data test;
   infile 'abc.csv' dsd dlm= ',' truncover firstobs=2 ;
   input Country_Name :$34. Y1940M01 .... Y2018M08 Base_Year :$1. ;
run;

Now the only tricky part is building that list of numerical variables. If the list is small enough you could just put it into a macro variable. Fortunately that is not a problem in this case since using 8 character names (YyyyyMmm) there is room for over 300 years worth in a data step character variable. A variable of length 10,800 bytes should have room for 100 years of month names.

So just run this data step first.

data _null_;
  length names $10800 ;
  basedate = mdy(1,1,1940);
  lastdate = today();
  do i=0 to intck('month',basedate,lastdate);
    date=intnx('month',basedate,i);
    names=catx(' ',names,cats('Y',year(date),'M',put(month(date),Z2.)));
  end;
  call symputx('names',names);
run;

Now you can use the macro variable in your INPUT statement.

data test;
   infile 'abc.csv' dsd dlm= ',' truncover firstobs=2 ;
   input Country_Name :$34. &names Base_Year :$1. ;
run;