0
votes

I'm reading in all CSV files from a given directory. I need to limit the observations to only read between the 4th and 128th record. When I use the firstobs and the obs options to limit the observation numbers, it applies to all files at once, limiting my data to the first file read in as they all have over 200 records.

How can I limit the number of records read in from EACH CSV file using this process?

Code below:

data mydata;
%let _EFIERR_ = 0; /* set the ERROR detection macro variable */
length FNAME $175.; 
infile 'M:\Load Forecasts\Woods & Poole Download_Files_2017\SP_2017_MT_WY\WPGEO\MT Counties\*.csv'
    delimiter = ',' MISSOVER DSD lrecl=32767 filename=fname /*firstobs=4 obs=128*/;
length desc $100;
input desc $  Y1969-Y2050 $;
if _ERROR_ then call symputx('_EFIERR_',1);
filename=fname;
run;

proc print data= mydata (obs=10); run;

csv file read

1
Post your code as text not an image. If we want to make changes or modifications we'd have to type it out which I'm too lazy to do.Reeza
Did you try using the EOV= option on the INFILE statement? Or you could test if fname=lag(fname) to see when filename changes.Tom

1 Answers

2
votes

Use the filename= option to make the currently open filename available. When the filename is not the same as its lag then you are reading from the next file. Maintain your own row counter and filter based on that.

data _null_;
  do i = 1 to 10;
    outfile =  cats('f',put(i,z2.),'.txt');
    file out filevar=outfile;
    do row = 1 to 10;
      z = i/100;
      put z @;
      do j = 1 to 10; 
        put ',' j @;
      end;
      put;
    end;
  end;
run;

data want;
  length filename source $200 rownumber 8;
  infile '*.txt' dlm=',' filename=filename;
  input z v1-v10;

  source = filename;

  if filename ne lag(filename) 
    then rownumber=1; 
    else rownumber+1;

  if 4 < rownumber < 9;
run;