0
votes

I have a large number of files which I need to make into SAS datasets. I have the names of all the files in a SAS dataset, for example:

doc_names
1. filename1
2. filename2
      .
      .
      .

So I need to access each filename, concatenate it with the path to the file and tell SAS to grab that file and make a dataset out of it.

For example, if I were doing this in R, it would look something like this:

path = 'path-to-files'
filenames = readLines('file-with-filenames.txt')
for (i in 1:length(filenames)) {
  current.file = filenames[i]
  full.file = paste0(path, current.file)
  data.set = read.csv(full.file)
}

It's basically current.file = filenames[i] portion that I can't figure out in SAS. I need to be able to make a macro variable out of a particular entry of a dataset so that I can concatenate it with the pathway and then tell SAS that it is a filename.

P.S., I am aware that I can create a new dataset entry which has the full pathway to the file. That is all well and good, but it isn't the heart of the problem.

Thanks for the help!

2
Can we assume that you do actually want X separate data sets? SAS can read all the files into a single file and identify which file it comes from in a single step - using either the wildcard and or filevar/filename options. In general, it's better to have a single dataset and use BY processing further on.Reeza

2 Answers

1
votes

Personally I find it much easier to use a data step to generate code for this type of one off programming task. So read the filenames from your file, create variables with the name of the full path to the file and what dataset name you want to import the CSV file into. Then you have all the information you need to write the PROC IMPORT statement.

filename code temp;
data filenames ;
  length path $100 filename $100 dsname $32 fullname $200 ;
  path = 'path-to-files' ;
  infile 'file-with-filenames.txt' truncover;
  input filename $100.;
  fullname=catx('/',path,filename);
  dsname = scan(filename,1,'.');
  file code ;
  put 'proc import datafile=' fullname :$quote. 'out=' dsname
    / '  dbms=dlm replace ; '
    / '  delimiter=","; '
    / 'run;'
  ;
run;
%include code / source2 ;

Also in reality I am not sure I would trust either PROC IMPORT or read.csv() to consistently convert a series of files into the same structure. If the CSV files all contain the same type of information I would just read them all into one dataset. So if your CSV files all have four variables then this data step will read them all into one data set.

data all_data ;
  length path $100 filename $100 fullname $200 ;
  path = 'path-to-files' ;
  infile 'file-with-filenames.txt' truncover;
  input filename $100.;
  fullname=catx('/',path,filename);
  infile csv filevar=fullname dsd truncover end=eof ;
  * skip the header line;
  if not eof then input ;
  do while (not eof);
    length var1 var2 8 var3 $20 var4 8 ;
    informat var2 date9.;
    input var1-var4 ;
    output;
  end;
run;
0
votes

Assuming you have a macro that contains your reading in code [e.g. %readin(file=) which contains the proc import or what have you], you have a lot of options. The easiest:

proc sql;
  select cats('%readin(file=',filename,')') 
  into :readinlist separated by ' '
  from yourdataset;
quit;

If the path is constant, you can either add that to your readin macro (hardcoded), or add it as a parameter to your readin macro, or add it to the concatenation above - whatever makes the most sense for your problem.