5
votes

I need to import a large amount of csv files in one SAS dataset. They all have the same data structure (same variables, variable names on the first line). I usually work in SQL, but I am forced to this particular project in SAS of which I have only basic knowledge.

For the moment, my code looks like this:

proc import out=work.data
   datafile = file1.csv
   DBMS=CSV REPLACE;
   GETNAMES=YES;
   DATAROW=2;

proc import out=work.newData
   datafile = file2.csv
   DBMS=CSV REPLACE;
   GETNAMES=YES;
   DATAROW=2;

proc append base=work.data 
            data=work.newData force;    
run;

and so on for file3.csv ... file4.csv.

There is, I am sure, a more elegant way of doing this, that is, looping over all csv files on one folder without writing them explicitly (there are a few thousand files).

Thanks for your help.

5

5 Answers

4
votes

You need to figure out the input statement, rather than using PROC IMPORT (though if you use PROC IMPORT once, it will politely write that input code to the log which you can then use), and then you can use wildcards:

data mydata;
infile "c:\temp\*.csv" dlm=',' missover lrecl=32767;
input
myvar1
myvar2 $
myvar3
myvar4 :date9.
;
run;

Some other options exist; see https://communities.sas.com/message/182012#182012 for example for other ways to do it.

2
votes

The import procedure accepts a fileref created by the filename statement, and the filename statement accept multiple files. Thus you can do the following:

filename csvs ('file1.csv', 'file2.csv', 'file3.csv', 'file4.csv');

proc import out=work.data
datafile = csvs
DBMS=CSV REPLACE;
GETNAMES=YES;
DATAROW=2;
run;

I'm not entirely sure how this would work with multiple csv's that each have a heading, but I seem to remember that SAS recognizes each new file and skips the first row as per your DATAROW=2 statement.

Or you could use a macro to loop over all files in the folder, import them, and append them to your master set. This answer here should help you get a list of all files in a folder. Then you can do something like:

%macro importcsvs (folder, outputname);
    /* I assume this is your macro that takes a folder and returns a dataset */
    /* (called filelist) containing the filename in variable: file */
    %getallfilesinfolder(&folder, filelist);

    /* determine number of files to read */
    %let numfiles=0;
    data _null_;
    set filelist end=last;
    if last then call symput('numfiles',put(_n_,best.));
    run;

    %do i=1 %to &numfiles;
        %let curfile=;
        data _null_;
        /* obs and firstobs =i mean you only read item i in the dataset */
        set filelist (obs=&i firstobs=&i); 
        call symput('curfile',file);
        run;

        filename csv "&curfile";
        proc import out=work.data
        datafile = csv
        DBMS=CSV REPLACE;
        GETNAMES=YES;
        DATAROW=2;
        run;

        data work.&outputname;
        set
        %if %sysfunc(exist(work.&outputname)) %then %do;
             work.&outputname
        %end;
             work.data;
        run;
    %end;

%mend;

%importcsvs(/your/folder/with/csvs, newData);
2
votes

To expand a bit on Joe's answer - here's an example of how to import multiple CSV files, all with header rows:

  • In one data step
  • Without generating notes or errors in the log each time SAS processes a header row
  • Without including any of the header rows in the resulting SAS dataset

Code:

/*First make a few CSV files to import*/

%macro generate_csvs(n);
    %do i = 1 %to &n;
        proc export
            data = sashelp.class 
            dbms = csv 
            outfile = "C:\temp\class&n..csv";
        run;
    %end;
%mend generate_csvs;

%generate_csvs(2);

/*Import the CSV files*/

data want;
    length _filename_ $32;
    infile "c:\temp\class*.csv" dlm = ',' filename = _filename_;
    input @;
    if _filename_ ne lag1(_filename_) then delete;
    input name :$8. sex :$1. age :8. height :8. weight :8.;
run;

Notes:

  • _filename_ isn't updated until SAS has tried to read past the end of a file - input @; triggers this without actually inputting any data, while holding the current line for the second input statement for non-header rows.
  • _filename_ is not included in the output dataset, but you can easily create another variable as a copy of it if you want to track which csv each record came from.
  • The : format modifier allows the input statement to read formatted variables that are not in fixed-width columns.
1
votes

Copy all files into a folder and change it as a current folder in sas

x 'cd C:\Users\csvfolder';/*Your-folder-path*/

filename csv ('*.csv');

proc import out=work.allcsv
datafile = csv DBMS=CSV REPLACE;
GETNAMES=yes;
run;
0
votes

The looping approach in SAS macro language should work. For instance:

%macro loops(data);

data test;infile "c:\folder\&&data..csv" dsd missover lrecl=xyz dlm=
           firstobs=  and so on;
input var1-var20;
run;

proc append data=test base=AllCSVFiles;
run;

%mend;
%loops(csvdatafile)
%loops(etc...)

This is pretty generic code. If you can create a file containing only the names for the csv data files, then you can call %loops within a new macro:

data files;infile 'c:\folder\FileWithCsvDataFileNames' options as above;
length csvfilename $32.;
input csvfilename;
run;

%macro callmacro;

data _null_;set files nobs=nobs;
call symput('TotFiles',put(nobs,8.));
run;

%do a=1 %to &&TotFiles;

data _null_;set files;
if _n_=&&a;
call symputx('csvdatafilex',put(csvfilename,$32.));
run;

%loops(&&csvdatafilex)

%end;

%mend;
%callmacro